What is Multi-Modal Selection?
This feature lets you collect and store multiple forms of data (text, images, screen recordings, audio, and browser tab content) within your workspace. It helps you document meetings, provide visual context, and share insights efficiently.
How It Helps
- Integrates multiple data sources in one workspace.
- Supports AI-powered analysis across different formats.
- Improves insights by combining visual and text-based inputs.
How to Use
Step 1: Open the Vision Tray from the cortx interface. Choose from:

- Screen Capture – Take a screenshot of your screen or a specific window.
- Photo Capture – Take a picture using your device camera.
- Screen Recording – Record screen activity for demos or tutorials, with optional audio.
- Audio or Video Recording – Record quick verbal or visual messages.
- Tab Monitoring – Select a browser tab to capture and track web content continuously.
- Transcription – Record audio discussions and convert them into text in real time. cortx can also generate summaries from transcriptions, which is useful for meetings, brainstorming, or structured documentation.
cortx offers two modes for transcribing a conversation:
- Draft Mode – Captures and displays the conversation in real time but does not send messages automatically. You review the text and send messages manually.
- Send Mode – Actively listens and sends transcribed chat bubbles in real time as you speak, enabling a continuous, hands-free experience.
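The difference between the two modes can be sketched in a few lines of Python. This is only an illustrative model, not cortx code: the names `Mode`, `TranscriptionSession`, `on_utterance`, and `send_drafts` are hypothetical and do not correspond to any documented cortx API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    # Hypothetical labels for the two modes described above.
    DRAFT = "draft"  # show transcribed text, wait for a manual send
    SEND = "send"    # send each transcribed utterance right away


@dataclass
class TranscriptionSession:
    """Toy model of a transcription session (an assumption, not cortx internals)."""
    mode: Mode
    drafts: list[str] = field(default_factory=list)  # displayed but not yet sent
    sent: list[str] = field(default_factory=list)    # delivered chat bubbles

    def on_utterance(self, text: str) -> None:
        """Route one transcribed utterance according to the active mode."""
        if self.mode is Mode.SEND:
            self.sent.append(text)
        else:
            self.drafts.append(text)

    def send_drafts(self) -> None:
        """Manual step in Draft Mode: deliver everything the user has reviewed."""
        self.sent.extend(self.drafts)
        self.drafts.clear()
```

In this sketch, a session created with `Mode.DRAFT` keeps each utterance in `drafts` until `send_drafts()` is called, while a session created with `Mode.SEND` places each utterance in `sent` as soon as it arrives.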

- Transcription Tray
The Transcription Tray is your central hub for active transcriptions. Click it to see all conversations currently being transcribed and to control them from one place. From the tray you can:
- Mute and Unmute: Control audio input for transcriptions directly from the tray.
- Quick Summaries: Click the three-dot menu next to any conversation in the tray and select “Summarize” to get a summary without navigating away.

Step 2: Select a combination of input types (images, PDFs, voice, etc.).
Step 3: cortx processes the selected data and extracts insights from it.
Step 4: Link the processed data to a document, task, or workflow.