Voice Capture & The Orb
Kowork includes a voice-to-text system called "The Orb": a floating recording interface that appears when you hold the Fn key (or press the global shortcut Cmd+.).
Overview
The Orb provides:
- Hold-to-record interface - press and hold Fn key (or Cmd+.) to record, release to transcribe
- Local transcription with Whisper - privacy-first, no cloud API required
- AI enhancement - optional LLM post-processing with transcription profiles
- Auto-paste - transcribed text automatically inserts into active application
- Transcription history - all recordings saved to ~/.kollabor/transcription-history.json
- NSPanel on macOS - works above fullscreen apps
How to Use
Hold-to-Record Mode (Default)
- Press and hold the Fn key (or Cmd+.)
- The Orb appears at your cursor position
- Speak your message while holding the key
- Release the key when finished
- Transcription happens locally using Whisper
- Text automatically pastes into the active app
Sticky Mode (Double-Tap)
- Double-tap the Fn key (within 500ms)
- The Orb stays open after recording
- Click the mic button to start/stop recording
- Record multiple takes without closing the Orb
- Press Escape or click close when done
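
The difference between hold-to-record and sticky mode comes down to how key presses are timed. The sketch below is one way to model that, not the app's actual implementation: the `FnKeyTracker` class and its method names are hypothetical, and only the 500 ms double-tap window and the toggle behavior come from this page.

```typescript
// Hypothetical state tracker; only the 500 ms window is taken from the docs.
type OrbMode = "hidden" | "hold" | "sticky";

const DOUBLE_TAP_WINDOW_MS = 500;

class FnKeyTracker {
  private lastPressAt: number | null = null;
  private mode: OrbMode = "hidden";

  /** Call when the key goes down; `now` is a millisecond timestamp. */
  press(now: number): OrbMode {
    const isDoubleTap =
      this.lastPressAt !== null && now - this.lastPressAt <= DOUBLE_TAP_WINDOW_MS;
    // Forget the previous press after a double-tap so a third quick tap
    // is not counted as another double-tap.
    this.lastPressAt = isDoubleTap ? null : now;

    if (isDoubleTap) {
      // A double-tap toggles sticky mode on or off.
      this.mode = this.mode === "sticky" ? "hidden" : "sticky";
    } else if (this.mode !== "sticky") {
      // A single press outside sticky mode starts hold-to-record.
      this.mode = "hold";
    }
    return this.mode;
  }

  /** Call when the key is released. */
  release(): OrbMode {
    // Releasing ends hold mode; sticky mode stays open.
    if (this.mode === "hold") this.mode = "hidden";
    return this.mode;
  }

  /** Call for Escape or the close button. */
  close(): OrbMode {
    this.mode = "hidden";
    this.lastPressAt = null;
    return this.mode;
  }
}
```

Passing the timestamp in as an argument, rather than reading the clock inside the class, keeps the timing logic easy to test.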
Keyboard Shortcuts
| Shortcut | Action | Mode |
|---|---|---|
| Fn (hold) | Start recording, stop on release | Global |
| Fn (double-tap) | Toggle sticky mode | Global |
| Cmd+. | Alternative to Fn key (configurable) | Global |
| Space | Toggle recording (when Orb is open) | Orb |
| Escape | Close Orb or profile selector | Orb |
Note: Global shortcuts can be configured in Settings > Audio > Global Shortcuts.
Whisper Models
Kollabor uses OpenAI Whisper via whisper-rs (Rust bindings to whisper.cpp) for local transcription. Models are downloaded from Hugging Face and stored in ~/.kollabor/whisper-models/.
| Model | Size | Accuracy | Speed | File |
|---|---|---|---|---|
| tiny | ~75 MB | Basic | Very Fast | ggml-tiny.bin |
| base | ~142 MB | Good | Fast | ggml-base.bin |
| small | ~466 MB | Better | Moderate | ggml-small.bin |
| medium | ~1.5 GB | Great | Slower | ggml-medium.bin |
| large | ~3.1 GB | Best | Slowest | ggml-large-v3.bin |
Default: small provides the best balance of accuracy and speed for most users.
Models can be downloaded and configured in Settings > Audio > Whisper Settings.
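
To make the table concrete, here is a small sketch that maps a model name to its file on disk. The file names, approximate sizes, and models directory come from this page; the helper names (`modelPath`, `largestModelWithin`) are hypothetical and not part of the app.

```typescript
type WhisperModel = "tiny" | "base" | "small" | "medium" | "large";

// File names and approximate sizes as listed in the model table.
const MODEL_FILES: Record<WhisperModel, { file: string; approxMB: number }> = {
  tiny: { file: "ggml-tiny.bin", approxMB: 75 },
  base: { file: "ggml-base.bin", approxMB: 142 },
  small: { file: "ggml-small.bin", approxMB: 466 },
  medium: { file: "ggml-medium.bin", approxMB: 1500 },
  large: { file: "ggml-large-v3.bin", approxMB: 3100 },
};

const MODELS_DIR = "~/.kollabor/whisper-models";

/** Resolve the on-disk path of a model, expanding "~" to the given home directory. */
function modelPath(model: WhisperModel, homeDir: string): string {
  const dir = MODELS_DIR.replace(/^~/, homeDir);
  return `${dir}/${MODEL_FILES[model].file}`;
}

/** Hypothetical helper: the largest model that fits a disk budget, or null if none fits. */
function largestModelWithin(budgetMB: number): WhisperModel | null {
  const largestFirst: WhisperModel[] = ["large", "medium", "small", "base", "tiny"];
  return largestFirst.find((m) => MODEL_FILES[m].approxMB <= budgetMB) ?? null;
}
```

For example, `modelPath("small", "/Users/me")` resolves to `/Users/me/.kollabor/whisper-models/ggml-small.bin`.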
Architecture
The Orb system consists of several components working together:
Frontend (Vue 3)
- OrbStyleRouter.vue - Main orb window component
- useOrbCore.ts - Recording state composable
- useVoiceRecording.ts - Unified recording interface
- WaveformVisualizer.vue - Real-time audio visualization
Backend (Rust/Tauri)
- fn_key_monitor.rs - Fn key detection (macOS)
- whisper.rs - Whisper transcription engine
- panel.rs - NSPanel configuration
- commands/transcription.rs - IPC handlers

    Recording Flow:
    --------------
    User holds Fn key
            |
            v
    NSEvent (macOS) detects keyCode 63
            |
            v
    fn_key_monitor.rs emits "fn-key-pressed"
            |
            v
    main.rs creates NSPanel window at cursor
            |
            v
    OrbStyleRouter mounts, useOrbCore.startRecording()
            |
            v
    MediaRecorder captures audio (WebM/Opus)
            |
            v
    User releases Fn key
            |
            v
    Blob -> ArrayBuffer -> transcribeAudio(buffer)
            |
            v
    whisper.rs transcribes using whisper-rs
            |
            v
    (optional) LLM enhancement with profile prompt
            |
            v
    Auto-paste to active application
            |
            v
    Save to transcription-history.json

NSPanel (macOS Fullscreen Support)
On macOS, the Orb uses NSPanel instead of a regular window. This is required for the Orb to appear above fullscreen applications.

    // src-tauri/src/panel.rs
    tauri_panel! {
        panel!(RecordingOrb {
            config: {
                can_become_key_window: true,
                is_floating_panel: true
            }
        })
    }

NSPanel configuration in main.rs:
- NonactivatingPanel - doesn't steal focus from the active app
- FullScreenAuxiliary - appears above fullscreen windows
- MoveToActiveSpace - follows the user across desktops
- Level 1000 - high z-order
Audio Recording Pipeline
The audio recording uses a dual-approach system:
Browser MediaRecorder (Frontend)
Used for real-time visualization and orb UI. Records WebM/Opus audio.
- Format: audio/webm;codecs=opus
- Audio constraints: echo cancellation, noise suppression, auto gain
- Chunks collected during recording, combined into Blob on stop
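
The format and constraint choices above can be sketched as two small helpers. This is an illustration only: the function names are hypothetical, and the MP4 fallback is an assumption based on the Safari entry in the supported-formats list, not a confirmed detail of the app.

```typescript
// Candidate container/codec pairs in order of preference.
// WebM/Opus is the documented default; "audio/mp4" is an assumed Safari fallback.
const CANDIDATE_MIME_TYPES: string[] = ["audio/webm;codecs=opus", "audio/mp4"];

/**
 * Pick the first supported MIME type.
 * In a browser, pass (t) => MediaRecorder.isTypeSupported(t).
 */
function pickMimeType(isSupported: (mimeType: string) => boolean): string | undefined {
  return CANDIDATE_MIME_TYPES.find((t) => isSupported(t));
}

interface AudioConstraints {
  echoCancellation: boolean;
  noiseSuppression: boolean;
  autoGainControl: boolean;
  deviceId?: { exact: string };
}

/** Build the constraints object to hand to navigator.mediaDevices.getUserMedia(). */
function buildAudioConstraints(deviceId?: string): { audio: AudioConstraints } {
  const audio: AudioConstraints = {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
  };
  // Only pin a device when one was chosen; otherwise the browser default is used.
  if (deviceId) audio.deviceId = { exact: deviceId };
  return { audio };
}

// Browser usage (not executed here):
//   const stream = await navigator.mediaDevices.getUserMedia(buildAudioConstraints());
//   const mimeType = pickMimeType((t) => MediaRecorder.isTypeSupported(t));
//   const recorder = new MediaRecorder(stream, { mimeType });
//   recorder.ondataavailable = (e) => chunks.push(e.data);
//   recorder.onstop = () => { const blob = new Blob(chunks, { type: mimeType }); };
```

Taking the support check as a parameter keeps the selection logic independent of the browser, so it can be exercised without a real `MediaRecorder`.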
Tauri Native Recording (Backend)
Optional native recording using cpal/rodio. WAV format saved to recordings directory.
- Library: cpal (cross-platform audio I/O)
- Format: 16-bit PCM WAV
- Storage: ~/.kollabor/recordings/
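
The native recorder itself is written in Rust, so the following TypeScript sketch is not the app's code. It only illustrates what a mono 16-bit PCM WAV file looks like at the byte level: a 44-byte RIFF header followed by little-endian samples. The function name `encodeWav16` is hypothetical.

```typescript
/** Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file. */
function encodeWav16(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to the valid range before scaling to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * bytesPerSample, Math.round(clamped * 32767), true);
  }
  return new Uint8Array(buffer);
}
```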
Local Transcription with Whisper
Transcription happens entirely locally using whisper-rs (Rust bindings to whisper.cpp). No cloud API required.
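
    // src-tauri/src/whisper.rs (simplified excerpt; helper signatures are illustrative)
    use whisper_rs::{FullParams, SamplingStrategy, WhisperContext, WhisperContextParameters};

    // Load Whisper model (cached in memory)
    let ctx = WhisperContext::new_with_params(model_path, WhisperContextParameters::default())?;
    // Decode WebM/Opus to mono PCM samples plus the source sample rate
    let (mono_samples, sample_rate) = decode_webm_opus(buffer)?;
    // Resample to 16 kHz mono, the input format Whisper expects
    let resampled = resample(&mono_samples, sample_rate, 16000);
    // Run transcription
    let params = FullParams::new(SamplingStrategy::Greedy { best_of: 1 });
    let mut state = ctx.create_state()?;
    state.full(params, &resampled)?;
    // Extract text segment by segment (accessor names vary between whisper-rs versions)
    let mut full_text = String::new();
    for i in 0..state.full_n_segments()? {
        full_text.push_str(&state.full_get_segment_text(i)?);
    }

Supported audio formats: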
- WebM/Opus (browser MediaRecorder output)
- MP4/AAC (Safari MediaRecorder output)
- WAV (16-bit PCM)
- Raw PCM (fallback)
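
Whatever the input format, the decoded audio has to end up as 16 kHz mono before Whisper sees it. The sketch below shows that step using channel averaging and linear interpolation. It is an assumption about how such a step could look, not the app's Rust `resample` function, and a production pipeline would normally low-pass filter before downsampling to avoid aliasing.

```typescript
/** Average interleaved channels into a single mono channel. */
function downmixToMono(interleaved: Float32Array, channels: number): Float32Array {
  const frames = Math.floor(interleaved.length / channels);
  const mono = new Float32Array(frames);
  for (let f = 0; f < frames; f++) {
    let sum = 0;
    for (let c = 0; c < channels; c++) sum += interleaved[f * channels + c];
    mono[f] = sum / channels;
  }
  return mono;
}

/** Resample mono audio by linear interpolation between neighboring samples. */
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate || input.length === 0) return input.slice();
  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate));
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples consumed per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    out[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return out;
}
```

For instance, one second of 48 kHz stereo audio would go through `downmixToMono(samples, 2)` and then `resampleLinear(mono, 48000, 16000)`, producing 16,000 samples.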
AI Enhancement
After transcription, the raw text can be optionally enhanced using an LLM with customizable transcription profiles.
Transcription Profiles
Profiles contain a system prompt that defines how the LLM should process the transcription.

    {
      "id": "minimal",
      "name": "Minimal",
      "prompt": "Fix spelling and grammar only. Return ONLY the corrected text.",
      "builtin": true
    }

Enhancement Flow
- Whisper produces raw transcription
- If LLM enabled, call processAIVoice with profile prompt
- LLM returns enhanced text (grammar fixes, formatting, etc.)
- Enhanced text is pasted instead of raw transcription
Toggle AI Enhancement: Click the sparkles icon in the Orb, or configure in Settings > Audio > Orb Settings.
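
The enhancement flow can be sketched as two pure functions: one that turns a profile and a raw transcript into an LLM request, and one that decides which text to paste. The internals of `processAIVoice` are not documented here, so the message shape and the fall-back-to-raw-text rule below are assumptions, and the function names are hypothetical.

```typescript
interface TranscriptionProfile {
  id: string;
  name: string;
  prompt: string;
  builtin: boolean;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

/** Profile prompt becomes the system message; the raw transcript is the user message. */
function buildEnhancementMessages(profile: TranscriptionProfile, rawText: string): ChatMessage[] {
  return [
    { role: "system", content: profile.prompt },
    { role: "user", content: rawText },
  ];
}

/**
 * Decide what gets pasted. Assumption: if enhancement is disabled, failed (null),
 * or returned only whitespace, the raw Whisper text is used instead.
 */
function selectTextToPaste(
  rawText: string,
  llmEnabled: boolean,
  enhancedText: string | null,
): { text: string; enhanced: boolean } {
  const cleaned = enhancedText?.trim() ?? "";
  if (llmEnabled && cleaned.length > 0) return { text: cleaned, enhanced: true };
  return { text: rawText, enhanced: false };
}
```

The returned `enhanced` flag corresponds to the field of the same name in the transcription history entries.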
Transcription History
All transcriptions are automatically saved to a local JSON file with metadata.

    // Location: ~/.kollabor/transcription-history.json
    {
      "history": [
        {
          "id": "trans-1234567890",
          "text": "Enhanced transcription text",
          "originalText": "Raw whisper transcription",
          "enhanced": true,
          "timestamp": 1699564800000,
          "date": "Friday, November 10, 2023",
          "time": "3:30 PM",
          "clipboardContent": "...",
          "type": "transcription"
        }
      ]
    }

History is accessible in the app sidebar under "Transcriptions" and can be searched, filtered, and exported.
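
Because the history is plain JSON, it is easy to read from scripts as well. The sketch below types an entry after the example file and shows one possible search-and-filter function. The field names come from the example; `parseHistory` and `searchHistory` are hypothetical helpers, not the app's own search code.

```typescript
interface TranscriptionEntry {
  id: string;
  text: string;
  originalText: string;
  enhanced: boolean;
  timestamp: number; // milliseconds since the Unix epoch
  date: string;
  time: string;
  clipboardContent: string;
  type: string;
}

/** Parse the file contents; a missing or malformed "history" field yields an empty list. */
function parseHistory(json: string): TranscriptionEntry[] {
  const parsed = JSON.parse(json) as { history?: unknown };
  return Array.isArray(parsed.history) ? (parsed.history as TranscriptionEntry[]) : [];
}

/** Case-insensitive search over enhanced and raw text, newest entries first. */
function searchHistory(
  entries: TranscriptionEntry[],
  query: string,
  enhancedOnly = false,
): TranscriptionEntry[] {
  const needle = query.trim().toLowerCase();
  return entries
    .filter((e) => !enhancedOnly || e.enhanced)
    .filter(
      (e) =>
        needle === "" ||
        e.text.toLowerCase().includes(needle) ||
        e.originalText.toLowerCase().includes(needle),
    )
    .sort((a, b) => b.timestamp - a.timestamp);
}
```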
IPC Commands:
- save_transcription_history(entry) - Save new entry
- load_transcription_history() - Load all entries
- clear_transcription_history() - Clear all entries
- add_transcription_entry(entry) - Add entry and emit event
Auto-Paste Feature
Transcribed text automatically inserts into the active application using native system APIs.
Paste Methods
- Primary: Clipboard + Cmd+V simulation (macOS)
- Fallback: Clipboard only (user must paste manually)
Status Messages
- Pasting... - Attempting auto-paste
- Copied! Press Cmd+V - Fallback mode, manual paste required
- Ready - Paste successful, orb hidden
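
The primary and fallback paths map directly onto these three status messages. The sketch below shows that mapping with the clipboard and keystroke operations passed in as callbacks. The status strings come from this page; `autoPaste` and the `PasteDeps` interface are hypothetical, and the real implementation uses native system APIs.

```typescript
type PasteStatus = "Pasting..." | "Copied! Press Cmd+V" | "Ready";

interface PasteDeps {
  copyToClipboard: (text: string) => void;
  /** Returns true if the simulated Cmd+V succeeded. */
  simulatePaste: () => boolean;
  onStatus: (status: PasteStatus) => void;
}

/** Copy first, then try to paste; on failure the text is still on the clipboard. */
function autoPaste(text: string, deps: PasteDeps): PasteStatus {
  deps.copyToClipboard(text);
  deps.onStatus("Pasting...");

  let pasted = false;
  try {
    pasted = deps.simulatePaste();
  } catch {
    // Treat any error (for example, missing permissions) as a failed paste.
    pasted = false;
  }

  const finalStatus: PasteStatus = pasted ? "Ready" : "Copied! Press Cmd+V";
  deps.onStatus(finalStatus);
  return finalStatus;
}
```

Copying to the clipboard before attempting the keystroke is what makes the fallback work: if the simulated paste fails, the user can still press Cmd+V manually.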
Accessibility Permissions (macOS)
Fn key monitoring and auto-paste require macOS Accessibility permissions.
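
    // src-tauri/src/fn_key_monitor.rs
    // Check permissions using the macOS Accessibility API
    #[link(name = "ApplicationServices", kind = "framework")]
    extern "C" {
        fn AXIsProcessTrusted() -> bool;
    }
    // True once the user has granted Accessibility access to the app
    let trusted = unsafe { AXIsProcessTrusted() };

How to grant permissions: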
- Open System Settings
- Navigate to Privacy & Security > Accessibility
- Add Kollabor to the list and enable the checkbox
- Restart Kollabor if already running
IPC Commands:
- check_fn_key_permissions() - Check current status
- open_fn_key_permissions_settings() - Open System Settings
Configuration
Orb settings are stored in ~/.kollabor/config.json:

    {
      "ui": {
        "orb": {
          "style": "sphere",
          "width": 250,
          "height": 250,
          "autoStartRecording": true
        }
      },
      "audio": {
        "deviceId": "Built-in Microphone",
        "echoCancellation": true,
        "noiseSuppression": true,
        "autoGainControl": true
      },
      "whisper": {
        "model": "small"
      },
      "llm": {
        "enabled": true
      },
      "transcriptionProfiles": {
        "minimal": {
          "id": "minimal",
          "name": "Minimal",
          "prompt": "Fix spelling and grammar only.",
          "builtin": true
        }
      },
      "activeTranscriptionProfile": "minimal"
    }

All settings can be modified in the Settings UI or by editing the JSON file directly.
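
When editing the file by hand, it is easy to leave a key out or break the JSON. The sketch below shows one defensive way to read a subset of the config and fill gaps with fallback values. The fallback values are copied from the example file above and are not confirmed application defaults; `loadConfig` and the `KollaborConfig` type are hypothetical.

```typescript
interface OrbConfig {
  style: string;
  width: number;
  height: number;
  autoStartRecording: boolean;
}

// Only a subset of the file is modeled here.
interface KollaborConfig {
  ui: { orb: OrbConfig };
  whisper: { model: string };
  llm: { enabled: boolean };
  activeTranscriptionProfile: string;
}

// Fallback values copied from the example config (assumed, not confirmed defaults).
const FALLBACK_CONFIG: KollaborConfig = {
  ui: { orb: { style: "sphere", width: 250, height: 250, autoStartRecording: true } },
  whisper: { model: "small" },
  llm: { enabled: true },
  activeTranscriptionProfile: "minimal",
};

/** Parse config JSON and fill in any missing keys; invalid JSON yields the fallbacks. */
function loadConfig(json: string): KollaborConfig {
  // `any` is deliberate: the file is user-editable and its shape is not guaranteed.
  let parsed: any = {};
  try {
    parsed = JSON.parse(json) ?? {};
  } catch {
    parsed = {};
  }
  return {
    ui: { orb: { ...FALLBACK_CONFIG.ui.orb, ...parsed?.ui?.orb } },
    whisper: { ...FALLBACK_CONFIG.whisper, ...parsed?.whisper },
    llm: { ...FALLBACK_CONFIG.llm, ...parsed?.llm },
    activeTranscriptionProfile:
      typeof parsed?.activeTranscriptionProfile === "string"
        ? parsed.activeTranscriptionProfile
        : FALLBACK_CONFIG.activeTranscriptionProfile,
  };
}
```

With this approach, a file containing only `{"ui":{"orb":{"width":300}}}` still produces a complete config in which every other value comes from the fallbacks.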