Voice Capture & The Orb

Kollabor features a powerful voice-to-text system called "The Orb" - a floating recording interface that appears instantly when you hold the Fn key (or use the global shortcut Cmd+.).

Overview

The Orb provides:

  • Hold-to-record interface - press and hold Fn key (or Cmd+.) to record, release to transcribe
  • Local transcription with Whisper - privacy-first, no cloud API required
  • AI enhancement - optional LLM post-processing with transcription profiles
  • Auto-paste - transcribed text is automatically inserted into the active application
  • Transcription history - all transcriptions saved to ~/.kollabor/transcription-history.json
  • NSPanel on macOS - works above fullscreen apps

How to Use

Hold-to-Record Mode (Default)

  1. Press and hold the Fn key (or Cmd+.)
  2. The Orb appears at your cursor position
  3. Speak your message while holding the key
  4. Release the key when finished
  5. Transcription happens locally using Whisper
  6. Text automatically pastes into the active app

Sticky Mode (Double-Tap)

  1. Double-tap the Fn key (within 500ms)
  2. The Orb stays open after recording
  3. Click the mic button to start/stop recording
  4. Record multiple takes without closing the Orb
  5. Press Escape or click close when done
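
The double-tap detection above hinges on a single timing rule: a second Fn press within 500 ms counts as a double-tap. A minimal sketch of that rule (class and method names are hypothetical, not Kollabor's actual code):

```typescript
// Hypothetical sketch of double-tap detection: a second press within the
// threshold (500 ms by default) completes a double-tap.
class DoubleTapDetector {
  private lastTapAt: number | null = null;

  constructor(private readonly thresholdMs = 500) {}

  // Returns true when this press completes a double-tap.
  onKeyDown(nowMs: number): boolean {
    const isDoubleTap =
      this.lastTapAt !== null && nowMs - this.lastTapAt <= this.thresholdMs;
    // Reset after a double-tap so a triple-tap does not fire twice.
    this.lastTapAt = isDoubleTap ? null : nowMs;
    return isDoubleTap;
  }
}
```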

Keyboard Shortcuts

Shortcut          Action                                 Mode
Fn (hold)         Start recording, stop on release       Global
Fn (double-tap)   Toggle sticky mode                     Global
Cmd+.             Alternative to Fn key (configurable)   Global
Space             Toggle recording (when Orb is open)    Orb
Escape            Close Orb or profile selector          Orb

Note: Global shortcuts can be configured in Settings > Audio > Global Shortcuts.

Whisper Models

Kollabor uses OpenAI Whisper via whisper-rs (Rust bindings to whisper.cpp) for local transcription. Models are downloaded from Hugging Face and stored in ~/.kollabor/whisper-models/.

Model    Size      Accuracy   Speed       File
tiny     ~75 MB    Basic      Very Fast   ggml-tiny.bin
base     ~142 MB   Good       Fast        ggml-base.bin
small    ~466 MB   Better     Moderate    ggml-small.bin
medium   ~1.5 GB   Great      Slower      ggml-medium.bin
large    ~3.1 GB   Best       Slowest     ggml-large-v3.bin

Default: small provides the best balance of accuracy and speed for most users.

Models can be downloaded and configured in Settings > Audio > Whisper Settings.
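
The model table above maps naturally to a small lookup structure. A hedged sketch of how model names might resolve to files on disk (the names and fallback behavior are assumptions, not Kollabor's actual implementation):

```typescript
// Hypothetical lookup map mirroring the model table above.
interface WhisperModel {
  file: string;   // filename under ~/.kollabor/whisper-models/
  sizeMB: number; // approximate download size
}

const WHISPER_MODELS: Record<string, WhisperModel> = {
  tiny:   { file: "ggml-tiny.bin",     sizeMB: 75 },
  base:   { file: "ggml-base.bin",     sizeMB: 142 },
  small:  { file: "ggml-small.bin",    sizeMB: 466 },
  medium: { file: "ggml-medium.bin",   sizeMB: 1500 },
  large:  { file: "ggml-large-v3.bin", sizeMB: 3100 },
};

// Resolve the on-disk path for a model, falling back to the default "small".
function modelPath(name: string, baseDir: string): string {
  const model = WHISPER_MODELS[name] ?? WHISPER_MODELS["small"];
  return `${baseDir}/${model.file}`;
}
```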

Architecture

The Orb system consists of several components working together:

Frontend (Vue 3)

  • OrbStyleRouter.vue - Main orb window component
  • useOrbCore.ts - Recording state composable
  • useVoiceRecording.ts - Unified recording interface
  • WaveformVisualizer.vue - Real-time audio visualization

Backend (Rust/Tauri)

  • fn_key_monitor.rs - Fn key detection (macOS)
  • whisper.rs - Whisper transcription engine
  • panel.rs - NSPanel configuration
  • commands/transcription.rs - IPC handlers

Recording Flow:
--------------
User holds Fn key
    |
    v
NSEvent (macOS) detects keyCode 63
    |
    v
fn_key_monitor.rs emits "fn-key-pressed"
    |
    v
main.rs creates NSPanel window at cursor
    |
    v
OrbStyleRouter mounts, useOrbCore.startRecording()
    |
    v
MediaRecorder captures audio (WebM/Opus)
    |
    v
User releases Fn key
    |
    v
Blob -> ArrayBuffer -> transcribeAudio(buffer)
    |
    v
whisper.rs transcribes using whisper-rs
    |
    v
(optional) LLM enhancement with profile prompt
    |
    v
Auto-paste to active application
    |
    v
Save to transcription-history.json
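
The flow above can be read as a small state machine. A minimal sketch under the assumption that five states capture the lifecycle (state and transition names are illustrative, not the real implementation):

```typescript
// Illustrative Orb lifecycle as a state machine, mirroring the flow above.
type OrbState = "idle" | "recording" | "transcribing" | "enhancing" | "pasting";

const TRANSITIONS: Record<OrbState, OrbState[]> = {
  idle:         ["recording"],            // Fn pressed
  recording:    ["transcribing"],         // Fn released
  transcribing: ["enhancing", "pasting"], // LLM enhancement on/off
  enhancing:    ["pasting"],
  pasting:      ["idle"],                 // history saved, orb hidden
};

function transition(from: OrbState, to: OrbState): OrbState {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`invalid transition: ${from} -> ${to}`);
  }
  return to;
}
```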

NSPanel (macOS Fullscreen Support)

On macOS, the Orb uses NSPanel instead of a regular window. This is required for the Orb to appear above fullscreen applications.

// src-tauri/src/panel.rs
tauri_panel! {
    panel!(RecordingOrb {
        config: {
            can_become_key_window: true,
            is_floating_panel: true
        }
    })
}

NSPanel configuration in main.rs:

  • NonactivatingPanel - doesn't steal focus from active app
  • FullScreenAuxiliary - appears above fullscreen windows
  • MoveToActiveSpace - follows user across desktops
  • Level 1000 - high z-order

Audio Recording Pipeline

The audio recording uses a dual-approach system:

Browser MediaRecorder (Frontend)

Used for real-time visualization and orb UI. Records WebM/Opus audio.

  • Format: audio/webm;codecs=opus
  • Audio constraints: echo cancellation, noise suppression, auto gain
  • Chunks collected during recording, combined into Blob on stop

Tauri Native Recording (Backend)

Optional native recording using cpal/rodio. WAV format saved to recordings directory.

  • Library: cpal (cross-platform audio I/O)
  • Format: 16-bit PCM WAV
  • Storage: ~/.kollabor/recordings/

Local Transcription with Whisper

Transcription happens entirely locally using whisper-rs (Rust bindings to whisper.cpp). No cloud API required.

// src-tauri/src/whisper.rs

// Load Whisper model (cached in memory)
let ctx = WhisperContext::new_with_params(model_path, params)?;

// Decode WebM/Opus to PCM samples
let decoded = decode_webm_opus(buffer)?;

// Resample to 16kHz mono
let resampled = resample(&mono_samples, sample_rate, 16000);

// Run transcription
let mut state = ctx.create_state()?;
state.full(params, &resampled)?;

// Extract text from segments
// Extract text from segments
let n_segments = state.full_n_segments()?;
for i in 0..n_segments {
    full_text.push_str(&state.full_get_segment_text(i)?);
}
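
The resample-to-16-kHz step in the snippet above is conceptually a linear interpolation over the source samples. A TypeScript sketch of just the math (the real pipeline lives in whisper.rs):

```typescript
// Illustrative linear-interpolation resampler, sketching the
// "resample to 16 kHz mono" step conceptually. Not the Rust implementation.
function resample(samples: number[], fromRate: number, toRate: number): number[] {
  if (fromRate === toRate) return samples.slice();
  const outLen = Math.floor((samples.length * toRate) / fromRate);
  const out = new Array<number>(outLen);
  for (let i = 0; i < outLen; i++) {
    const pos = (i * fromRate) / toRate; // fractional source index
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, samples.length - 1);
    const frac = pos - i0;
    out[i] = samples[i0] * (1 - frac) + samples[i1] * frac; // lerp neighbors
  }
  return out;
}
```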

Supported audio formats:

  • WebM/Opus (browser MediaRecorder output)
  • MP4/AAC (Safari MediaRecorder output)
  • WAV (16-bit PCM)
  • Raw PCM (fallback)
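
Routing a buffer to the right decoder typically starts with sniffing well-known magic bytes. A hypothetical sketch for the formats listed above (the actual detection happens inside the Rust decoder and may differ):

```typescript
// Hypothetical container sniffing via standard magic bytes.
function detectAudioFormat(bytes: Uint8Array): "webm" | "mp4" | "wav" | "pcm" {
  // WebM/Matroska: EBML header 1A 45 DF A3
  if (bytes[0] === 0x1a && bytes[1] === 0x45 && bytes[2] === 0xdf && bytes[3] === 0xa3) {
    return "webm";
  }
  // MP4: "ftyp" box tag at offset 4
  if (bytes[4] === 0x66 && bytes[5] === 0x74 && bytes[6] === 0x79 && bytes[7] === 0x70) {
    return "mp4";
  }
  // WAV: "RIFF" chunk header
  if (bytes[0] === 0x52 && bytes[1] === 0x49 && bytes[2] === 0x46 && bytes[3] === 0x46) {
    return "wav";
  }
  return "pcm"; // raw PCM fallback
}
```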

AI Enhancement

After transcription, the raw text can be optionally enhanced using an LLM with customizable transcription profiles.

Transcription Profiles

Profiles contain a system prompt that defines how the LLM should process the transcription.

{
  "id": "minimal",
  "name": "Minimal",
  "prompt": "Fix spelling and grammar only. Return ONLY the corrected text.",
  "builtin": true
}

Enhancement Flow

  1. Whisper produces raw transcription
  2. If LLM enabled, call processAIVoice with profile prompt
  3. LLM returns enhanced text (grammar fixes, formatting, etc.)
  4. Enhanced text is pasted instead of raw transcription
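
Step 2 above amounts to composing an LLM request from the active profile's prompt plus the raw transcription. A minimal sketch (the function name and message shape are assumptions, not Kollabor's actual API):

```typescript
// Hypothetical composition of the enhancement request: the profile prompt
// becomes the system message, the raw Whisper output the user message.
interface TranscriptionProfile {
  id: string;
  name: string;
  prompt: string;
  builtin: boolean;
}

function buildEnhancementMessages(profile: TranscriptionProfile, rawText: string) {
  return [
    { role: "system", content: profile.prompt }, // profile defines behavior
    { role: "user", content: rawText },          // raw Whisper output
  ];
}
```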

Toggle AI Enhancement: Click the sparkles icon in the Orb, or configure in Settings > Audio > Orb Settings.

Transcription History

All transcriptions are automatically saved to a local JSON file with metadata.

// Location: ~/.kollabor/transcription-history.json

{
  "history": [
    {
      "id": "trans-1234567890",
      "text": "Enhanced transcription text",
      "originalText": "Raw whisper transcription",
      "enhanced": true,
      "timestamp": 1699564800000,
      "date": "Friday, November 10, 2023",
      "time": "3:30 PM",
      "clipboardContent": "...",
      "type": "transcription"
    }
  ]
}
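
An entry like the sample above could be constructed as follows; the field names follow the JSON sample, while the id scheme, the `enhanced` heuristic, and the locale formatting are assumptions (clipboardContent is omitted for brevity):

```typescript
// Hypothetical factory for a history entry matching the JSON shape above.
interface TranscriptionEntry {
  id: string;
  text: string;
  originalText: string;
  enhanced: boolean;
  timestamp: number;
  date: string;
  time: string;
  type: "transcription";
}

function makeEntry(text: string, originalText: string, now = Date.now()): TranscriptionEntry {
  const d = new Date(now);
  return {
    id: `trans-${now}`,                 // id scheme is an assumption
    text,
    originalText,
    enhanced: text !== originalText,    // assumed heuristic
    timestamp: now,
    date: d.toLocaleDateString("en-US", {
      weekday: "long", year: "numeric", month: "long", day: "numeric",
    }),
    time: d.toLocaleTimeString("en-US", { hour: "numeric", minute: "2-digit" }),
    type: "transcription",
  };
}
```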

History is accessible in the app sidebar under "Transcriptions" and can be searched, filtered, and exported.

IPC Commands:

  • save_transcription_history(entry) - Save new entry
  • load_transcription_history() - Load all entries
  • clear_transcription_history() - Clear all entries
  • add_transcription_entry(entry) - Add entry and emit event

Auto-Paste Feature

Transcribed text is automatically inserted into the active application using native system APIs.

Paste Methods

  • Primary: Clipboard + Cmd+V simulation (macOS)
  • Fallback: Clipboard only (user must paste manually)

Status Messages

  • Pasting... - Attempting auto-paste
  • Copied! Press Cmd+V - Fallback mode, manual paste required
  • Ready - Paste successful, orb hidden

Accessibility Permissions (macOS)

Fn key monitoring and auto-paste require macOS Accessibility permissions.

// src-tauri/src/fn_key_monitor.rs

// Check permissions using macOS API
extern "C" {
    fn AXIsProcessTrusted() -> bool;
}

let trusted = unsafe { AXIsProcessTrusted() };

How to grant permissions:

  1. Open System Settings
  2. Navigate to Privacy & Security > Accessibility
  3. Add Kollabor to the list and enable the checkbox
  4. Restart Kollabor if already running

IPC Commands:

  • check_fn_key_permissions() - Check current status
  • open_fn_key_permissions_settings() - Open System Settings

Configuration

Orb settings are stored in ~/.kollabor/config.json:

{
  "ui": {
    "orb": {
      "style": "sphere",
      "width": 250,
      "height": 250,
      "autoStartRecording": true
    }
  },
  "audio": {
    "deviceId": "Built-in Microphone",
    "echoCancellation": true,
    "noiseSuppression": true,
    "autoGainControl": true
  },
  "whisper": {
    "model": "small"
  },
  "llm": {
    "enabled": true
  },
  "transcriptionProfiles": {
    "minimal": {
      "id": "minimal",
      "name": "Minimal",
      "prompt": "Fix spelling and grammar only.",
      "builtin": true
    }
  },
  "activeTranscriptionProfile": "minimal"
}
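
Loading a config like this usually means layering user values over built-in defaults. A hedged sketch of such a deep merge (not Kollabor's actual loader):

```typescript
// Illustrative deep merge of user config over defaults: nested objects
// merge key by key, scalars and arrays replace wholesale.
type Json = { [key: string]: unknown };

function mergeConfig(defaults: Json, user: Json): Json {
  const out: Json = { ...defaults };
  for (const [key, value] of Object.entries(user)) {
    const base = out[key];
    if (value && typeof value === "object" && !Array.isArray(value) &&
        base && typeof base === "object" && !Array.isArray(base)) {
      out[key] = mergeConfig(base as Json, value as Json); // recurse into objects
    } else {
      out[key] = value; // user scalar/array wins
    }
  }
  return out;
}
```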

All settings can be modified in the Settings UI or by editing the JSON file directly.