Voice Capture & The Orb

Kollabor features a powerful voice-to-text system called "The Orb" - a floating recording interface that appears instantly when you hold the Fn key (or use the global shortcut Cmd+.).

Overview

The Orb provides:

  • Hold-to-record interface - press and hold Fn key (or Cmd+.) to record, release to transcribe
  • Local transcription with Whisper - privacy-first, no cloud API required
  • AI enhancement - optional LLM post-processing with transcription profiles
  • Auto-paste - transcribed text is automatically inserted into the active application
  • Transcription history - all transcriptions saved to ~/.kollabor/transcription-history.json
  • NSPanel on macOS - works above fullscreen apps

How to Use

Hold-to-Record Mode (Default)

  1. Press and hold the Fn key (or Cmd+.)
  2. The Orb appears at your cursor position
  3. Speak your message while holding the key
  4. Release the key when finished
  5. Transcription happens locally using Whisper
  6. Text automatically pastes into the active app

Sticky Mode (Double-Tap)

  1. Double-tap the Fn key (within 500ms)
  2. The Orb stays open after recording
  3. Click the mic button to start/stop recording
  4. Record multiple takes without closing the Orb
  5. Press Escape or click close when done
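
The double-tap detection above hinges on a single timing rule: a second Fn press within 500 ms counts as a double-tap. A minimal sketch of that rule (class and method names are hypothetical, not Kollabor's actual code):

```typescript
// Hypothetical sketch of double-tap detection: a second press within the
// threshold (500 ms by default) completes a double-tap.
class DoubleTapDetector {
  private lastTapAt: number | null = null;

  constructor(private readonly thresholdMs = 500) {}

  // Returns true when this press completes a double-tap.
  onKeyDown(nowMs: number): boolean {
    const isDoubleTap =
      this.lastTapAt !== null && nowMs - this.lastTapAt <= this.thresholdMs;
    // Reset after a double-tap so a triple-tap does not fire twice.
    this.lastTapAt = isDoubleTap ? null : nowMs;
    return isDoubleTap;
  }
}
```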

Keyboard Shortcuts

Shortcut          Action                                 Mode
Fn (hold)         Start recording, stop on release       Global
Fn (double-tap)   Toggle sticky mode                     Global
Cmd+.             Alternative to Fn key (configurable)   Global
Space             Toggle recording (when Orb is open)    Orb
Escape            Close Orb or profile selector          Orb

Note: Global shortcuts can be configured in Settings > Audio > Global Shortcuts.

Whisper Models

Kollabor uses OpenAI Whisper via whisper-rs (Rust bindings to whisper.cpp) for local transcription. Models are downloaded from Hugging Face and stored in ~/.kollabor/whisper-models/.

Model    Size      Accuracy   Speed       File
tiny     ~75 MB    Basic      Very Fast   ggml-tiny.bin
base     ~142 MB   Good       Fast        ggml-base.bin
small    ~466 MB   Better     Moderate    ggml-small.bin
medium   ~1.5 GB   Great      Slower      ggml-medium.bin
large    ~3.1 GB   Best       Slowest     ggml-large-v3.bin

Default: small provides the best balance of accuracy and speed for most users.

Models can be downloaded and configured in Settings > Audio > Whisper Settings.
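
The model table above maps naturally to a small lookup structure. A hedged sketch of how model names might resolve to files on disk (the names and fallback behavior are assumptions, not Kollabor's actual implementation):

```typescript
// Hypothetical lookup map mirroring the model table above.
interface WhisperModel {
  file: string;   // filename under ~/.kollabor/whisper-models/
  sizeMB: number; // approximate download size
}

const WHISPER_MODELS: Record<string, WhisperModel> = {
  tiny:   { file: "ggml-tiny.bin",     sizeMB: 75 },
  base:   { file: "ggml-base.bin",     sizeMB: 142 },
  small:  { file: "ggml-small.bin",    sizeMB: 466 },
  medium: { file: "ggml-medium.bin",   sizeMB: 1500 },
  large:  { file: "ggml-large-v3.bin", sizeMB: 3100 },
};

// Resolve the on-disk path for a model, falling back to the default "small".
function modelPath(name: string, baseDir: string): string {
  const model = WHISPER_MODELS[name] ?? WHISPER_MODELS["small"];
  return `${baseDir}/${model.file}`;
}
```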

Architecture

The Orb system consists of several components working together:

Frontend (Vue 3)

  • OrbStyleRouter.vue - Main orb window component
  • useOrbCore.ts - Recording state composable
  • useVoiceRecording.ts - Unified recording interface
  • WaveformVisualizer.vue - Real-time audio visualization

Backend (Rust/Tauri)

  • fn_key_monitor.rs - Fn key detection (macOS)
  • whisper.rs - Whisper transcription engine
  • panel.rs - NSPanel configuration
  • commands/transcription.rs - IPC handlers

Recording Flow:
--------------
User holds Fn key
    |
    v
NSEvent (macOS) detects keyCode 63
    |
    v
fn_key_monitor.rs emits "fn-key-pressed"
    |
    v
main.rs creates NSPanel window at cursor
    |
    v
OrbStyleRouter mounts, useOrbCore.startRecording()
    |
    v
MediaRecorder captures audio (WebM/Opus)
    |
    v
User releases Fn key
    |
    v
Blob -> ArrayBuffer -> transcribeAudio(buffer)
    |
    v
whisper.rs transcribes using whisper-rs
    |
    v
(optional) LLM enhancement with profile prompt
    |
    v
Auto-paste to active application
    |
    v
Save to transcription-history.json
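
The flow above can be read as a small state machine. A minimal sketch under the assumption that five states capture the lifecycle (state and transition names are illustrative, not the real implementation):

```typescript
// Illustrative Orb lifecycle as a state machine, mirroring the flow above.
type OrbState = "idle" | "recording" | "transcribing" | "enhancing" | "pasting";

const TRANSITIONS: Record<OrbState, OrbState[]> = {
  idle:         ["recording"],            // Fn pressed
  recording:    ["transcribing"],         // Fn released
  transcribing: ["enhancing", "pasting"], // LLM enhancement on/off
  enhancing:    ["pasting"],
  pasting:      ["idle"],                 // history saved, orb hidden
};

function transition(from: OrbState, to: OrbState): OrbState {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`invalid transition: ${from} -> ${to}`);
  }
  return to;
}
```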

NSPanel (macOS Fullscreen Support)

On macOS, the Orb uses NSPanel instead of a regular window. This is required for the Orb to appear above fullscreen applications.

// src-tauri/src/panel.rs
tauri_panel! {
    panel!(RecordingOrb {
        config: {
            can_become_key_window: true,
            is_floating_panel: true
        }
    })
}

NSPanel configuration in main.rs:

  • NonactivatingPanel - doesn't steal focus from active app
  • FullScreenAuxiliary - appears above fullscreen windows
  • MoveToActiveSpace - follows user across desktops
  • Level 1000 - high z-order

Audio Recording Pipeline

The audio recording uses a dual-approach system:

Browser MediaRecorder (Frontend)

Used for real-time visualization and orb UI. Records WebM/Opus audio.

  • Format: audio/webm;codecs=opus
  • Audio constraints: echo cancellation, noise suppression, auto gain
  • Chunks collected during recording, combined into Blob on stop

Tauri Native Recording (Backend)

Optional native recording using cpal/rodio. WAV format saved to recordings directory.

  • Library: cpal (cross-platform audio I/O)
  • Format: 16-bit PCM WAV
  • Storage: ~/.kollabor/recordings/

Local Transcription with Whisper

Transcription happens entirely locally using whisper-rs (Rust bindings to whisper.cpp). No cloud API required.

// src-tauri/src/whisper.rs

// Load Whisper model (cached in memory)
let ctx = WhisperContext::new_with_params(model_path, params)?;

// Decode WebM/Opus to PCM samples
let decoded = decode_webm_opus(buffer)?;

// Resample to 16kHz mono
let resampled = resample(&mono_samples, sample_rate, 16000);

// Run transcription
let mut state = ctx.create_state()?;
state.full(params, &resampled)?;

// Extract text from segments
// Extract text from segments
let n_segments = state.full_n_segments()?;
for i in 0..n_segments {
    full_text.push_str(&state.full_get_segment_text(i)?);
}
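
The resample-to-16-kHz step in the snippet above is conceptually a linear interpolation over the source samples. A TypeScript sketch of just the math (the real pipeline lives in whisper.rs):

```typescript
// Illustrative linear-interpolation resampler, sketching the
// "resample to 16 kHz mono" step conceptually. Not the Rust implementation.
function resample(samples: number[], fromRate: number, toRate: number): number[] {
  if (fromRate === toRate) return samples.slice();
  const outLen = Math.floor((samples.length * toRate) / fromRate);
  const out = new Array<number>(outLen);
  for (let i = 0; i < outLen; i++) {
    const pos = (i * fromRate) / toRate; // fractional source index
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, samples.length - 1);
    const frac = pos - i0;
    out[i] = samples[i0] * (1 - frac) + samples[i1] * frac; // lerp neighbors
  }
  return out;
}
```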

Supported audio formats:

  • WebM/Opus (browser MediaRecorder output)
  • MP4/AAC (Safari MediaRecorder output)
  • WAV (16-bit PCM)
  • Raw PCM (fallback)
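
Routing a buffer to the right decoder typically starts with sniffing well-known magic bytes. A hypothetical sketch for the formats listed above (the actual detection happens inside the Rust decoder and may differ):

```typescript
// Hypothetical container sniffing via standard magic bytes.
function detectAudioFormat(bytes: Uint8Array): "webm" | "mp4" | "wav" | "pcm" {
  // WebM/Matroska: EBML header 1A 45 DF A3
  if (bytes[0] === 0x1a && bytes[1] === 0x45 && bytes[2] === 0xdf && bytes[3] === 0xa3) {
    return "webm";
  }
  // MP4: "ftyp" box tag at offset 4
  if (bytes[4] === 0x66 && bytes[5] === 0x74 && bytes[6] === 0x79 && bytes[7] === 0x70) {
    return "mp4";
  }
  // WAV: "RIFF" chunk header
  if (bytes[0] === 0x52 && bytes[1] === 0x49 && bytes[2] === 0x46 && bytes[3] === 0x46) {
    return "wav";
  }
  return "pcm"; // raw PCM fallback
}
```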

AI Enhancement

After transcription, the raw text can be optionally enhanced using an LLM with customizable transcription profiles.

Transcription Profiles

Profiles contain a system prompt that defines how the LLM should process the transcription.

{
  "id": "minimal",
  "name": "Minimal",
  "prompt": "Fix spelling and grammar only. Return ONLY the corrected text.",
  "builtin": true
}

Enhancement Flow

  1. Whisper produces raw transcription
  2. If LLM enabled, call processAIVoice with profile prompt
  3. LLM returns enhanced text (grammar fixes, formatting, etc.)
  4. Enhanced text is pasted instead of raw transcription
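
Step 2 above amounts to composing an LLM request from the active profile's prompt plus the raw transcription. A minimal sketch (the function name and message shape are assumptions, not Kollabor's actual API):

```typescript
// Hypothetical composition of the enhancement request: the profile prompt
// becomes the system message, the raw Whisper output the user message.
interface TranscriptionProfile {
  id: string;
  name: string;
  prompt: string;
  builtin: boolean;
}

function buildEnhancementMessages(profile: TranscriptionProfile, rawText: string) {
  return [
    { role: "system", content: profile.prompt }, // profile defines behavior
    { role: "user", content: rawText },          // raw Whisper output
  ];
}
```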

Toggle AI Enhancement: Click the sparkles icon in the Orb, or configure in Settings > Audio > Orb Settings.

Transcription History

All transcriptions are automatically saved to a local JSON file with metadata.

// Location: ~/.kollabor/transcription-history.json

{
  "history": [
    {
      "id": "trans-1234567890",
      "text": "Enhanced transcription text",
      "originalText": "Raw whisper transcription",
      "enhanced": true,
      "timestamp": 1699564800000,
      "date": "Friday, November 10, 2023",
      "time": "3:30 PM",
      "clipboardContent": "...",
      "type": "transcription"
    }
  ]
}
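
An entry like the sample above could be constructed as follows; the field names follow the JSON sample, while the id scheme, the `enhanced` heuristic, and the locale formatting are assumptions (clipboardContent is omitted for brevity):

```typescript
// Hypothetical factory for a history entry matching the JSON shape above.
interface TranscriptionEntry {
  id: string;
  text: string;
  originalText: string;
  enhanced: boolean;
  timestamp: number;
  date: string;
  time: string;
  type: "transcription";
}

function makeEntry(text: string, originalText: string, now = Date.now()): TranscriptionEntry {
  const d = new Date(now);
  return {
    id: `trans-${now}`,                 // id scheme is an assumption
    text,
    originalText,
    enhanced: text !== originalText,    // assumed heuristic
    timestamp: now,
    date: d.toLocaleDateString("en-US", {
      weekday: "long", year: "numeric", month: "long", day: "numeric",
    }),
    time: d.toLocaleTimeString("en-US", { hour: "numeric", minute: "2-digit" }),
    type: "transcription",
  };
}
```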

History is accessible in the app sidebar under "Transcriptions" and can be searched, filtered, and exported.

IPC Commands:

  • save_transcription_history(entry) - Save new entry
  • load_transcription_history() - Load all entries
  • clear_transcription_history() - Clear all entries
  • add_transcription_entry(entry) - Add entry and emit event

Auto-Paste Feature

Transcribed text is automatically inserted into the active application using native system APIs.

Paste Methods

  • Primary: Clipboard + Cmd+V simulation (macOS)
  • Fallback: Clipboard only (user must paste manually)

Status Messages

  • Pasting... - Attempting auto-paste
  • Copied! Press Cmd+V - Fallback mode, manual paste required
  • Ready - Paste successful, orb hidden

Accessibility Permissions (macOS)

Fn key monitoring and auto-paste require macOS Accessibility permissions.

// src-tauri/src/fn_key_monitor.rs

// Check permissions using macOS API
extern "C" {
    fn AXIsProcessTrusted() -> bool;
}

let trusted = unsafe { AXIsProcessTrusted() };

How to grant permissions:

  1. Open System Settings
  2. Navigate to Privacy & Security > Accessibility
  3. Add Kollabor to the list and enable the checkbox
  4. Restart Kollabor if already running

IPC Commands:

  • check_fn_key_permissions() - Check current status
  • open_fn_key_permissions_settings() - Open System Settings

Configuration

Orb settings are stored in ~/.kollabor/config.json:

{
  "ui": {
    "orb": {
      "style": "sphere",
      "width": 250,
      "height": 250,
      "autoStartRecording": true
    }
  },
  "audio": {
    "deviceId": "Built-in Microphone",
    "echoCancellation": true,
    "noiseSuppression": true,
    "autoGainControl": true
  },
  "whisper": {
    "model": "small"
  },
  "llm": {
    "enabled": true
  },
  "transcriptionProfiles": {
    "minimal": {
      "id": "minimal",
      "name": "Minimal",
      "prompt": "Fix spelling and grammar only.",
      "builtin": true
    }
  },
  "activeTranscriptionProfile": "minimal"
}
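
Loading a config like this usually means layering user values over built-in defaults. A hedged sketch of such a deep merge (not Kollabor's actual loader):

```typescript
// Illustrative deep merge of user config over defaults: nested objects
// merge key by key, scalars and arrays replace wholesale.
type Json = { [key: string]: unknown };

function mergeConfig(defaults: Json, user: Json): Json {
  const out: Json = { ...defaults };
  for (const [key, value] of Object.entries(user)) {
    const base = out[key];
    if (value && typeof value === "object" && !Array.isArray(value) &&
        base && typeof base === "object" && !Array.isArray(base)) {
      out[key] = mergeConfig(base as Json, value as Json); // recurse into objects
    } else {
      out[key] = value; // user scalar/array wins
    }
  }
  return out;
}
```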

All settings can be modified in the Settings UI or by editing the JSON file directly.