# Overview

> What InkIt exposes: global hotkey dictation, Cartesia Ink-2 streaming STT, optional LLM polish, notch HUD feedback, transcript history, and Sparkle updates. Runtime requirements (macOS 14+, Apple silicon), primary entry points, and the shortest path from install to first pasted transcript.

- Repository: cartesia-ai/InkIt
- GitHub: https://github.com/cartesia-ai/InkIt
- Human docs: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b
- Complete Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/llms-full.txt

## Source Files

- `README.md`
- `InkIt/InkItApp.swift`
- `InkIt/AppCoordinator.swift`
- `project.yml`
- `InkIt/Info.plist`

---

---
title: "Overview"
description: "What InkIt exposes: global hotkey dictation, Cartesia Ink-2 streaming STT, optional LLM polish, notch HUD feedback, transcript history, and Sparkle updates. Runtime requirements (macOS 14+, Apple silicon), primary entry points, and the shortest path from install to first pasted transcript."
---

InkIt is a macOS menu-bar application (`ai.cartesia.InkIt`) that captures speech on a global hotkey, streams 16 kHz PCM audio to Cartesia Ink-2 over WebSocket, optionally rewrites the transcript through a BYOK LLM, and pastes the result at the focused editable field via synthesized Cmd+V. `AppCoordinator` owns the dictation lifecycle; `InkItApp` wires the SwiftUI shell, settings, history store, and Sparkle updater.

## Runtime requirements

| Requirement | Value |
| --- | --- |
| macOS | 14.0+ (`MACOSX_DEPLOYMENT_TARGET`, `LSMinimumSystemVersion`) |
| CPU | Apple silicon (release DMG ships as `InkIt_<version>_arm64.dmg`) |
| Xcode (source builds) | 15+ with XcodeGen |
| Network | Reachable `wss://api.cartesia.ai` for transcription; optional LLM endpoint when Polish is on |
| Permissions | Microphone (`AVCaptureDevice`) and Accessibility (`AXIsProcessTrusted`) |
| Sandbox | Disabled (`ENABLE_APP_SANDBOX: NO`) — paste synthesis requires an unsandboxed binary |

<Warning>
Running two copies of InkIt (for example, one from `/Applications` and one from a local build) breaks Accessibility grants and hotkey registration. `AppCoordinator` detects duplicate bundle instances at launch and surfaces an error.
</Warning>

## What InkIt exposes

### Dictation and transcription

- **Global hotkey** — default binding is **Fn / Globe** (`HotkeyBinding.fn`), registered through `HotkeyManager` after onboarding completes. Carbon combos and bare modifier keys are also supported.
- **Dictation modes** — **Hold to talk** (default): press to record, release to finalize and paste. **Hands-free**: press once to start, press again to stop and paste.
- **Streaming STT** — `CartesiaStreamingClient` connects to `wss://api.cartesia.ai/stt/turns/websocket` with `model=ink-2`, `encoding=pcm_s16le`, `sample_rate=16000`, `cartesia_version=2026-03-01`. Live transcript updates feed the notch HUD during recording.
- **Audio capture** — `AudioCaptureService` streams PCM frames to the WebSocket. `audioReady` gates the HUD until the mic is live (notably after Bluetooth A2DP→HFP switch).

### Optional Polish (LLM rewrite)

When `correctionEnabled` is true and a provider API key is present, `TranscriptRewriter` cleans filler, punctuation, and proper nouns. Default provider is **Groq** (`LLMProvider.groq`); Groq, Gemini, OpenAI, and Anthropic are supported. Polish failures fall back to the raw transcript — the user always gets text.

Connection prewarm starts at hotkey press so TLS setup overlaps speaking time.

### Notch HUD

`NotchHUDController` presents a borderless, always-on-top status island anchored below the camera notch (or screen top on non-notch displays). It shows recording waveform, live transcript, processing states (`Finalizing…`, `Polishing…`, `Pasting…`), brief error notices, and `Saved to History` confirmations. The HUD ignores mouse events and does not steal focus.

### Transcript history

`TranscriptHistoryStore` persists every dictation in SwiftData with per-stage latency (`transcribeMs`, `polishMs`, `pasteMs`), polish outcome, and optional before/after diff. The Home window provides searchable, sortable history and lifetime word-count stats. When no editable field is focused at release, the transcript is saved to history instead of pasted (`heldInHistory` state).

### Updates

Sparkle drives automatic update checks (`SUFeedURL` → GitHub `appcast.xml`). `UpdateManager` surfaces state through a custom pill on Home (`idle` → `available` → `updating` → `ready`) instead of Sparkle's default modal flow. Menu command **Check for Updates…** routes through the same driver.

## Architecture

```mermaid
flowchart TB
    subgraph UI["SwiftUI shell — InkItApp"]
        RootView
        OnboardingRootView
        MainWindowView
        SettingsView
    end

    subgraph Runtime["Dictation runtime — AppCoordinator"]
        HotkeyManager
        AudioCaptureService
        CartesiaStreamingClient
        TranscriptRewriter
        PasteService
        FocusedEditable
    end

    subgraph Feedback["Status surfaces"]
        NotchHUDController
        MenuBarLabel
    end

    subgraph Storage["Persistence"]
        SettingsStore
        TranscriptHistoryStore
        Keychain
    end

    subgraph External["External services"]
        CartesiaSTT["Cartesia Ink-2 STT"]
        LLM["LLM provider (optional)"]
        SparkleFeed["Sparkle appcast"]
    end

    HotkeyManager -->|press/release| Runtime
    AudioCaptureService -->|PCM 16 kHz| CartesiaStreamingClient
    CartesiaStreamingClient --> CartesiaSTT
    CartesiaStreamingClient -->|final transcript| TranscriptRewriter
    TranscriptRewriter --> LLM
    TranscriptRewriter --> PasteService
    FocusedEditable --> PasteService
    PasteService -->|Cmd+V| TargetApp["Focused app"]
    Runtime --> TranscriptHistoryStore
    Runtime --> NotchHUDController
    SettingsStore --> Keychain
    UpdateManager --> SparkleFeed
    RootView --> UI
    AppCoordinator --> Runtime
```

### Dictation state lifecycle

`AppCoordinator` drives `DictationState` through the hot path:

```mermaid
stateDiagram-v2
    [*] --> idle
    idle --> recording: hotkey press
    recording --> finalizing: hotkey release / toggle stop
    finalizing --> rewriting: transcript + polish enabled
    finalizing --> pasting: transcript, polish off
    rewriting --> pasting: rewrite complete
    pasting --> idle: paste success
    finalizing --> idle: empty transcript
    pasting --> heldInHistory: no editable focus
    heldInHistory --> idle: 2.5s timer
    recording --> error: permission / API / STT failure
    error --> idle: key release + dwell
    idle --> recording: hotkey during heldInHistory or error
```

## Primary entry points

| Surface | Identifier / path | Role |
| --- | --- | --- |
| App entry | `@main struct InkItApp` | Creates `WindowGroup("InkIt", id: "main")`, injects `AppCoordinator`, `SettingsStore`, `TranscriptHistoryStore` |
| Dictation orchestrator | `AppCoordinator` | Hotkey handlers, STT session, polish, paste, HUD registration |
| Settings | `SettingsStore.shared` | UserDefaults + Keychain persistence; hotkey, mode, API keys, appearance |
| Onboarding gate | `settings.hasCompletedOnboarding` | `RootView` shows `OnboardingRootView` until true, then `MainWindowView` |
| STT client | `CartesiaStreamingClient` | WebSocket session to Cartesia Ink-2 |
| History | `TranscriptHistoryStore.shared` | SwiftData-backed transcript log |
| Updates | `UpdateManager.shared` | Sparkle lifecycle + Home update pill |
| Debug trace | `~/Library/Logs/InkIt-debug.log` | Optional when `debugLoggingEnabled` is on |

### Default configuration

| Setting | Default |
| --- | --- |
| Hotkey | Fn / Globe (`HotkeyBinding.fn`) |
| Dictation mode | Hold (`DictationMode.hold`) |
| Polish | Off (`correctionEnabled: false`) |
| Polish provider (when enabled) | Groq |
| Feedback sounds | On |
| Notch horizontal position | `0.38` (slightly left of center) |
| Appearance | Light |
| Cartesia API key | Empty until onboarding or Settings |

<ParamField body="cartesiaAPIKey" type="string">
Cartesia API key stored in Keychain account `cartesiaAPIKey`. Required before dictation starts.
</ParamField>

<ParamField body="correctionEnabled" type="boolean">
Enables optional LLM polish via `TranscriptRewriter`. When false or no provider key is set, raw STT output is pasted.
</ParamField>

## Shortest path: install to first pasted transcript

<Steps>
<Step title="Install InkIt">

Download `InkIt.dmg` from the [latest GitHub release](https://github.com/cartesia-ai/InkIt/releases/latest/download/InkIt.dmg) and move `InkIt.app` to `/Applications`.

</Step>

<Step title="Launch and grant permissions">

Open InkIt. Onboarding walks through **welcome → permissions → API key → try it → done**. Grant **Microphone** and **Accessibility** when prompted. If Accessibility requires a silent relaunch, onboarding resumes at the permissions step automatically.

</Step>

<Step title="Enter a Cartesia API key">

Obtain a key from [play.cartesia.ai](https://play.cartesia.ai) and paste it during the API key step (or later in Settings). The key is stored in Keychain, not UserDefaults.

</Step>

<Step title="Complete the Try It trial">

On the **Try it** step, hold or press the default **Fn** hotkey, speak, and release. The notch HUD shows recording status. The trial routes transcript text into the practice card — it does not polish or paste.

</Step>

<Step title="Finish onboarding and dictate into any app">

Click through **Done** to set `hasCompletedOnboarding = true`. Focus an editable field in any app (editor, browser, chat, terminal). Hold **Fn**, speak, release. InkIt finalizes the Cartesia transcript, optionally polishes it, synthesizes Cmd+V into the focused app, and saves the row to history.

</Step>
</Steps>

<Check>
**Success signals:** menu bar label shows `● Ink` while recording, then returns to `Ink`; text appears at the cursor; a new row appears in Home history with latency stats.
</Check>

<Info>
If no editable field is focused at release, InkIt saves the transcript to history and shows **Saved to History** in the notch HUD instead of pasting. Copy from history or refocus a text field and dictate again.
</Info>

## BYOK credential model

InkIt separates transcription and polish credentials:

| Service | Keychain account | Required |
| --- | --- | --- |
| Cartesia STT | `cartesiaAPIKey` | Yes |
| LLM polish | `llm.{provider}` (e.g. `llm.groq`) | Only when `correctionEnabled` is true |

Service-issue flags (`cartesiaKeyInvalid`, `cartesiaOutOfCredits`, `polishKeyInvalid`, `polishOutOfCredits`) persist across launches and drive Home warning cards until the next successful dictation or polish clears them.

## Repository layout (high level)

:::files
InkIt/
├── InkItApp.swift          # @main, design tokens, RootView, MainWindowView
├── AppCoordinator.swift    # DictationState machine, hotkey → paste pipeline
├── CartesiaStreamingClient.swift
├── AudioCaptureService.swift
├── HotkeyManager.swift
├── PasteService.swift
├── TranscriptRewriter.swift
├── TranscriptHistoryStore.swift
├── SettingsStore.swift
├── NotchHUD.swift
├── UpdateManager.swift
└── OnboardingView.swift
project.yml                 # XcodeGen spec (bundle ID, Sparkle, entitlements)
tools/                      # make-dmg, appcast, publish-release, design-token check
:::

## Next

<CardGroup>
<Card title="Installation" href="/installation">
Install from the release DMG or build with XcodeGen. Covers bundle ID, deployment target, entitlements, and replacing `/Applications/InkIt.app` after local builds.
</Card>

<Card title="Quickstart" href="/quickstart">
First successful dictation: onboarding flow, hotkey hold/toggle, expected HUD states, and recovery when no editable field is focused.
</Card>

<Card title="Dictation pipeline" href="/dictation-pipeline">
End-to-end flow from hotkey press through audio capture, Cartesia WebSocket STT, optional polish, paste, and history persistence.
</Card>

<Card title="Configure Cartesia API key" href="/configure-cartesia-api-key">
Obtain, store, and validate the Cartesia key that powers Ink-2 transcription.
</Card>

<Card title="Dictation state machine" href="/dictation-state-machine">
`DictationState` transitions, `audioReady` coupling, and error dwell behavior.
</Card>
</CardGroup>
