# InkIt Documentation

> Reference for InkIt, a macOS 14+ dictation app that streams audio to Cartesia Ink-2 STT, optionally polishes transcripts via BYOK LLM providers, and pastes into the focused application. Covers setup, the dictation pipeline, configuration keys, API contracts, troubleshooting, and contributor build workflows.

## Context Links

- [Agent index](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/llms.txt)
- [Human interactive docs](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b)
- [GitHub repository](https://github.com/cartesia-ai/InkIt)

## Repository Metadata

- Repository: cartesia-ai/InkIt
- Generated: 2026-06-15T22:14:29.136Z
- Updated: 2026-06-15T22:24:41.349Z
- Runtime: Grok CLI
- Format: Documentation
- Pages: 22

## Page Index

- 01. [Overview](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/01-overview.md) - What InkIt exposes: global hotkey dictation, Cartesia Ink-2 streaming STT, optional LLM polish, notch HUD feedback, transcript history, and Sparkle updates. Runtime requirements (macOS 14+, Apple silicon), primary entry points, and the shortest path from install to first pasted transcript.
- 02. [Installation](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/02-installation.md) - Install InkIt from the release DMG or build from source with XcodeGen. Prerequisites, bundle identifier, deployment target, entitlements (no App Sandbox), and the requirement to replace `/Applications/InkIt.app` after local builds.
- 03. [Quickstart](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/03-quickstart.md) - First successful dictation: launch InkIt, complete onboarding (permissions, Cartesia API key, Try It trial), hold or toggle the default hotkey, speak, release, and verify text pastes at the cursor. Expected HUD states, success signals, and recovery when no editable field is focused.
- 04. [Dictation pipeline](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/04-dictation-pipeline.md) - End-to-end flow from hotkey press through audio capture (16 kHz PCM), Cartesia WebSocket STT, optional TranscriptRewriter polish, FocusedEditable target resolution, PasteService Cmd+V insertion, and TranscriptHistoryStore persistence. Latency stages and prewarm behavior.
- 05. [Dictation state machine](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/05-dictation-state-machine.md) - DictationState lifecycle: idle, recording, finalizing, rewriting, pasting, heldInHistory, and error. Transitions on hotkey press/release, STT completion, polish outcome, paste result, and empty-transcript collapse. audioReady and liveTranscript HUD coupling.
- 06. [Permissions model](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/06-permissions-model.md) - Microphone (AVCaptureDevice) and Accessibility (AXIsProcessTrusted) requirements, PermissionState tri-state (granted, notRequested, needsManual), onboarding prompt flow, polling strategy, and the silent relaunch after Accessibility grant.
- 07. [Hotkey bindings](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/07-hotkey-bindings.md) - HotkeyBinding variants: Carbon RegisterEventHotKey combos, Fn/Globe via CGEventTap with suppression, and bare modifier keys. DictationMode hold vs toggle semantics, isValidShortcut reserved-system checks, and HotkeyManager dedicated tap threads.
- 08. [Configure Cartesia API key](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/08-configure-cartesia-api-key.md) - Obtain a Cartesia API key, enter it during onboarding or Settings, Keychain storage (account cartesiaAPIKey), advisory validation via APIKeyValidator, and persisted cartesiaKeyInvalid / cartesiaOutOfCredits flags that drive Home service-issue cards.
- 09. [Configure Polish](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/09-configure-polish.md) - Enable optional transcript polish: pick an LLMProvider (Groq recommended), store per-provider keys in Keychain, select rewriteModel, toggle correctionEnabled, and interpret PolishUIState (setup, on, paused, keyBroken). Onboarding Groq-only validation path.
- 10. [Configure hotkeys and dictation mode](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/10-configure-hotkeys-and-dictation-mode.md) - Record a global shortcut in Settings, switch between hold-to-talk and hands-free toggle, validate against reserved macOS shortcuts, and verify registration after Accessibility is granted. notchHorizontalPosition and playFeedbackSounds side settings.
- 11. [Input device selection](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/11-input-device-selection.md) - Pin a preferred microphone by CoreAudio UID in Settings, fall back to system default when unplugged, Bluetooth A2DP→HFP switch delay and audioReady HUD cue, and the readyLevelThreshold / readyFallbackDelay backstop in AudioCaptureService.
- 12. [Settings reference](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/12-settings-reference.md) - All persisted SettingsStore keys: UserDefaults fields (appearance, hotkey, dictationMode, notch position, service-issue flags), Keychain accounts (cartesiaAPIKey, llm.{provider}), SMAppService launchAtLogin, and derived properties transcriptionIssue / polishIssue.
- 13. [Cartesia STT reference](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/13-cartesia-stt-reference.md) - CartesiaStreamingClient WebSocket contract: wss://api.cartesia.ai/stt/turns/websocket, model ink-2, encoding pcm_s16le, sample_rate 16000, cartesia_version 2026-03-01, server events, client close message, pending audio buffer, and STTFailure classification table with notchMessage strings.
- 14. [LLM providers reference](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/14-llm-providers-reference.md) - LLMProvider enum: Groq, Gemini, OpenAI, Anthropic endpoints, default models, rewriteTimeout ceilings, validationRequest probes, keyURL/billingURL, OpenAI-compatible vs Anthropic Messages API shapes, and RewriteFailure reason codes.
- 15. [Paste and focus reference](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/15-paste-and-focus-reference.md) - PasteService clipboard session tagging, Cmd+V synthesis, timing constants, FocusedEditable AX budget walk, Chromium enableWebAccessibility prewarm, heldInHistory when no editable field, and TargetAppSnapshot paste-target resolution chain.
- 16. [STT troubleshooting](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/16-stt-troubleshooting.md) - Diagnose Cartesia transcription failures: STTFailure cases (offline, serverError, rateLimited, outOfCredits, invalidKey, unknown), notch vs Home card surfacing, graceful collapse on empty or benign disconnect, and STTFailureRoutingTests regression contracts.
- 17. [Polish troubleshooting](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/17-polish-troubleshooting.md) - Polish failure modes: RewriteFailure and PolishFailureReason, rate-limit Retry-After, timeout fallbacks to raw transcript, polishKeyInvalid and polishOutOfCredits persistence, history-row failure tooltips, and prewarm connection behavior.
- 18. [Runtime troubleshooting](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/18-runtime-troubleshooting.md) - Non-API runtime issues: duplicate InkIt bundle instances and Accessibility grants, Bluetooth mic profile delay, debugLoggingEnabled trace file at ~/Library/Logs/InkIt-debug.log, SwiftData persistence fallback, and notch HUD positioning on non-notch displays.
- 19. [Build from source](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/19-build-from-source.md) - Generate InkIt.xcodeproj with XcodeGen, open in Xcode 15+, Debug vs Release signing via Config/*.xcconfig, ad-hoc vs Developer ID notarized builds, Sparkle deep-sign post-build script, and install to /Applications.
- 20. [Testing and CI](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/20-testing-and-ci.md) - Run xcodebuild test locally, CI workflow steps (design-token check, xcodegen, Debug test, Release build), test targets (STTFailureRouting, AXBudget, TranscriptRecord, DebugLogFormatter), and design-token enforcement via tools/check-design-tokens.sh.
- 21. [Release and distribution](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/21-release-and-distribution.md) - Ship InkIt: tools/make-dmg.sh, tools/make-appcast.sh, tools/publish-release.sh, Sparkle SUFeedURL and UpdateManager custom pill flow, version bump via tools/bump-version.sh, and GitHub release assets (InkIt.dmg, appcast.xml).
- 22. [Contributing](https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/22-contributing.md) - Fork-only contribution policy, feedback form, design-system token rules (Font.ink*, Color.canvas, ds-allow escape hatch), DESIGN_SYSTEM.md reference, commit-message conventions in AGENTS.md, and CI-enforced checks.

## Source File Index

- `.github/CONTRIBUTING.md`
- `.github/workflows/ci.yml`
- `AGENTS.md`
- `Config/Debug.xcconfig`
- `Config/Signing.local.xcconfig.example`
- `Config/Signing.xcconfig`
- `Config/Sparkle.xcconfig`
- `DESIGN_SYSTEM.md`
- `InkIt/AppCoordinator.swift`
- `InkIt/AudioCaptureService.swift`
- `InkIt/AudioDeviceManager.swift`
- `InkIt/AudioPCMConverter.swift`
- `InkIt/AXTreeDumper.swift`
- `InkIt/CartesiaKeyValidator.swift`
- `InkIt/CartesiaStreamingClient.swift`
- `InkIt/ContextSnapshot.swift`
- `InkIt/DebugLog.swift`
- `InkIt/FeedbackSoundPlayer.swift`
- `InkIt/HotkeyManager.swift`
- `InkIt/Info.plist`
- `InkIt/InkIt.entitlements`
- `InkIt/InkItApp.swift`
- `InkIt/LLMProvider.swift`
- `InkIt/NotchHUD.swift`
- `InkIt/OnboardingView.swift`
- `InkIt/PasteService.swift`
- `InkIt/PermissionsService.swift`
- `InkIt/SettingsStore.swift`
- `InkIt/SettingsView.swift`
- `InkIt/TranscriptHistoryStore.swift`
- `InkIt/TranscriptRewriter.swift`
- `InkIt/TryItPracticeCard.swift`
- `InkIt/UpdateManager.swift`
- `InkItTests/AXBudgetTests.swift`
- `InkItTests/DebugLogFormatterTests.swift`
- `InkItTests/STTFailureRoutingTests.swift`
- `InkItTests/TranscriptRecordTests.swift`
- `LICENSE`
- `project.yml`
- `README.md`
- `tools/bump-version.sh`
- `tools/check-design-tokens.sh`
- `tools/make-appcast.sh`
- `tools/make-dmg.sh`
- `tools/publish-release.sh`

---

## 01. Overview

> What InkIt exposes: global hotkey dictation, Cartesia Ink-2 streaming STT, optional LLM polish, notch HUD feedback, transcript history, and Sparkle updates. Runtime requirements (macOS 14+, Apple silicon), primary entry points, and the shortest path from install to first pasted transcript.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/01-overview.md
- Generated: 2026-06-15T22:07:43.200Z

### Source Files

- `README.md`
- `InkIt/InkItApp.swift`
- `InkIt/AppCoordinator.swift`
- `project.yml`
- `InkIt/Info.plist`

---
title: "Overview"
description: "What InkIt exposes: global hotkey dictation, Cartesia Ink-2 streaming STT, optional LLM polish, notch HUD feedback, transcript history, and Sparkle updates. Runtime requirements (macOS 14+, Apple silicon), primary entry points, and the shortest path from install to first pasted transcript."
---

InkIt is a macOS menu-bar application (`ai.cartesia.InkIt`) that captures speech on a global hotkey, streams 16 kHz PCM audio to Cartesia Ink-2 over WebSocket, optionally rewrites the transcript through a BYOK LLM, and pastes the result at the focused editable field via synthesized Cmd+V. `AppCoordinator` owns the dictation lifecycle; `InkItApp` wires the SwiftUI shell, settings, history store, and Sparkle updater.

## Runtime requirements

| Requirement | Value |
| --- | --- |
| macOS | 14.0+ (`MACOSX_DEPLOYMENT_TARGET`, `LSMinimumSystemVersion`) |
| CPU | Apple silicon (release DMG ships as `InkIt_<version>_arm64.dmg`) |
| Xcode (source builds) | 15+ with XcodeGen |
| Network | Reachable `wss://api.cartesia.ai` for transcription; optional LLM endpoint when Polish is on |
| Permissions | Microphone (`AVCaptureDevice`) and Accessibility (`AXIsProcessTrusted`) |
| Sandbox | Disabled (`ENABLE_APP_SANDBOX: NO`) — paste synthesis requires an unsandboxed binary |

<Warning>
Running two copies of InkIt (for example, one from `/Applications` and one from a local build) breaks Accessibility grants and hotkey registration. `AppCoordinator` detects duplicate bundle instances at launch and surfaces an error.
</Warning>

## What InkIt exposes

### Dictation and transcription

- **Global hotkey** — default binding is **Fn / Globe** (`HotkeyBinding.fn`), registered through `HotkeyManager` after onboarding completes. Carbon combos and bare modifier keys are also supported.
- **Dictation modes** — **Hold to talk** (default): press to record, release to finalize and paste. **Hands-free**: press once to start, press again to stop and paste.
- **Streaming STT** — `CartesiaStreamingClient` connects to `wss://api.cartesia.ai/stt/turns/websocket` with `model=ink-2`, `encoding=pcm_s16le`, `sample_rate=16000`, `cartesia_version=2026-03-01`. Live transcript updates feed the notch HUD during recording.
- **Audio capture** — `AudioCaptureService` streams PCM frames to the WebSocket. `audioReady` gates the HUD until the mic is live (notably after Bluetooth A2DP→HFP switch).
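
As a rough illustration of the connection parameters listed above, the sketch below assembles the WebSocket URL with `URLComponents`. It assumes the four parameters travel as URL query items, which the source does not confirm. `makeSTTURL` is a hypothetical helper, not InkIt's actual code, and authentication is left out because this page does not describe how the key is transmitted.

```swift
import Foundation

/// Hypothetical helper: builds the Cartesia STT WebSocket URL.
/// Assumption: model, encoding, sample rate, and API version are query items.
func makeSTTURL() -> URL? {
    var components = URLComponents()
    components.scheme = "wss"
    components.host = "api.cartesia.ai"
    components.path = "/stt/turns/websocket"
    components.queryItems = [
        URLQueryItem(name: "model", value: "ink-2"),
        URLQueryItem(name: "encoding", value: "pcm_s16le"),
        URLQueryItem(name: "sample_rate", value: "16000"),
        URLQueryItem(name: "cartesia_version", value: "2026-03-01"),
    ]
    return components.url
}

if let url = makeSTTURL() {
    print(url.absoluteString)
}
```

Building the URL from components rather than string concatenation keeps the parameter list in one place and avoids hand-written escaping.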

### Optional Polish (LLM rewrite)

When `correctionEnabled` is true and a provider API key is present, `TranscriptRewriter` removes filler words and corrects punctuation and proper nouns. The default provider is **Groq** (`LLMProvider.groq`); Groq, Gemini, OpenAI, and Anthropic are supported. Polish failures fall back to the raw transcript, so the user always gets text.

Connection prewarm starts at hotkey press so TLS setup overlaps speaking time.
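
The fallback rule can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not the `TranscriptRewriter` implementation: `textToPaste`, `RewriteError`, and the `rewrite` closure signature are hypothetical names introduced here.

```swift
import Foundation

/// Hypothetical error type standing in for whatever the rewriter reports.
struct RewriteError: Error {}

/// Returns polished text only when polish is enabled, a key is present,
/// and the rewrite succeeds. In every other case the raw transcript wins.
func textToPaste(
    raw: String,
    correctionEnabled: Bool,
    providerKey: String?,
    rewrite: (_ transcript: String, _ apiKey: String) throws -> String
) -> String {
    guard correctionEnabled, let key = providerKey, !key.isEmpty else {
        return raw  // polish off or no provider key: paste raw STT output
    }
    do {
        return try rewrite(raw, key)
    } catch {
        return raw  // any polish failure falls back to the raw transcript
    }
}

// A rewrite that always fails still yields text.
let result = textToPaste(
    raw: "um hello world",
    correctionEnabled: true,
    providerKey: "example-key",
    rewrite: { _, _ in throw RewriteError() }
)
print(result)  // prints "um hello world"
```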

### Notch HUD

`NotchHUDController` presents a borderless, always-on-top status island anchored below the camera notch (or at the top of the screen on non-notch displays). It shows a recording waveform, the live transcript, processing states (`Finalizing…`, `Polishing…`, `Pasting…`), brief error notices, and `Saved to History` confirmations. The HUD ignores mouse events and does not steal focus.

### Transcript history

`TranscriptHistoryStore` persists every dictation in SwiftData with per-stage latency (`transcribeMs`, `polishMs`, `pasteMs`), polish outcome, and optional before/after diff. The Home window provides searchable, sortable history and lifetime word-count stats. When no editable field is focused at release, the transcript is saved to history instead of pasted (`heldInHistory` state).

### Updates

Sparkle drives automatic update checks (`SUFeedURL` → GitHub `appcast.xml`). `UpdateManager` surfaces state through a custom pill on Home (`idle` → `available` → `updating` → `ready`) instead of Sparkle's default modal flow. Menu command **Check for Updates…** routes through the same driver.

## Architecture

```mermaid
flowchart TB
    subgraph UI["SwiftUI shell — InkItApp"]
        RootView
        OnboardingRootView
        MainWindowView
        SettingsView
    end

    subgraph Runtime["Dictation runtime — AppCoordinator"]
        HotkeyManager
        AudioCaptureService
        CartesiaStreamingClient
        TranscriptRewriter
        PasteService
        FocusedEditable
    end

    subgraph Feedback["Status surfaces"]
        NotchHUDController
        MenuBarLabel
    end

    subgraph Storage["Persistence"]
        SettingsStore
        TranscriptHistoryStore
        Keychain
    end

    subgraph External["External services"]
        CartesiaSTT["Cartesia Ink-2 STT"]
        LLM["LLM provider (optional)"]
        SparkleFeed["Sparkle appcast"]
    end

    HotkeyManager -->|press/release| Runtime
    AudioCaptureService -->|PCM 16 kHz| CartesiaStreamingClient
    CartesiaStreamingClient --> CartesiaSTT
    CartesiaStreamingClient -->|final transcript| TranscriptRewriter
    TranscriptRewriter --> LLM
    TranscriptRewriter --> PasteService
    FocusedEditable --> PasteService
    PasteService -->|Cmd+V| TargetApp["Focused app"]
    Runtime --> TranscriptHistoryStore
    Runtime --> NotchHUDController
    SettingsStore --> Keychain
    UpdateManager --> SparkleFeed
    RootView --> UI
    AppCoordinator --> Runtime
```

### Dictation state lifecycle

`AppCoordinator` drives `DictationState` through the hot path:

```mermaid
stateDiagram-v2
    [*] --> idle
    idle --> recording: hotkey press
    recording --> finalizing: hotkey release / toggle stop
    finalizing --> rewriting: transcript + polish enabled
    finalizing --> pasting: transcript, polish off
    rewriting --> pasting: rewrite complete
    pasting --> idle: paste success
    finalizing --> idle: empty transcript
    pasting --> heldInHistory: no editable focus
    heldInHistory --> idle: 2.5s timer
    recording --> error: permission / API / STT failure
    error --> idle: key release + dwell
    heldInHistory --> recording: hotkey press
    error --> recording: hotkey press
```
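
The lifecycle above can be expressed as a pure transition function. The state names come from this documentation; the `DictationEvent` cases and `nextState` are hypothetical names chosen for illustration, and the real `AppCoordinator` may structure this differently.

```swift
enum DictationState {
    case idle, recording, finalizing, rewriting, pasting, heldInHistory, error
}

/// Hypothetical event names; only the states are taken from the docs.
enum DictationEvent {
    case hotkeyPress
    case stopRequested        // hotkey release (hold) or second press (toggle)
    case transcript(isEmpty: Bool, polishEnabled: Bool)
    case rewriteComplete
    case pasteSucceeded
    case noEditableFocus
    case historyTimerFired    // 2.5 s after entering heldInHistory
    case failure              // permission, API, or STT failure
    case errorDwellElapsed    // key released and the notice has been shown
}

func nextState(_ state: DictationState, _ event: DictationEvent) -> DictationState {
    switch (state, event) {
    case (.idle, .hotkeyPress), (.heldInHistory, .hotkeyPress), (.error, .hotkeyPress):
        return .recording
    case (.recording, .stopRequested):
        return .finalizing
    case (.recording, .failure):
        return .error
    case (.finalizing, .transcript(let isEmpty, let polishEnabled)):
        if isEmpty { return .idle }  // empty transcript collapses to idle
        return polishEnabled ? .rewriting : .pasting
    case (.rewriting, .rewriteComplete):
        return .pasting
    case (.pasting, .pasteSucceeded):
        return .idle
    case (.pasting, .noEditableFocus):
        return .heldInHistory
    case (.heldInHistory, .historyTimerFired), (.error, .errorDwellElapsed):
        return .idle
    default:
        return state  // events that do not apply leave the state unchanged
    }
}
```

Keeping the transitions in one function with no side effects makes each edge of the diagram checkable in a unit test without a microphone or network.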

## Primary entry points

| Surface | Identifier / path | Role |
| --- | --- | --- |
| App entry | `@main struct InkItApp` | Creates `WindowGroup("InkIt", id: "main")`, injects `AppCoordinator`, `SettingsStore`, `TranscriptHistoryStore` |
| Dictation orchestrator | `AppCoordinator` | Hotkey handlers, STT session, polish, paste, HUD registration |
| Settings | `SettingsStore.shared` | UserDefaults + Keychain persistence; hotkey, mode, API keys, appearance |
| Onboarding gate | `settings.hasCompletedOnboarding` | `RootView` shows `OnboardingRootView` until true, then `MainWindowView` |
| STT client | `CartesiaStreamingClient` | WebSocket session to Cartesia Ink-2 |
| History | `TranscriptHistoryStore.shared` | SwiftData-backed transcript log |
| Updates | `UpdateManager.shared` | Sparkle lifecycle + Home update pill |
| Debug trace | `~/Library/Logs/InkIt-debug.log` | Optional trace log, written when `debugLoggingEnabled` is on |

### Default configuration

| Setting | Default |
| --- | --- |
| Hotkey | Fn / Globe (`HotkeyBinding.fn`) |
| Dictation mode | Hold (`DictationMode.hold`) |
| Polish | Off (`correctionEnabled: false`) |
| Polish provider (when enabled) | Groq |
| Feedback sounds | On |
| Notch horizontal position | `0.38` (slightly left of center) |
| Appearance | Light |
| Cartesia API key | Empty until onboarding or Settings |

<ParamField body="cartesiaAPIKey" type="string">
Cartesia API key stored in Keychain account `cartesiaAPIKey`. Required before dictation starts.
</ParamField>

<ParamField body="correctionEnabled" type="boolean">
Enables optional LLM polish via `TranscriptRewriter`. When false or no provider key is set, raw STT output is pasted.
</ParamField>

## Shortest path: install to first pasted transcript

<Steps>
<Step title="Install InkIt">

Download `InkIt.dmg` from the [latest GitHub release](https://github.com/cartesia-ai/InkIt/releases/latest/download/InkIt.dmg) and move `InkIt.app` to `/Applications`.

</Step>

<Step title="Launch and grant permissions">

Open InkIt. Onboarding walks through **welcome → permissions → API key → try it → done**. Grant **Microphone** and **Accessibility** when prompted. If Accessibility requires a silent relaunch, onboarding resumes at the permissions step automatically.

</Step>

<Step title="Enter a Cartesia API key">

Obtain a key from [play.cartesia.ai](https://play.cartesia.ai) and paste it during the API key step (or later in Settings). The key is stored in Keychain, not UserDefaults.

</Step>

<Step title="Complete the Try It trial">

On the **Try it** step, hold or press the default **Fn** hotkey, speak, and release. The notch HUD shows recording status. The trial routes transcript text into the practice card — it does not polish or paste.

</Step>

<Step title="Finish onboarding and dictate into any app">

Click through **Done** to set `hasCompletedOnboarding = true`. Focus an editable field in any app (editor, browser, chat, terminal). Hold **Fn**, speak, release. InkIt finalizes the Cartesia transcript, optionally polishes it, synthesizes Cmd+V into the focused app, and saves the row to history.

</Step>
</Steps>

<Check>
**Success signals:** menu bar label shows `● Ink` while recording, then returns to `Ink`; text appears at the cursor; a new row appears in Home history with latency stats.
</Check>

<Info>
If no editable field is focused at release, InkIt saves the transcript to history and shows **Saved to History** in the notch HUD instead of pasting. Copy from history or refocus a text field and dictate again.
</Info>

## BYOK credential model

InkIt separates transcription and polish credentials:

| Service | Keychain account | Required |
| --- | --- | --- |
| Cartesia STT | `cartesiaAPIKey` | Yes |
| LLM polish | `llm.{provider}` (e.g. `llm.groq`) | Only when `correctionEnabled` is true |

Service-issue flags (`cartesiaKeyInvalid`, `cartesiaOutOfCredits`, `polishKeyInvalid`, `polishOutOfCredits`) persist across launches and drive Home warning cards until the next successful dictation or polish clears them.
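
A short sketch of how the account names in the table could be derived. Only `cartesiaAPIKey` and `llm.groq` appear in this documentation; the lowercase raw values for the other providers, and the helper names `keychainAccount(for:)` and `requiredAccounts`, are assumptions made for illustration.

```swift
/// Assumption: raw values match the suffix used in the Keychain account name.
/// Only "groq" is confirmed by the docs (`llm.groq`).
enum LLMProvider: String, CaseIterable {
    case groq, gemini, openai, anthropic
}

let cartesiaAccount = "cartesiaAPIKey"

/// Follows the `llm.{provider}` pattern from the table above.
func keychainAccount(for provider: LLMProvider) -> String {
    "llm.\(provider.rawValue)"
}

/// Accounts that need a non-empty key for the current configuration.
func requiredAccounts(correctionEnabled: Bool, provider: LLMProvider) -> [String] {
    var accounts = [cartesiaAccount]  // transcription key is always required
    if correctionEnabled {
        accounts.append(keychainAccount(for: provider))
    }
    return accounts
}

print(requiredAccounts(correctionEnabled: true, provider: .groq))
```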

## Repository layout (high level)

:::files
InkIt/
├── InkItApp.swift          # @main, design tokens, RootView, MainWindowView
├── AppCoordinator.swift    # DictationState machine, hotkey → paste pipeline
├── CartesiaStreamingClient.swift
├── AudioCaptureService.swift
├── HotkeyManager.swift
├── PasteService.swift
├── TranscriptRewriter.swift
├── TranscriptHistoryStore.swift
├── SettingsStore.swift
├── NotchHUD.swift
├── UpdateManager.swift
└── OnboardingView.swift
project.yml                 # XcodeGen spec (bundle ID, Sparkle, entitlements)
tools/                      # make-dmg, appcast, publish-release, design-token check
:::

## Next

<CardGroup>
<Card title="Installation" href="/installation">
Install from the release DMG or build with XcodeGen. Covers bundle ID, deployment target, entitlements, and replacing `/Applications/InkIt.app` after local builds.
</Card>

<Card title="Quickstart" href="/quickstart">
First successful dictation: onboarding flow, hotkey hold/toggle, expected HUD states, and recovery when no editable field is focused.
</Card>

<Card title="Dictation pipeline" href="/dictation-pipeline">
End-to-end flow from hotkey press through audio capture, Cartesia WebSocket STT, optional polish, paste, and history persistence.
</Card>

<Card title="Configure Cartesia API key" href="/configure-cartesia-api-key">
Obtain, store, and validate the Cartesia key that powers Ink-2 transcription.
</Card>

<Card title="Dictation state machine" href="/dictation-state-machine">
`DictationState` transitions, `audioReady` coupling, and error dwell behavior.
</Card>
</CardGroup>

---

## 02. Installation

> Install InkIt from the release DMG or build from source with XcodeGen. Prerequisites, bundle identifier, deployment target, entitlements (no App Sandbox), and the requirement to replace `/Applications/InkIt.app` after local builds.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/02-installation.md
- Generated: 2026-06-15T22:07:29.975Z

### Source Files

- `README.md`
- `project.yml`
- `InkIt/Info.plist`
- `InkIt/InkIt.entitlements`
- `Config/Signing.local.xcconfig.example`

---
title: "Installation"
description: "Install InkIt from the release DMG or build from source with XcodeGen. Prerequisites, bundle identifier, deployment target, entitlements (no App Sandbox), and the requirement to replace `/Applications/InkIt.app` after local builds."
---

InkIt ships as a macOS application bundle (`InkIt.app`) with bundle identifier `ai.cartesia.InkIt`, deployment target macOS 14.0, and App Sandbox disabled so global hotkey capture and synthesized Cmd+V paste via Accessibility can run unsandboxed. End users install from a GitHub release DMG; developers generate `InkIt.xcodeproj` with XcodeGen and must copy the built app into `/Applications/InkIt.app` for changes to take effect.

## Prerequisites

| Requirement | Value | Notes |
| --- | --- | --- |
| Operating system | macOS 14.0+ | `LSMinimumSystemVersion` and `MACOSX_DEPLOYMENT_TARGET` are both `14.0` |
| Architecture | Apple silicon (`arm64`) | Release artifacts are published as `InkIt_<version>_arm64.dmg` |
| End-user install | None beyond macOS | Download the release DMG; no separate runtime |
| Source build | Xcode 15+, XcodeGen | `brew install xcodegen`; `InkIt.xcodeproj` is generated — edit `project.yml`, not the `.pbxproj` |
| Optional notarized builds | Apple Developer ID + `Config/Signing.local.xcconfig` | Maintainer path only; ad-hoc signing works for local dev |

<Warning>
InkIt targets Apple silicon Macs. Intel Macs are not supported in the published release artifacts.
</Warning>

## Install paths

<Tabs>
<Tab title="Release DMG">

The README download button resolves to the latest GitHub release asset:

```
https://github.com/cartesia-ai/InkIt/releases/latest/download/InkIt.dmg
```

<Steps>
<Step title="Download the DMG">

Download `InkIt.dmg` from the [latest GitHub release](https://github.com/cartesia-ai/InkIt/releases/latest/download/InkIt.dmg).

</Step>
<Step title="Move InkIt to Applications">

Open the DMG and drag `InkIt.app` into `/Applications`.

</Step>
<Step title="Launch and verify">

Launch InkIt from Applications. The app appears in the Dock (`LSUIElement` is `false`) and begins onboarding for Microphone and Accessibility permissions.

</Step>
</Steps>

Sparkle checks for updates automatically. The feed URL is configured in `Info.plist`:

<ParamField body="SUFeedURL" type="string">
`https://github.com/cartesia-ai/InkIt/releases/latest/download/appcast.xml`
</ParamField>

<ParamField body="SUEnableAutomaticChecks" type="boolean">
`true` — automatic update checks are enabled on launch.
</ParamField>

</Tab>
<Tab title="Build from source">

<Steps>
<Step title="Install XcodeGen">

```bash
brew install xcodegen
```

</Step>
<Step title="Generate the Xcode project">

```bash
xcodegen generate
open InkIt.xcodeproj
```

</Step>
<Step title="Build Release">

Build the **Release** configuration in Xcode, or from the command line with an explicit derived data path:

```bash
xcodebuild \
  -project InkIt.xcodeproj \
  -scheme InkIt \
  -configuration Release \
  -derivedDataPath build \
  build
```

The built app lands at `build/Build/Products/Release/InkIt.app`.

</Step>
<Step title="Install to Applications">

```bash
cp -R build/Build/Products/Release/InkIt.app /Applications/
```

</Step>
</Steps>

<Warning>
Rebuilding alone is not enough. Changes only take effect after you replace `/Applications/InkIt.app`. Quit any running InkIt instance before copying the new bundle.
</Warning>

Debug builds use ad-hoc signing with `InkIt-Debug.entitlements` (`get-task-allow=true`) for debugger and XCTest attachment. Release builds use `InkIt.entitlements` with hardened runtime enabled. See [Build from source](/build-from-source) for signing, notarization, and Sparkle deep-sign details.

</Tab>
</Tabs>

## App identity

| Field | Value |
| --- | --- |
| Display name | `InkIt` |
| Bundle identifier | `ai.cartesia.InkIt` |
| Bundle ID prefix | `ai.cartesia` |
| Deployment target | macOS `14.0` |
| Current version | `0.1.2` (`CFBundleShortVersionString`) |
| Build number | `4` (`CFBundleVersion`) |
| Test bundle ID | `ai.cartesia.InkItTests` |

`Info.plist` usage descriptions gate permission prompts at first launch:

- **Microphone** — records voice while the dictation hotkey is held and streams audio to Cartesia STT.
- **Apple Events** — pastes transcribed text into the focused application.

## Entitlements and sandbox

InkIt runs **without App Sandbox**. `ENABLE_APP_SANDBOX` is `NO` in `project.yml` because synthesized Cmd+V paste through Accessibility requires an unsandboxed binary.

Release entitlements (`InkIt/InkIt.entitlements`):

| Entitlement key | Release value | Purpose |
| --- | --- | --- |
| `com.apple.security.device.audio-input` | `true` | Microphone capture for dictation |
| `com.apple.security.automation.apple-events` | `true` | Paste into focused apps via Apple Events |
| `com.apple.security.get-task-allow` | `false` | Required for notarized distribution |

Debug builds swap in `InkIt/InkIt-Debug.entitlements`, which is identical except `get-task-allow` is `true` so `xcodebuild test` and the Xcode debugger can attach to the host app.

Signing configuration by build type:

| Configuration | Identity | Hardened runtime | Entitlements file |
| --- | --- | --- | --- |
| Debug | Ad-hoc (`-`) | Off | `InkIt/InkIt-Debug.entitlements` |
| Release (default) | Ad-hoc (`-`) | On | `InkIt/InkIt.entitlements` |
| Release (maintainer) | Developer ID Application | On | `InkIt/InkIt.entitlements` + `Config/Signing.local.xcconfig` |

To produce a notarized, distributable build, copy the example signing config and supply your Team ID:

```bash
cp Config/Signing.local.xcconfig.example Config/Signing.local.xcconfig
```

Then set `DEVELOPMENT_TEAM`, `CODE_SIGN_STYLE = Manual`, and `CODE_SIGN_IDENTITY = Developer ID Application` in the local file.

## Replace `/Applications/InkIt.app` after local builds

macOS resolves Accessibility grants and running instances by bundle path. When developing locally:

1. **Quit** any running InkIt before installing a new build.
2. **Copy** the freshly built `InkIt.app` over `/Applications/InkIt.app` — do not rely on Xcode's default DerivedData output path alone.
3. **Launch** the `/Applications` copy, not a stale build left in DerivedData or a duplicate path.

If multiple copies with bundle ID `ai.cartesia.InkIt` run simultaneously, `AppCoordinator` surfaces an error naming both paths and instructs you to quit the duplicate and grant Accessibility to only one app bundle. This commonly happens when both a release DMG install and a local build are running, or when Accessibility was granted to an old build path.
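
One way such a check could work is to ask AppKit for every running app with the same bundle identifier and compare bundle paths. This is a sketch, not the `AppCoordinator` logic: `runningBundlePaths` is a hypothetical helper, and it only catches copies launched from different paths.

```swift
import AppKit

/// Hypothetical helper: distinct bundle paths of running apps with this bundle ID.
func runningBundlePaths(bundleID: String) -> [String] {
    let apps = NSRunningApplication.runningApplications(withBundleIdentifier: bundleID)
    // A Set removes repeats when several processes share one bundle path.
    return Set(apps.compactMap { $0.bundleURL?.path }).sorted()
}

let paths = runningBundlePaths(bundleID: "ai.cartesia.InkIt")
if paths.count > 1 {
    print("Duplicate InkIt copies running: \(paths.joined(separator: ", "))")
} else {
    print("No duplicate InkIt bundles detected.")
}
```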

<Info>
After replacing the app bundle, you may need to re-grant Accessibility if macOS still associates the permission with a previous binary path. See [Permissions model](/permissions-model) for the onboarding and silent-relaunch flow.
</Info>

## Verify installation

<Check>
**Release install** — `InkIt.app` exists at `/Applications/InkIt.app`, launches from Dock, and onboarding prompts for Microphone access appear.
</Check>

<Check>
**Local build** — `codesign -dvv /Applications/InkIt.app` reports the expected signing identity (ad-hoc `Signature=adhoc` for contributor builds, or `Developer ID Application` for maintainer releases).
</Check>

<Check>
**Single instance** — Only one InkIt process runs; Activity Monitor shows a single `InkIt` entry under `/Applications/InkIt.app`.
</Check>

## Next

<CardGroup>
<Card title="Quickstart" href="/quickstart">
Complete onboarding, configure your Cartesia API key, and run your first dictation after install.
</Card>
<Card title="Permissions model" href="/permissions-model">
Microphone and Accessibility requirements, tri-state permission flow, and post-grant relaunch behavior.
</Card>
<Card title="Build from source" href="/build-from-source">
XcodeGen project generation, Debug vs Release signing, notarized builds, and Sparkle deep-sign post-build scripts.
</Card>
<Card title="Runtime troubleshooting" href="/runtime-troubleshooting">
Duplicate bundle instances, Accessibility grant mismatches, and other non-API runtime issues.
</Card>
</CardGroup>

---

## 03. Quickstart

> First successful dictation: launch InkIt, complete onboarding (permissions, Cartesia API key, Try It trial), hold or toggle the default hotkey, speak, release, and verify text pastes at the cursor. Expected HUD states, success signals, and recovery when no editable field is focused.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/03-quickstart.md
- Generated: 2026-06-15T22:07:38.360Z

### Source Files

- `README.md`
- `InkIt/OnboardingView.swift`
- `InkIt/AppCoordinator.swift`
- `InkIt/PermissionsService.swift`
- `InkIt/TryItPracticeCard.swift`

---
title: "Quickstart"
description: "First successful dictation: launch InkIt, complete onboarding (permissions, Cartesia API key, Try It trial), hold or toggle the default hotkey, speak, release, and verify text pastes at the cursor. Expected HUD states, success signals, and recovery when no editable field is focused."
---

InkIt’s first-run path is `OnboardingRootView` → `hasCompletedOnboarding = true` → `MainWindowView`, with the global hotkey registered only after onboarding (except during the Try It trial). The default shortcut is **Fn** (`HotkeyBinding.fn`) in **hold-to-talk** mode (`DictationMode.hold`): audio streams to Cartesia Ink-2 while the key is held, and releasing it finalizes the transcript, runs optional polish, and inserts the text via `PasteService` Cmd+V at the focused editable field.

## Prerequisites

| Requirement | Why |
| --- | --- |
| macOS 14+ on Apple silicon | Deployment target enforced by the app bundle |
| InkIt installed to `/Applications` | Duplicate bundle instances break Accessibility grants |
| Microphone permission | Required for `AudioCaptureService` to capture audio as 16 kHz PCM |
| Accessibility permission | Global hotkey tap and synthetic paste require `AXIsProcessTrusted` |
| Cartesia API key | Stored in Keychain (`cartesiaAPIKey`); drives Ink-2 WebSocket STT |

<Note>
Polish is optional and off by default (`correctionEnabled` is `false` until you enable it). First dictation needs only the Cartesia key.
</Note>

## Onboarding sequence

`OnboardingStep` advances in order: `welcome` → `permissions` → `apiKey` → `tryIt` → `done`. Progress dots gate forward jumps until each step’s requirement is satisfied live (for example, clearing the API key re-locks later steps).

<Steps>
<Step title="Launch InkIt">

Open InkIt from Applications. `RootView` shows `OnboardingRootView` while `hasCompletedOnboarding` is `false`.

</Step>

<Step title="Grant permissions">

On **Permissions**, enable both cards:

- **Microphone** — `PermissionsService.requestMicrophone` fires the TCC prompt; denied/restricted states switch to `needsManual` with a link to **Privacy & Security ▸ Microphone**.
- **Accessibility** — `requestAccessibility` calls `AXIsProcessTrustedWithOptions(prompt: true)` once, pre-adds InkIt to the Accessibility list, and opens **Privacy & Security ▸ Accessibility**. After a deny or stale in-process trust bit, the card shows manual steps; granting may trigger a silent relaunch that resumes at the permissions step (`resumeOnboardingAtPermissions`).

`PermissionsService` polls every 0.5 s so toggles in System Settings are detected without restarting. **Continue** appears only when `hasMicrophone && hasAccessibility`.

</Step>

<Step title="Enter Cartesia API key">

On **API key**, paste a key from [play.cartesia.ai/keys](https://play.cartesia.ai/keys) into the masked field; the free tier covers roughly 15,000 words per month. `CartesiaKeyValidator` shows inline status (checking, verified, invalid, couldn’t verify). **Continue** requires a non-empty trimmed key.

</Step>

<Step title="Complete Try It">

On **Try it**, read the sample line aloud while holding the default shortcut. `TryItPracticeCard` calls `beginOnboardingTrial()`, registers the hotkey, and shows the real notch HUD. Hold **Fn**, speak, release; the transcript lands in **What InkIt heard** (no live word-by-word preview during the trial). Edit if needed, then send or choose **Skip for now**. Finish on **You're ready!** with **Start using InkIt**, which sets `hasCompletedOnboarding = true`.

</Step>
</Steps>
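
The gating rules in the steps above can be sketched as a small pure function. This is an illustration only, not InkIt's implementation: `OnboardingInputs` and `furthestReachableStep` are hypothetical names, and treating Try It as skippable once a key is present is an assumption based on the **Skip for now** option.

```swift
import Foundation

// Stand-in for InkIt's OnboardingStep; case order follows the documented sequence.
enum OnboardingStep: Int, CaseIterable {
    case welcome, permissions, apiKey, tryIt, done
}

// Live inputs the gate depends on (hypothetical type for this sketch).
struct OnboardingInputs {
    var hasMicrophone: Bool
    var hasAccessibility: Bool
    var apiKey: String
}

// Returns the furthest step the user may jump to given the current inputs.
func furthestReachableStep(_ inputs: OnboardingInputs) -> OnboardingStep {
    // The permissions step is passed only when both grants are present.
    guard inputs.hasMicrophone && inputs.hasAccessibility else { return .permissions }
    // The API key step is passed only with a non-empty trimmed key.
    let trimmed = inputs.apiKey.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !trimmed.isEmpty else { return .apiKey }
    // Assumption: Try It can be skipped, so everything after the key is reachable.
    return .done
}
```

Because the function is evaluated against live inputs, clearing the key immediately moves the reachable limit back to `apiKey`, which matches the re-locking behavior described above.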

## Dictate with the default hotkey

After onboarding, `AppCoordinator.refreshHUD()` creates `NotchHUDController` and registers the stored binding.

| Setting | Default | User-facing gesture |
| --- | --- | --- |
| `hotkey` | `.fn` (displayed as `🌐 fn`) | Hold Fn while speaking |
| `dictationMode` | `.hold` | Release to finalize and paste |
| `dictationMode` | `.toggle` (if changed) | Press once to start, again to stop and paste |

<Tabs>
<Tab title="Hold to talk (default)">

1. Focus an editable field in any app (TextEdit, browser, chat, terminal).
2. Press and hold **Fn**.
3. Wait for the notch HUD waveform (see below), then speak.
4. Release **Fn**. InkIt finalizes STT, optionally polishes, and pastes via Cmd+V.

</Tab>
<Tab title="Hands-free toggle">

1. Focus an editable field.
2. Press **Fn** once to start recording.
3. Speak; press **Fn** again to stop, finalize, and paste.

</Tab>
</Tabs>

<Warning>
If Accessibility is missing when you press the hotkey, the notch shows **Accessibility needed** and System Settings opens at most once every 10 seconds. Grant Accessibility to the single InkIt bundle in `/Applications`.
</Warning>

## Notch HUD states

During onboarding Try It and post-onboarding dictation, the notch island is the live status surface. Finalizing, polishing, and pasting run silently in the background; the menu bar label still reflects them.

| Phase | `DictationState` | Notch island | Menu bar label |
| --- | --- | --- | --- |
| Mic warming up | `.recording` + `audioReady == false` | `InkIt` + pulsing preparing dot | `● Ink` |
| Capturing speech | `.recording` + `audioReady == true` | `InkIt` + live waveform (`inputLevel`) | `● Ink` |
| Released / processing | `.finalizing`, `.rewriting`, `.pasting` | Hidden (collapsed into notch) | `… Ink` / `✎ Ink` / `↩ Ink` |
| No paste target | `.heldInHistory` | `InkIt • Saved to History` (~2.5 s) | `⬇ Ink` |
| Failure | `.error(message)` | `InkIt ⚠ <message>` | `⚠ Ink` |
| Idle | `.idle` | Hidden | `Ink` |
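
As a reading aid, the table above can be expressed as two lookup functions. The enum is a stand-in that mirrors the documented case names; `menuBarLabel(for:)` and `notchIsVisible(for:)` are hypothetical helpers, not InkIt's API.

```swift
// Stand-in mirroring the documented DictationState cases.
enum DictationState: Equatable {
    case idle, recording, finalizing, rewriting, pasting, heldInHistory
    case error(String)
}

// Maps each state to the menu bar label listed in the table.
func menuBarLabel(for state: DictationState) -> String {
    switch state {
    case .idle:          return "Ink"
    case .recording:     return "● Ink"
    case .finalizing:    return "… Ink"
    case .rewriting:     return "✎ Ink"
    case .pasting:       return "↩ Ink"
    case .heldInHistory: return "⬇ Ink"
    case .error:         return "⚠ Ink"
    }
}

// The notch island is shown only while recording or while a notice is on screen.
func notchIsVisible(for state: DictationState) -> Bool {
    switch state {
    case .recording, .heldInHistory, .error:
        return true
    case .idle, .finalizing, .rewriting, .pasting:
        return false
    }
}
```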

<Tip>
`audioReady` flips when input level exceeds `readyLevelThreshold` (0.03) or after a 0.6 s fallback. On Bluetooth mics, wait for the waveform before speaking — the device may need 200–500 ms to switch from A2DP to HFP.
</Tip>

On Macs without a physical notch, the same content appears in a floating capsule below the menu bar.

## Success signals

### Try It (onboarding)

- Amber recording ring on the key cap while `state == .recording`.
- After release, the full transcript appears at once in **What InkIt heard** (trial suppresses interim `liveTranscript` updates).
- Green checkmark and enabled send button when `editedText` is non-empty and not mid-take.
- Take logged to `TranscriptHistoryStore` as polish `.off` with transcribe-only latency.

### First paste in any app

1. Open a text field in another app and place the cursor.
2. Hold **Fn**, speak a short phrase, release.
3. Polished or raw text appears at the cursor without manual copy-paste.
4. A new row appears in InkIt **History** with latency breakdown when paste succeeds.

<Check>
Paste success ends in `DictationState.idle` with no notch error. The menu bar returns to **Ink**.
</Check>

## When no editable field is focused

At release, `FocusedEditable.current()` re-checks whether an editable AX element has focus. If `focus.isEditable` is false — no text field, non-editable surface, or focus moved during dictation — InkIt does not fire Cmd+V.

Instead:

1. Transcript is saved to **History** (with polish outcome and latency).
2. `showHeldInHistoryNotice()` sets `DictationState.heldInHistory`.
3. Notch shows **Saved to History** for ~2.5 s, then returns to idle.
4. Copy the text from History or focus a field and dictate again.

<Info>
Pressing the hotkey during `heldInHistory` or `.error` starts a new take immediately — those states do not block the next dictation.
</Info>

## Quick recovery

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Notch: **Add your API key** | Empty `cartesiaAPIKey` | Add the key in Settings or re-run the onboarding API key step |
| Notch: **Mic access needed** | Microphone denied | System Settings ▸ Microphone → InkIt |
| Notch: **Accessibility needed** | AX not trusted | System Settings ▸ Accessibility → InkIt (one bundle only) |
| Notch: **Invalid API key** / **Out of credits** | Cartesia rejected the key or the account has no credits left | Update key at [play.cartesia.ai/keys](https://play.cartesia.ai/keys) |
| Text in History, not at cursor | No editable focus at release | Click a text field, copy from History, or dictate again |
| Notch: **Paste failed** | Target app rejected Cmd+V | Re-focus the field and retry |
| Empty take after release | Silence or STT returned empty string | Expected behavior: state collapses to idle with no history row; dictate again |

STT errors surface briefly in the notch: an error shown mid-hold stays visible until release, then dismisses after 1.5 s. Fixable causes may also persist service-issue flags on Home until the next successful dictation.

## End-to-end flow (post-onboarding)

```mermaid
stateDiagram-v2
    [*] --> idle
    idle --> recording: hotkey press
    recording --> finalizing: hotkey release
    finalizing --> rewriting: STT complete, polish enabled
    finalizing --> pasting: STT complete, no polish
    rewriting --> pasting: rewrite done
    pasting --> idle: Cmd+V success
    finalizing --> heldInHistory: no editable focus
    rewriting --> heldInHistory: no editable focus
    heldInHistory --> idle: 2.5s timer
    recording --> error: missing perm / STT fail
    error --> idle: 1.5s after release
    heldInHistory --> recording: hotkey press
    error --> recording: hotkey press
```

## Related pages

<CardGroup>
<Card title="Installation" href="/installation">
Install from the release DMG or build with XcodeGen before running onboarding.
</Card>
<Card title="Permissions model" href="/permissions-model">
Tri-state permission cards, polling, and the Accessibility relaunch path.
</Card>
<Card title="Configure Cartesia API key" href="/configure-cartesia-api-key">
Keychain storage, validation, and service-issue flags.
</Card>
<Card title="Dictation pipeline" href="/dictation-pipeline">
Hotkey → audio → STT → polish → paste → history in full detail.
</Card>
<Card title="Paste and focus reference" href="/paste-and-focus-reference">
`FocusedEditable` checks, `heldInHistory`, and Cmd+V timing.
</Card>
<Card title="Configure hotkeys and mode" href="/configure-hotkeys-and-mode">
Change the shortcut or switch to hands-free toggle.
</Card>
</CardGroup>

---

## 04. Dictation pipeline

> End-to-end flow from hotkey press through audio capture (16 kHz PCM), Cartesia WebSocket STT, optional TranscriptRewriter polish, FocusedEditable target resolution, PasteService Cmd+V insertion, and TranscriptHistoryStore persistence. Latency stages and prewarm behavior.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/04-dictation-pipeline.md
- Generated: 2026-06-15T22:07:44.908Z

### Source Files

- `InkIt/AppCoordinator.swift`
- `InkIt/AudioCaptureService.swift`
- `InkIt/CartesiaStreamingClient.swift`
- `InkIt/TranscriptRewriter.swift`
- `InkIt/PasteService.swift`
- `InkIt/TranscriptHistoryStore.swift`
- `InkIt/ContextSnapshot.swift`

---
title: "Dictation pipeline"
description: "End-to-end flow from hotkey press through audio capture (16 kHz PCM), Cartesia WebSocket STT, optional TranscriptRewriter polish, FocusedEditable target resolution, PasteService Cmd+V insertion, and TranscriptHistoryStore persistence. Latency stages and prewarm behavior."
---

`AppCoordinator` owns the dictation hot path: `HotkeyManager` callbacks start and stop a take, `AudioCaptureService` streams 16 kHz PCM into `CartesiaStreamingClient`, the final transcript optionally passes through `TranscriptRewriter`, `FocusedEditable` verifies an editable target at release, `PasteService` inserts via Cmd+V, and `TranscriptHistoryStore` persists the row with per-stage latency. State transitions (`DictationState`) and HUD fields (`liveTranscript`, `audioReady`, `inputLevel`) mirror that pipeline on the main actor.

## Pipeline overview

```mermaid
sequenceDiagram
    participant HK as HotkeyManager
    participant AC as AppCoordinator
    participant AU as AudioCaptureService
    participant STT as CartesiaStreamingClient
    participant RW as TranscriptRewriter
    participant FE as FocusedEditable
    participant PS as PasteService
    participant HS as TranscriptHistoryStore

    HK->>AC: onPress
    AC->>AC: resolve pasteTargetApp, TargetAppSnapshot
    AC->>FE: enableWebAccessibility(pid)
    AC->>RW: prewarm() (if polish enabled)
    AC->>STT: connect()
    AC->>AU: start(onChunk → sendAudio)
    AU-->>AC: onReady → audioReady = true
    STT-->>AC: onTranscriptUpdate → liveTranscript

    HK->>AC: onRelease (hold mode) or second press (toggle)
    AC->>AC: releaseTime = now, state = finalizing
    AC->>AU: stop()
    AC->>STT: finalizeAndClose()
    STT-->>AC: onClosed(finalText)

    AC->>RW: rewriteWithoutContext (optional)
    AC->>FE: current()
    alt editable focus
        AC->>PS: paste(text, targetApp)
        PS-->>AC: completion(true)
        AC->>HS: add(text, latency, polish)
    else no editable focus
        AC->>HS: add(text, latency, polish)
        AC->>AC: heldInHistory notice
    end
```

| Stage | Owner | Trigger | Output |
| --- | --- | --- | --- |
| Hotkey | `HotkeyManager` → `AppCoordinator` | Press / release (hold) or toggle press | `startDictation()` / `stopDictation()` |
| Audio | `AudioCaptureService` | `start()` after guards pass | 16 kHz mono `pcm_s16le` chunks |
| STT | `CartesiaStreamingClient` | `connect()` at press; `finalizeAndClose()` at release | Cumulative transcript via WebSocket |
| Polish | `TranscriptRewriter` | After `onClosed` when `correctionEnabled` and key present | Corrected text or raw fallback |
| Focus | `FocusedEditable` | After polish, before paste | `isEditable` + verified `NSRunningApplication` |
| Paste | `PasteService` | Editable focus confirmed | Cmd+V into target app |
| History | `TranscriptHistoryStore` | After polish (paste or held) | SwiftData `TranscriptRecord` row |

## Hotkey press: start dictation

`handleHotkeyPress()` branches on `SettingsStore.dictationMode`:

- **Hold** — press sets `isHotkeyHeld = true` and calls `startDictation()`.
- **Toggle** — press starts recording when idle; a second press while `.recording` calls `stopDictation()`.

`startDictation()` runs only from `.idle`, `.heldInHistory`, or `.error`. In-flight states (`.recording`, `.finalizing`, `.rewriting`, `.pasting`) ignore duplicate starts.
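
The branching above can be sketched as two pure decision functions. All type and function names here are hypothetical; the real `handleHotkeyPress()` also tracks `isHotkeyHeld` and triggers side effects that this sketch leaves out.

```swift
enum DictationMode { case hold, toggle }
enum TakeState { case idle, recording, finalizing, rewriting, pasting, heldInHistory, error }
enum HotkeyAction: Equatable { case start, stop, ignore }

// Decides what a key-down should do.
func actionForPress(mode: DictationMode, state: TakeState) -> HotkeyAction {
    switch state {
    case .idle, .heldInHistory, .error:
        return .start                              // the only states that accept a new take
    case .recording:
        return mode == .toggle ? .stop : .ignore   // a second press ends a toggle take
    case .finalizing, .rewriting, .pasting:
        return .ignore                             // in-flight work ignores duplicate starts
    }
}

// Decides what a key-up should do.
func actionForRelease(mode: DictationMode, state: TakeState) -> HotkeyAction {
    // Only hold mode ends a take on release; toggle mode ignores key-up.
    return (mode == .hold && state == .recording) ? .stop : .ignore
}
```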

Pre-flight guards, in order:

1. Non-empty `cartesiaAPIKey` (Keychain).
2. Microphone permission (`permissions.hasMicrophone`).
3. Accessibility permission (`permissions.hasAccessibility`).
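
A minimal sketch of the ordered guards, assuming each failure maps to the notch message listed in the Quickstart recovery table. `Preflight` and `firstPreflightFailure` are hypothetical names.

```swift
// Inputs checked before a take starts (hypothetical type for this sketch).
struct Preflight {
    var cartesiaAPIKey: String
    var hasMicrophone: Bool
    var hasAccessibility: Bool
}

// Returns nil when dictation may start, otherwise the message for the first failing guard.
// The order matters: a missing key is reported before any permission problem.
func firstPreflightFailure(_ p: Preflight) -> String? {
    if p.cartesiaAPIKey.isEmpty { return "Add your API key" }
    if !p.hasMicrophone { return "Mic access needed" }
    if !p.hasAccessibility { return "Accessibility needed" }
    return nil
}
```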

On success the coordinator sets `state = .recording`, `audioReady = false`, clears `liveTranscript`, optionally plays a start sound, and resolves the paste target:

- Frontmost app if it is not InkIt.
- Otherwise `lastExternalApp` (seeded at launch from frontmost or first regular non-InkIt app).
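
The target rule above reduces to a short function. `AppRef` is a hypothetical stand-in for `NSRunningApplication` so the sketch stays platform-independent; the bundle ID is the one documented for InkIt.

```swift
// Minimal stand-in for a running app.
struct AppRef: Equatable {
    let bundleID: String
    let pid: Int32
}

let inkItBundleID = "ai.cartesia.InkIt"

// Prefers the frontmost app unless it is InkIt itself, then falls back to the
// most recent external app.
func resolvePasteTarget(frontmost: AppRef?, lastExternalApp: AppRef?) -> AppRef? {
    if let front = frontmost, front.bundleID != inkItBundleID {
        return front
    }
    return lastExternalApp
}
```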

`TargetAppSnapshot.capture(from:)` records bundle ID, localized name, and PID for logging and future per-app behavior. It does not read on-screen content.

<Note>
Onboarding trial (`beginOnboardingTrial`) routes the final transcript to the Try It box only when InkIt is frontmost. Live preview is suppressed during trial; words appear all at once on release. Trial skips polish, paste, and LLM prewarm.
</Note>

## Audio capture (16 kHz PCM)

`AudioCaptureService` uses `AVAudioEngine` with a tap on the input node. Each buffer is:

1. Converted to **16 kHz, mono, 16-bit signed little-endian PCM** (`pcm_s16le`).
2. Delivered on a dedicated `inkit.audio` queue to the `onChunk` callback.
3. Metered on the main queue as a normalized 0…1 peak level (`onLevel` → `inputLevel`).
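
To make the `pcm_s16le` format and the peak meter concrete, here is a minimal sketch. It assumes the samples are already mono, resampled to 16 kHz, and normalized to -1…1; the function names are hypothetical and the real service performs the conversion through its own converter.

```swift
import Foundation

// Converts normalized Float samples (-1...1) to 16-bit signed little-endian PCM bytes.
func pcmS16LE(from samples: [Float]) -> Data {
    var data = Data(capacity: samples.count * 2)
    for sample in samples {
        let clamped = max(-1.0, min(1.0, sample))           // guard against overshoot
        let value = Int16((clamped * Float(Int16.max)).rounded())
        let le = value.littleEndian                          // low byte first
        withUnsafeBytes(of: le) { data.append(contentsOf: $0) }
    }
    return data
}

// Normalized 0...1 peak level for metering.
func peakLevel(of samples: [Float]) -> Float {
    let peak = samples.reduce(Float(0)) { max($0, abs($1)) }
    return min(peak, 1.0)
}
```

Each sample becomes two bytes, so one second of 16 kHz mono audio is 32,000 bytes on the wire.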

Device selection reads `settings.preferredInputDeviceUID` before `start()`. A pinned UID that is unplugged falls back to the system default input.

### `audioReady` and Bluetooth delay

`audioReady` stays `false` until the mic is genuinely producing signal:

- **Signal path** — first tap buffer with level above `readyLevelThreshold` (0.03) fires `onReady` once per take.
- **Fallback** — `readyFallbackDelay` (0.6 s) fires `onReady` anyway so a silent room does not leave the HUD stuck on "preparing."

Bluetooth mics often need 200–500 ms to switch from A2DP output to HFP input; speech during that window is lost at the hardware level, and the tap delivers only digital silence. The HUD waits for `audioReady` so users do not speak into the dead gap.
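
The two readiness paths can be sketched as a small gate. The constants are the documented values; `ReadinessGate` is a hypothetical type, and evaluating the fallback when a buffer arrives (rather than on an independent timer) is a simplification made for this sketch.

```swift
// Tracks whether the microphone is delivering real signal for the current take.
struct ReadinessGate {
    static let readyLevelThreshold: Float = 0.03   // documented threshold
    static let readyFallbackDelay: Double = 0.6    // documented fallback, in seconds

    private(set) var isReady = false

    // Feed each metered buffer. Returns true exactly once, when readiness first fires.
    mutating func observe(level: Float, secondsSinceStart: Double) -> Bool {
        guard !isReady else { return false }
        let hasSignal = level > ReadinessGate.readyLevelThreshold
        let timedOut = secondsSinceStart >= ReadinessGate.readyFallbackDelay
        if hasSignal || timedOut {
            isReady = true
            return true
        }
        return false
    }
}
```

A fresh gate per take gives the "once per take" behavior without extra bookkeeping.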

`stop()` removes the tap, stops the engine, and uses **`queue.sync`** to drain in-flight conversions before clearing the converter, so the tail of the last word is not lost.

## Cartesia WebSocket STT

A fresh `CartesiaStreamingClient` is created per take with the user's Cartesia API key.

<ParamField body="endpoint" type="wss URL" required>
`wss://api.cartesia.ai/stt/turns/websocket`
</ParamField>

<ParamField body="query params" type="object" required>
`model=ink-2`, `encoding=pcm_s16le`, `sample_rate=16000`, `cartesia_version=2026-03-01`
</ParamField>

<ParamField body="headers" type="object" required>
`X-API-Key`, `Cartesia-Version: 2026-03-01`
</ParamField>
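
The connection parameters above can be assembled with `URLComponents`. This is a sketch using only the documented endpoint, query parameters, and header names; the helper names are hypothetical.

```swift
import Foundation

// Builds the documented STT WebSocket URL from its query parameters.
func cartesiaSTTURL() -> URL? {
    var components = URLComponents()
    components.scheme = "wss"
    components.host = "api.cartesia.ai"
    components.path = "/stt/turns/websocket"
    components.queryItems = [
        URLQueryItem(name: "model", value: "ink-2"),
        URLQueryItem(name: "encoding", value: "pcm_s16le"),
        URLQueryItem(name: "sample_rate", value: "16000"),
        URLQueryItem(name: "cartesia_version", value: "2026-03-01"),
    ]
    return components.url
}

// Headers sent with the upgrade request; the key value is supplied by the caller.
func cartesiaSTTHeaders(apiKey: String) -> [String: String] {
    return ["X-API-Key": apiKey, "Cartesia-Version": "2026-03-01"]
}
```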

Audio flows as binary WebSocket frames. `turn.update` and `turn.eager_end` update the in-flight turn; completed `turn.end` events append to `completedTurns`. `onTranscriptUpdate` drives `liveTranscript` unless onboarding suppresses preview.
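
A minimal sketch of the turn bookkeeping described above. The event names come from the text; the payload shape (a single `text` string per event), the `TranscriptAccumulator` type, and joining turns with a space are assumptions made for illustration.

```swift
import Foundation

// Server events relevant to transcript assembly (assumed payload: the turn's text so far).
enum TurnEvent {
    case update(text: String)      // turn.update
    case eagerEnd(text: String)    // turn.eager_end
    case end(text: String)         // turn.end
}

struct TranscriptAccumulator {
    private(set) var completedTurns: [String] = []
    private(set) var inFlight: String = ""

    mutating func apply(_ event: TurnEvent) {
        switch event {
        case .update(let text), .eagerEnd(let text):
            inFlight = text                 // replace the in-flight turn
        case .end(let text):
            completedTurns.append(text)     // the turn is final
            inFlight = ""
        }
    }

    // Cumulative text: completed turns plus the in-flight one, joined and trimmed.
    var cumulative: String {
        return (completedTurns + [inFlight])
            .filter { !$0.isEmpty }
            .joined(separator: " ")
            .trimmingCharacters(in: .whitespacesAndNewlines)
    }
}
```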

### Pending audio buffer

Audio captured before the server's `connected` event is held in `pendingAudio` and flushed in order in `handleConnected()`. Early frames are not sent until the session is ready, avoiding leading-word loss.
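
The buffering can be sketched as follows. `PendingAudioSender` is a hypothetical type, the `send` closure stands in for writing a binary WebSocket frame, and the sketch assumes all calls arrive on one serial queue, so it includes no locking.

```swift
import Foundation

// Buffers chunks until the session reports `connected`, then flushes them in order.
final class PendingAudioSender {
    private var pendingAudio: [Data] = []
    private var isConnected = false
    private let send: (Data) -> Void

    init(send: @escaping (Data) -> Void) {
        self.send = send
    }

    // Called for every captured chunk.
    func sendAudio(_ chunk: Data) {
        if isConnected {
            send(chunk)
        } else {
            pendingAudio.append(chunk)   // hold early audio until the session is ready
        }
    }

    // Called once when the server's `connected` event arrives.
    func handleConnected() {
        isConnected = true
        pendingAudio.forEach(send)       // flush in capture order
        pendingAudio.removeAll()
    }
}
```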

### Live transcript HUD

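`client.onTranscriptUpdate` sets `liveTranscript` on the main actor, so the published value holds the cumulative text during `.recording`. The notch island itself renders only the preparing cue or waveform, not the transcript text.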

## Hotkey release: finalize

`stopDictation()` runs only from `.recording`:

1. `releaseTime = DispatchTime.now()` — anchor for all latency measurements.
2. `state = .finalizing`.
3. Optional stop sound.
4. `audio.stop()` — last PCM chunks flushed.
5. `client.finalizeAndClose()` — sends `{"type":"close"}` when connected.

The client completes on the final `turn.end` after close (happy path), socket close, or a grace timer (3.0 s with content, 2.0 s when empty). `onClosed` delivers the joined, trimmed transcript.

Empty transcripts collapse to `.idle` with no history row and no error.

STT failures route through `handleSTTFailure`, which may set `cartesiaKeyInvalid` or `cartesiaOutOfCredits` and surfaces `failure.notchMessage` via `.error` state.

## Optional polish (`TranscriptRewriter`)

After `onClosed`, `correctedTranscript(raw:targetSnapshot:)` runs polish when `settings.correctionEnabled` is `true` and a non-empty key exists for `settings.rewriteProvider`. Otherwise the raw transcript passes through with `PolishOutcome.off`.

When polish runs:

1. `state = .rewriting`.
2. `rewriteWithoutContext(transcript:)` POSTs to the configured `LLMProvider` endpoint (OpenAI-compatible chat completions, or Anthropic Messages API).
3. On success — polished text plus `original` for diffing (`PolishOutcome.polished`).
4. On failure — raw text pasted anyway (`PolishOutcome.failed`) with `PolishFailure` reason for the history tooltip.

Polish never blocks delivery: the user always gets at least the raw transcript.
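
The fallback rule can be sketched as a function that always returns deliverable text. The `rewrite` closure is a hypothetical stand-in for the LLM call (synchronous here for brevity; the real call is a network request), and `PolishResult` is an illustrative type rather than InkIt's own.

```swift
enum PolishOutcome: Equatable { case off, polished, failed }

struct PolishResult: Equatable {
    let text: String          // what gets delivered
    let original: String?     // pre-polish text, kept for diffing
    let outcome: PolishOutcome
}

// Pass nil for `rewrite` when polish is disabled or no provider key is set.
func polish(raw: String, rewrite: ((String) throws -> String)?) -> PolishResult {
    guard let rewrite = rewrite else {
        return PolishResult(text: raw, original: nil, outcome: .off)
    }
    do {
        let polished = try rewrite(raw)
        return PolishResult(text: polished, original: raw, outcome: .polished)
    } catch {
        // Failure never blocks delivery: fall back to the raw transcript.
        return PolishResult(text: raw, original: nil, outcome: .failed)
    }
}
```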

Successful Cartesia transcription clears `cartesiaKeyInvalid` / `cartesiaOutOfCredits`. Polish outcomes update `polishKeyInvalid` / `polishOutOfCredits` on auth or billing failures.

## Focus resolution (`FocusedEditable`)

Paste target is resolved at **press**, but editability is verified after **release** (once any polish has finished) via `FocusedEditable.current()`:

- Runs off the main thread under a **1.5 s** AX budget (`AX.run`).
- Fast path: system-wide focused element with settable `AXValue`.
- Chromium/Electron fallback: app-element descent, `AXFocused` subtree scan (max 1500 nodes), system-wide container descent.

If no editable field is focused, the transcript is saved to history with `pasteMs: 0`, `showHeldInHistoryNotice()` sets `.heldInHistory` for 2.5 s, and Cmd+V is not synthesized.

## Paste (`PasteService`)

When focus is editable, `state = .pasting` and `PasteService.paste(text:targetApp:)`:

1. Snapshots existing pasteboard items.
2. Writes dictated text with session tag `com.cartesia.InkIt.PasteSession` and transient type `org.nspasteboard.TransientType` (clipboard managers skip the entry).
3. Activates `targetApp` only when it is not already frontmost.
4. Waits `clipboardSettleDelay` (80 ms), plus `activationFocusDelay` (120 ms) when activation was needed.
5. Synthesizes Cmd+V via `CGEvent`.
6. Reports completion immediately (perceived paste latency ends here).
7. Restores the prior pasteboard after `clipboardRestoreDelay` (400 ms) if the session tag still matches.

Paste uses `focus.app ?? capturedTargetApp` — the release-verified app wins over the press-time target.
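
The timing in the steps above works out as follows. The constants are the documented values; the helper names are hypothetical, and comparing a session tag as a plain string is an assumption about how the "still matches" check is done.

```swift
// Documented timing constants, in milliseconds.
let clipboardSettleDelayMs = 80
let activationFocusDelayMs = 120
let clipboardRestoreDelayMs = 400

// Delay between writing the pasteboard and synthesizing Cmd+V.
func delayBeforePasteMs(targetAlreadyFrontmost: Bool) -> Int {
    return targetAlreadyFrontmost
        ? clipboardSettleDelayMs
        : clipboardSettleDelayMs + activationFocusDelayMs
}

// The prior pasteboard is restored only if nothing else has written to it since the paste.
func shouldRestorePasteboard(currentSessionTag: String?, ourSessionTag: String) -> Bool {
    return currentSessionTag == ourSessionTag
}
```

So a paste into an app that is already frontmost waits 80 ms before Cmd+V, while one that needs activation waits 200 ms.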

## History persistence (`TranscriptHistoryStore`)

`history.add(_:original:latency:polish:failure:)` inserts a `TranscriptRecord` into SwiftData and mirrors it in the published `entries` array (newest first).

<ResponseField name="Latency" type="object">
Per-stage milliseconds from hotkey release: `transcribeMs` (release → final transcript), `polishMs` (transcript → polish done), `pasteMs` (polish → paste completion). `totalMs` exposes only `transcribeMs + polishMs` for user-facing stats; `pasteMs` is recorded but excluded from totals.
</ResponseField>

<ResponseField name="Entry" type="object">
`text`, `timestamp`, optional `latency`, `original` (pre-polish), `polish` outcome (`off` / `polished` / `failed`), optional `failure` details.
</ResponseField>

If the on-disk store cannot open, `TranscriptHistoryStore` falls back to an in-memory SwiftData container; rows then survive the session but not a relaunch. `lifetimeWords` in UserDefaults advances only when a row durably persists.

## Latency stages

All measurements use monotonic `DispatchTime` from `releaseTime` (set in `stopDictation()`).

| Segment | Start | End | Stored as |
| --- | --- | --- | --- |
| Transcribe | Hotkey release | `onClosed` transcript arrival | `transcribeMs` |
| Polish | Transcript arrival | Polish completion | `polishMs` (0 when off) |
| Paste | Polish completion | Paste completion callback | `pasteMs` |

Onboarding trial captures only `transcribeMs` in `lastTrialLatency` (no polish or paste).

Home "avg time to text" and history breakdowns use `totalMs = transcribeMs + polishMs`.
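
A minimal sketch of the stage arithmetic, assuming monotonic nanosecond stamps such as those from `DispatchTime.uptimeNanoseconds`. The field names follow the documented `Latency` fields; the initializer is hypothetical.

```swift
struct Latency: Equatable {
    let transcribeMs: Int
    let polishMs: Int
    let pasteMs: Int

    // User-facing total excludes paste time.
    var totalMs: Int { return transcribeMs + polishMs }

    // Stamps must be non-decreasing, which holds for a monotonic clock.
    init(releaseNs: UInt64, transcriptNs: UInt64, polishDoneNs: UInt64, pasteDoneNs: UInt64) {
        func ms(_ from: UInt64, _ to: UInt64) -> Int {
            return Int((to - from) / 1_000_000)
        }
        transcribeMs = ms(releaseNs, transcriptNs)
        polishMs = ms(transcriptNs, polishDoneNs)   // 0 when polish is off and the stamps coincide
        pasteMs = ms(polishDoneNs, pasteDoneNs)
    }
}
```

For example, a take with 250 ms of transcription, 400 ms of polish, and 200 ms of paste reports a `totalMs` of 650.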

## Prewarm behavior

Three prewarm paths overlap work during the hold window so post-release latency stays low:

### LLM connection prewarm

At key-press, when polish is enabled and a provider key exists (and not onboarding trial), `AppCoordinator` constructs a `TranscriptRewriter` and calls `prewarm()`:

- Issues a `HEAD` request to `provider.endpoint` with a 2.5 s timeout.
- Warms DNS, TCP, and TLS on the same `URLSession` instance reused for the polish POST.
- Stored in `warmRewriter` and consumed once in `correctedTranscript`; reset on every `startDictation`.
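
The warm-up request can be sketched like this. `makePrewarmRequest` is a hypothetical helper and the endpoint in the usage note is a placeholder; only the `HEAD` method and the 2.5 s timeout come from the description above.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking   // URLRequest lives here on Linux
#endif

// Builds the warm-up request. Sending it on the same URLSession that later performs
// the polish POST lets that POST reuse the DNS, TCP, and TLS work.
func makePrewarmRequest(endpoint: URL) -> URLRequest {
    var request = URLRequest(url: endpoint)
    request.httpMethod = "HEAD"
    request.timeoutInterval = 2.5
    return request
}
```

A caller would typically fire it with `session.dataTask(with: makePrewarmRequest(endpoint: url)).resume()` and ignore the response, since only the connection setup matters.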

### Chromium accessibility prewarm

At key-press, `FocusedEditable.enableWebAccessibility(pid:)` sets `AXManualAccessibility` and `AXEnhancedUserInterface` on the target app off the main thread. Chromium/Electron apps build their accessible tree lazily; without this, the release-time focus check can race the tree and wrongly hold transcripts in History.

### STT pending-audio buffer

Not a network prewarm, but the same principle: audio during WebSocket handshake is buffered and flushed after `connected`, so the first spoken words are not dropped while the socket initializes.

<Warning>
`warmRewriter` is skipped if settings change mid-hold or prewarm was not started. In that case `correctedTranscript` falls back to a fresh `TranscriptRewriter` instance: polish still runs, but without connection reuse.
</Warning>
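
The consume-once behavior can be sketched with a small slot type. `Rewriter` and `WarmSlot` are hypothetical stand-ins, and detecting a settings change by comparing provider names is an assumption about how the mismatch is recognized.

```swift
// Stand-in for TranscriptRewriter; the real type also owns the warmed URLSession.
final class Rewriter {
    let provider: String
    init(provider: String) { self.provider = provider }
}

struct WarmSlot {
    private var warm: Rewriter?

    mutating func store(_ rewriter: Rewriter) { warm = rewriter }
    mutating func reset() { warm = nil }   // called at the start of every take

    // Returns the warmed instance once if it still matches the current provider,
    // otherwise a fresh instance without connection reuse.
    mutating func take(currentProvider: String) -> Rewriter {
        defer { warm = nil }
        if let w = warm, w.provider == currentProvider {
            return w
        }
        return Rewriter(provider: currentProvider)
    }
}
```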

## Error and edge paths

| Condition | Behavior |
| --- | --- |
| Empty / whitespace transcript | `.idle`, no history row |
| STT named failure (401, 402, 429, offline, 5xx mid-hold) | `.error` with notch message; may set service-issue flags |
| STT `.unknown` with content | Deliver transcript silently |
| STT post-close 5xx with no content | Collapse to empty (rapid tap) |
| Polish failure | Raw text; `PolishOutcome.failed` in history |
| No editable focus at release | History only; `.heldInHistory` notice |
| Paste failure | `.error("Paste failed")` |
| Audio start failure | `.error`; STT session cancelled |

Duplicate InkIt bundle instances are detected at launch and surfaced as `lastError` — Accessibility grants apply per bundle path.

## Module map

```text
HotkeyManager
    └─ AppCoordinator (orchestrator, DictationState, HUD bindings)
           ├─ AudioCaptureService ──► CartesiaStreamingClient (WebSocket STT)
           ├─ TranscriptRewriter (optional polish)
           ├─ FocusedEditable + TargetAppSnapshot (target resolution)
           ├─ PasteService (clipboard + Cmd+V)
           └─ TranscriptHistoryStore (SwiftData persistence + Latency)
```

## Related pages

<CardGroup>
<Card title="Dictation state machine" href="/dictation-state-machine">
`DictationState` transitions, HUD coupling, and error dwell behavior.
</Card>
<Card title="Cartesia STT reference" href="/cartesia-stt-reference">
WebSocket contract, server events, `STTFailure` classification, and close-path metrics.
</Card>
<Card title="Paste and focus reference" href="/paste-and-focus-reference">
Clipboard session tagging, timing constants, and AX focus walk details.
</Card>
<Card title="Configure polish" href="/configure-polish">
Enable polish, pick provider and model, and interpret `PolishUIState`.
</Card>
<Card title="Input device selection" href="/input-device-selection">
Pin a microphone UID, Bluetooth profile switch, and `audioReady` thresholds.
</Card>
</CardGroup>

---

## 05. Dictation state machine

> DictationState lifecycle: idle, recording, finalizing, rewriting, pasting, heldInHistory, and error. Transitions on hotkey press/release, STT completion, polish outcome, paste result, and empty-transcript collapse. audioReady and liveTranscript HUD coupling.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/05-dictation-state-machine.md
- Generated: 2026-06-15T22:08:29.977Z

### Source Files

- `InkIt/AppCoordinator.swift`
- `InkIt/NotchHUD.swift`
- `InkIt/SettingsStore.swift`
- `InkIt/FeedbackSoundPlayer.swift`

---
title: Dictation state machine
description: DictationState lifecycle from hotkey press through STT, polish, paste, and recovery paths — including audioReady and liveTranscript HUD coupling.
---

`DictationState` is the single source of truth for where a dictation take sits in its lifecycle. `AppCoordinator` owns the enum, publishes it to SwiftUI observers, and drives every transition from hotkey input, Cartesia STT completion, optional polish, paste outcome, and empty-transcript collapse. The notch HUD and menu bar read the same state but surface different subsets: the island stays live only during `.recording`, while post-release phases surface mainly through the menu bar label.

## States

`DictationState` is an `Equatable` enum with seven cases. Associated data appears only on `.error(String)`.

| State | Meaning | Menu bar label | Notch HUD |
| --- | --- | --- | --- |
| `idle` | No dictation in flight | `Ink` | Hidden (collapsed into notch) |
| `recording` | Hotkey active; audio streaming to Cartesia | `● Ink` | Live island with preparing cue or waveform |
| `finalizing` | Key released; audio stopped; STT session closing | `… Ink` | Hidden |
| `rewriting` | Optional LLM polish in progress | `✎ Ink` | Hidden |
| `pasting` | Cmd+V insertion into focused app | `↩ Ink` | Hidden |
| `heldInHistory` | Transcript saved to History; no editable field at release | `⬇ Ink` | Notice: "Saved to History" |
| `error(String)` | Recoverable failure with short message | `⚠ Ink` | Error notice with warning glyph |

`finalizing`, `rewriting`, and `pasting` are intentionally silent in the notch. The island contracts on hotkey release so polishing and pasting do not compete with the user's cursor; the menu bar still reflects those phases.

## State diagram

```mermaid
stateDiagram-v2
    [*] --> idle

    idle --> recording: hotkey start
    heldInHistory --> recording: hotkey start
    error --> recording: hotkey start

    recording --> finalizing: hotkey stop (hold release or toggle press)
    recording --> error: permission / API / audio / STT failure

    finalizing --> idle: empty transcript collapse
    finalizing --> rewriting: polish enabled + key present
    finalizing --> pasting: polish off or skipped, editable focus OK
    finalizing --> heldInHistory: no editable field at release
    finalizing --> idle: onboarding trial (no paste)

    rewriting --> pasting: editable focus OK
    rewriting --> heldInHistory: no editable field

    pasting --> idle: paste success
    pasting --> error: paste failed

    heldInHistory --> idle: 2.5s timer
    error --> idle: 1.5s dwell after release
```
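
The diagram can also be read as a reducer over (state, event) pairs. This is an illustrative sketch: `Phase`, `Event`, and `nextPhase` are hypothetical names, the error message payload is dropped, and the onboarding trial path is left out.

```swift
enum Phase: Equatable {
    case idle, recording, finalizing, rewriting, pasting, heldInHistory, error
}

enum Event {
    case hotkeyStart, hotkeyStop
    case transcript(isEmpty: Bool, polishEnabled: Bool, focusEditable: Bool)
    case rewriteDone(focusEditable: Bool)
    case pasteFinished(success: Bool)
    case failure
    case noticeTimerFired
}

func nextPhase(_ phase: Phase, _ event: Event) -> Phase {
    switch (phase, event) {
    case (.idle, .hotkeyStart), (.heldInHistory, .hotkeyStart), (.error, .hotkeyStart):
        return .recording
    case (.recording, .hotkeyStop):
        return .finalizing
    case (.recording, .failure):
        return .error
    case (.finalizing, .transcript(let isEmpty, let polishEnabled, let focusEditable)):
        if isEmpty { return .idle }                 // empty transcript collapse
        if polishEnabled { return .rewriting }
        return focusEditable ? .pasting : .heldInHistory
    case (.rewriting, .rewriteDone(let focusEditable)):
        return focusEditable ? .pasting : .heldInHistory
    case (.pasting, .pasteFinished(let success)):
        return success ? .idle : .error
    case (.heldInHistory, .noticeTimerFired), (.error, .noticeTimerFired):
        return .idle
    default:
        return phase                                // all other pairs are ignored
    }
}
```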

## Hotkey-driven transitions

Hotkey semantics depend on `DictationMode` in `SettingsStore`:

<ParamField body="hold" type="DictationMode">
Press starts dictation; release stops it and begins the post-release pipeline (`finalizing` → polish/paste).
</ParamField>

<ParamField body="toggle" type="DictationMode">
First press starts; second press stops and pastes. Release is ignored — only the next press ends the take.
</ParamField>

`startDictation()` accepts entry only from `idle`, `heldInHistory`, or `error`. Any other state (`recording`, `finalizing`, `rewriting`, `pasting`) returns immediately, preventing double-starts mid-flight.

On a successful start:

1. `state` becomes `.recording`
2. `audioReady` resets to `false`
3. `liveTranscript` clears
4. Optional start feedback sound plays
5. Paste target and Chromium AX prewarm run at press time
6. Cartesia WebSocket connects and `AudioCaptureService` begins streaming 16 kHz PCM

`stopDictation()` requires `.recording`. It records `releaseTime` for latency metrics, sets `.finalizing`, plays the optional stop cue, stops audio, and calls `finalizeAndClose()` on the STT client.

In hold mode, releasing while already in `.error` arms the error dismiss timer instead of calling `stopDictation()` — the user has finished reacting to the failure.

## Post-release pipeline

When Cartesia delivers the final transcript via `onClosed`, the coordinator branches:

### Empty transcript collapse

If the trimmed final text is empty, state returns directly to `idle` with no history row and no notch notice. This covers silence, rapid tap-and-release, and benign STT disconnects that `CartesiaStreamingClient.reportFailureOrCollapse` routes to a silent close.

### Onboarding trial path

When `routesFinalTranscriptToOnboarding` is active and InkIt is frontmost, the raw transcript lands in `liveTranscript`, trial latency is captured, and state returns to `idle` — no polish, no paste.

### Polish branch

When correction is enabled and an LLM API key exists, `correctedTranscript` sets `.rewriting` and awaits `TranscriptRewriter.rewriteWithoutContext`. A connection warmed at key-press (`warmRewriter`) is consumed here. Polish failure never blocks the user: the coordinator falls back to the raw transcript and records the failure on the history row.

### Focus check and paste

Before pasting, `FocusedEditable.current()` re-checks whether an editable field is focused at release. If not, the transcript is added to `TranscriptHistoryStore`, `showHeldInHistoryNotice()` sets `.heldInHistory` for 2.5 seconds, then auto-clears to `idle`.

If focus is valid, state becomes `.pasting` and `PasteService` synthesizes Cmd+V. Success writes history and returns to `idle`; failure calls `setError("Paste failed")`.

## Error handling

`setError(_:)` is the central failure path:

- Clears paste-target snapshots
- Sets `lastError` and `.error(message)`
- Stops audio and cancels the STT client
- Arms a cancelable 1.5s dismiss via `armErrorDismiss()`

While `isHotkeyHeld` is true in hold mode, the error notice stays on screen; dismiss re-arms until release. After release, the 1.5s tail gives a readable confirmation before returning to `idle`.

`handleSTTFailure` maps `STTFailure` to notch copy (`notchMessage`) and persists `cartesiaKeyInvalid` / `cartesiaOutOfCredits` for fixable causes. STT errors can surface mid-hold so the user stops talking into a dead session.

Pre-flight guards in `startDictation()` also route to `.error` for missing Cartesia key, microphone denial, or missing Accessibility (with throttled Settings re-open).
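
The re-arm rule for the error dismiss can be sketched as a single decision made each time the timer fires. The names below (`DismissDecision`, `onErrorDismissFired`) are hypothetical; only the 1.5 s delay and the hold-mode rule come from the description above.

```swift
// Hypothetical names; only the 1.5 s delay and the hold-mode rule come from the text.
enum DismissDecision: Equatable {
    case rearm          // keep the notice up and schedule another check
    case returnToIdle   // clear the notice
}

let errorDismissDelay: Double = 1.5  // seconds

/// Decides what happens when the dismiss timer fires.
func onErrorDismissFired(isHoldMode: Bool, isHotkeyHeld: Bool) -> DismissDecision {
    (isHoldMode && isHotkeyHeld) ? .rearm : .returnToIdle
}
```

In a real coordinator the `.rearm` branch would schedule another cancelable work item after `errorDismissDelay`; that scheduling is omitted here.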

## audioReady and the preparing cue

`audioReady` is separate from `DictationState` but tightly coupled to `.recording` in the notch HUD.

<ResponseField name="audioReady" type="Bool">
Published by `AppCoordinator`. `false` at dictation start; `true` after `AudioCaptureService.onReady` fires once per take.
</ResponseField>

`AudioCaptureService` signals readiness when input level exceeds `readyLevelThreshold` (0.03) or, as a backstop, after `readyFallbackDelay` (0.6 s). The threshold distinguishes real signal from the digital silence Bluetooth mics emit during A2DP→HFP profile switch (~200–500 ms).

In `NotchHUDView.liveContent`:

- `audioReady == false` → `HUDPreparingDot` (soft pulsing dot)
- `audioReady == true` → `HUDWaveform` driven by `inputLevel`

The still→moving transition is the "start speaking" cue. It prevents users from talking into the dead gap before hardware capture is live.
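
The readiness rule (level above threshold, or a time backstop, firing once per take) can be sketched as a small value type. `ReadinessGate` and `ingest` are hypothetical names, and the sketch assumes the caller supplies elapsed time with each level sample; the real service may use a timer for the backstop instead.

```swift
// Hypothetical sketch of the once-per-take readiness rule.
struct ReadinessGate {
    let levelThreshold: Float = 0.03   // readyLevelThreshold from the text
    let fallbackDelay: Double = 0.6    // readyFallbackDelay from the text, in seconds
    private(set) var isReady = false

    /// Returns true exactly once per take, on the call that flips readiness.
    mutating func ingest(level: Float, elapsed: Double) -> Bool {
        guard !isReady else { return false }
        if level > levelThreshold || elapsed >= fallbackDelay {
            isReady = true
            return true
        }
        return false
    }
}
```

Creating a fresh `ReadinessGate` at each dictation start matches the "`false` at dictation start" behavior of `audioReady`.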

## liveTranscript coupling

<ResponseField name="liveTranscript" type="String">
Interim STT text from `CartesiaStreamingClient.onTranscriptUpdate`. Cleared on each `startDictation()`.
</ResponseField>

The notch HUD does **not** render `liveTranscript` — only the waveform/preparing cue. Interim words surface elsewhere:

- **Normal dictation:** streaming updates during `.recording` (when not suppressed)
- **Onboarding Try It:** `suppressLivePreview` holds interim text back; the final transcript lands all at once in `TryItPracticeCard` via `onChange(of: transcript)`
- **Post-trial:** final raw text may populate `liveTranscript` before the card copies it into the editable field

`TryItPracticeCard` treats `.finalizing`, `.rewriting`, and `.pasting` collectively as "finalizing" for UI enablement — the send button stays disabled until the pipeline completes.

## Observable surfaces

| Surface | What it reads |
| --- | --- |
| Notch HUD | `state` (live / notice / error only), `audioReady`, `inputLevel` |
| Menu bar | `state` → `menuBarLabel`, `menuBarIconName`, `statusText`, `statusColor` |
| Try It card | `state`, `liveTranscript` |
| Home / History | Written on successful paste or held-in-history path, not on empty collapse |

## Design constraints

- **Press-time vs release-time targets:** Paste target is resolved at hotkey press (with AX prewarm), but editability is verified at release. Stale focus yields `heldInHistory`, not a blind paste.
- **Captured closures:** `onClosed` captures `pasteTargetApp` and `contextTargetSnapshot` at recording start so mid-flight `setError` cannot wipe a valid target.
- **Restart from notices:** `heldInHistory` and `error` are intentionally restartable — pressing the hotkey again means "dictate again" without waiting for auto-dismiss.
- **Latency anchor:** `releaseTime` (monotonic `DispatchTime`) anchors transcribe, polish, and paste millisecond breakdowns on history rows.

## Related pages

<Card href="/dictation-pipeline" title="Dictation pipeline" icon="workflow">
End-to-end flow from hotkey through audio capture, STT, polish, paste, and history persistence.
</Card>

<Card href="/hotkey-bindings" title="Hotkey bindings" icon="keyboard">
Hold vs toggle semantics, Carbon hotkeys, and Fn/Globe event-tap registration.
</Card>

<Card href="/paste-and-focus-reference" title="Paste and focus reference" icon="clipboard">
FocusedEditable checks, paste timing, and the heldInHistory fallback when no field is focused.
</Card>

<Card href="/input-device-selection" title="Input device selection" icon="mic">
Bluetooth profile switch delay and how readyLevelThreshold / readyFallbackDelay backstop audioReady.
</Card>

---

## 06. Permissions model

> Microphone (AVCaptureDevice) and Accessibility (AXIsProcessTrusted) requirements, PermissionState tri-state (granted, notRequested, needsManual), onboarding prompt flow, polling strategy, and the silent relaunch after Accessibility grant.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/06-permissions-model.md
- Generated: 2026-06-15T22:08:44.017Z

### Source Files

- `InkIt/PermissionsService.swift`
- `InkIt/OnboardingView.swift`
- `InkIt/AppCoordinator.swift`
- `InkIt/Info.plist`
- `InkIt/InkIt.entitlements`

---
title: "Permissions model"
description: "Microphone (AVCaptureDevice) and Accessibility (AXIsProcessTrusted) requirements, PermissionState tri-state (granted, notRequested, needsManual), onboarding prompt flow, polling strategy, and the silent relaunch after Accessibility grant."
---

InkIt gates dictation on two macOS privacy grants tracked by the singleton `PermissionsService`: microphone capture via `AVCaptureDevice` TCC and Accessibility trust via `AXIsProcessTrusted()`. Both must be granted before onboarding advances past the permissions step and before `AppCoordinator.startDictation()` runs the capture-and-paste pipeline.

## Required permissions

| Permission | API | Why InkIt needs it |
| --- | --- | --- |
| Microphone | `AVCaptureDevice.authorizationStatus(for: .audio)` | `AudioCaptureService` records 16 kHz PCM while the hotkey is held and streams it to Cartesia STT. |
| Accessibility | `AXIsProcessTrusted()` | `HotkeyManager` installs suppressing `CGEventTap`s for Fn/Globe and bare-modifier bindings; `PasteService` synthesizes Cmd+V; `FocusedEditable` walks the AX tree to find editable targets. |

Carbon `RegisterEventHotKey` combos work without Accessibility, but Fn suppression for the default binding and paste-at-cursor both depend on the trust bit.

<Note>
InkIt ships **without App Sandbox** (`ENABLE_APP_SANDBOX: NO` in `project.yml`). Synthesized paste and global event taps require an unsandboxed binary.
</Note>

### Bundle declarations

Microphone and Apple Events usage strings live in `Info.plist`:

- `NSMicrophoneUsageDescription` — explains voice capture for Cartesia STT.
- `NSAppleEventsUsageDescription` — explains pasting into the focused app.

Entitlements in `InkIt.entitlements`:

- `com.apple.security.device.audio-input` — microphone access.
- `com.apple.security.automation.apple-events` — Apple Events for paste automation.

## PermissionState tri-state

Each permission maps to a `PermissionState` enum that drives onboarding card UI and manual-fix flows:

<ResponseField name="granted" type="PermissionState">
  The OS reports the permission as active. Onboarding shows an "Enabled" checkmark; Settings shows "Enabled".
</ResponseField>

<ResponseField name="notRequested" type="PermissionState">
  The system prompt has not been fired yet (microphone: `.notDetermined`; Accessibility: no prior `requestAccessibility()` call and no `resumeOnboardingAtPermissions` flag). The card offers an **Enable** button that triggers the TCC prompt.
</ResponseField>

<ResponseField name="needsManual" type="PermissionState">
  macOS will not show the prompt again — after denial, restriction, or a prior prompt attempt. The UI switches to numbered manual steps and an **Open System Settings** button that deep-links to the correct pane without re-firing TCC.
</ResponseField>

### Mapping rules

**Microphone** derives directly from `AVCaptureDevice.authorizationStatus(for: .audio)`:

| `AVAuthorizationStatus` | `PermissionState` |
| --- | --- |
| `.authorized` | `granted` |
| `.notDetermined` | `notRequested` |
| `.denied`, `.restricted` | `needsManual` |

**Accessibility** has no denied/restricted API: `AXIsProcessTrusted()` is simply `true` or `false`. While trust is `false`, InkIt infers `needsManual` when either:

- `requestAccessibility()` has already fired `AXIsProcessTrustedWithOptions([kAXTrustedCheckOptionPrompt: true])`, setting in-memory `axRequestedAt` and persisting `resumeOnboardingAtPermissions` in `UserDefaults`, or
- the `resumeOnboardingAtPermissions` flag is set (survives silent relaunch).

```mermaid
stateDiagram-v2
    [*] --> notRequested
    notRequested --> granted: User grants in TCC prompt
    notRequested --> needsManual: User denies or prompt already fired
    needsManual --> granted: User toggles in System Settings
    granted --> needsManual: User revokes in System Settings
```
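
The mapping rules above translate directly into two small functions. This is a sketch under stated assumptions: `MicAuthStatus` is a local stand-in for `AVAuthorizationStatus` so the example runs without AVFoundation, and the function and parameter names are hypothetical.

```swift
enum PermissionState: Equatable { case granted, notRequested, needsManual }

// Local stand-in for AVAuthorizationStatus so the sketch runs without AVFoundation.
enum MicAuthStatus { case authorized, notDetermined, denied, restricted }

func microphoneState(for status: MicAuthStatus) -> PermissionState {
    switch status {
    case .authorized: return .granted
    case .notDetermined: return .notRequested
    case .denied, .restricted: return .needsManual
    }
}

/// Accessibility only reports a Bool, so needsManual is inferred from prompt history.
func accessibilityState(isTrusted: Bool,
                        promptAlreadyFired: Bool,   // in-memory axRequestedAt is set
                        resumeFlagSet: Bool) -> PermissionState {
    if isTrusted { return .granted }
    return (promptAlreadyFired || resumeFlagSet) ? .needsManual : .notRequested
}
```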

## PermissionsService

`PermissionsService` is a `@MainActor` `ObservableObject` singleton (`PermissionsService.shared`) exposing:

| Published property | Type | Meaning |
| --- | --- | --- |
| `hasMicrophone` | `Bool` | `AVCaptureDevice` audio authorization is `.authorized` |
| `hasAccessibility` | `Bool` | `AXIsProcessTrusted()` returns `true` |
| `microphoneState` | `PermissionState` | Tri-state for onboarding UI |
| `accessibilityState` | `PermissionState` | Tri-state for onboarding UI |

### Refresh and polling

`refresh()` re-reads both grants and updates all four published properties.

Polling runs on a **0.5 s** repeating timer added to `RunLoop.main` in `.common` mode so updates continue while InkIt is backgrounded and the user is in System Settings. `startPolling()` creates the timer; `stopPolling()` intentionally does **not** invalidate it — polling stays alive for the process lifetime so one UI surface disappearing cannot leave another showing stale state.

An additional refresh fires on `NSApplication.didBecomeActiveNotification`.
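
The polling policy can be illustrated with Foundation's `Timer` and `RunLoop`. This is a minimal sketch, not the actual `PermissionsService`: `PermissionPoller` is a hypothetical class, `refresh()` is a placeholder for re-reading both grants, and the `@MainActor` isolation of the real service is omitted.

```swift
import Foundation

final class PermissionPoller {
    private var timer: Timer?
    private(set) var refreshCount = 0
    let interval: TimeInterval = 0.5

    func refresh() { refreshCount += 1 }   // placeholder for re-reading both grants

    func startPolling() {
        guard timer == nil else { return }  // one timer for the process lifetime
        let t = Timer(timeInterval: interval, repeats: true) { [weak self] _ in
            self?.refresh()
        }
        RunLoop.main.add(t, forMode: .common)
        timer = t
    }

    /// Deliberately leaves the timer running; it only re-reads state once.
    func stopPolling() { refresh() }

    var isPolling: Bool { timer?.isValid ?? false }
}
```

Because `startPolling()` is idempotent and `stopPolling()` never invalidates the timer, several UI surfaces can call both freely without coordinating with each other.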

### Request methods

**`requestMicrophone(_:)`**

1. `.notDetermined` — calls `AVCaptureDevice.requestAccess(for: .audio)` and, on a background queue, briefly starts an `AVAudioEngine` to force TCC evaluation under hardened runtime.
2. `.authorized` — refreshes and completes with `true`.
3. `.denied` / `.restricted` — opens `x-apple.systempreferences:com.apple.preference.security?Privacy_Microphone` and completes with `false`.

**`requestAccessibility()`**

Fires at most once per untrusted period:

1. First call — `AXIsProcessTrustedWithOptions([kAXTrustedCheckOptionPrompt: true])`, which pre-adds InkIt to the Accessibility list (disabled) so the user can flip the toggle without using the **+** button. Sets `axRequestedAt` and `resumeOnboardingAtPermissions = true`.
2. Subsequent calls — only `openAccessibilitySettings()` (no re-prompt; avoids the "bubble keeps popping" loop).

<Warning>
Call `requestAccessibility()` only from explicit user actions (onboarding **Enable**, Settings **Enable**, throttled hotkey path). Never from the dictation hot path on every key press.
</Warning>

**`openAccessibilitySettings()`** — deep-links to `Privacy_Accessibility` without firing the TCC prompt. Used by `needsManual` cards.

**`confirmAccessibilityGrant()`** — programmatic re-check after the user returns from Settings. Refreshes; if still untrusted after more than 1 s since `axRequestedAt`, triggers silent relaunch. Not currently wired to a Settings UI button, but available for confirmation flows.

## Onboarding prompt flow

Onboarding step `.permissions` (`PermissionsStep`) presents two `PermissionCard` rows bound to `microphoneState` and `accessibilityState`.

<Steps>
<Step title="Land on permissions">
On first launch, the user reaches the permissions step after Welcome. If a silent relaunch just occurred, `OnboardingRootView` reads `resumeOnboardingAtPermissions`, clears it, and resumes directly at `.permissions`.
</Step>

<Step title="Grant microphone">
Tap **Enable** on the Microphone card. macOS shows the TCC dialog. On grant, polling flips `hasMicrophone` and the card shows **Enabled**. On deny, the card expands into the manual-fix layout with steps for **Privacy & Security ▸ Microphone**.
</Step>

<Step title="Grant Accessibility">
Tap **Enable** on the Accessibility card. InkIt fires the system "would like to control this computer" prompt and opens the Accessibility pane. Flip the InkIt toggle. Polling detects the grant via `AXIsProcessTrusted()`.
</Step>

<Step title="Continue">
**Continue** appears only when `hasMicrophone && hasAccessibility`. The progress dots gate forward navigation on the same live checks.
</Step>
</Steps>

### PermissionCard UI modes

| `PermissionState` | Card appearance | Primary action |
| --- | --- | --- |
| `notRequested` | Default row with subtitle | **Enable** → `enable` closure |
| `granted` | Green "Enabled" label | None |
| `needsManual` | Amber-tinted card with numbered steps | **Open System Settings** → `openSettings` (never re-prompts) |

`PermissionsStep.onAppear` calls `refresh()` and `startPolling()`.

## Settings surface

After onboarding, **Settings ▸ General ▸ Permissions** shows two `PermissionRow` entries with the same `requestMicrophone` / `requestAccessibility` actions. Unlike onboarding cards, Settings rows do not render the `needsManual` expanded layout — they show **Enable** until `granted` is true. Search indexes both rows under keywords like `mic`, `accessibility`, and `permission`.

`GeneralSettingsPane` starts polling on appear and calls `stopPolling()` on disappear (which only refreshes, per the shared polling policy).

## Dictation hot path

`AppCoordinator.startDictation()` refreshes permissions before recording:

1. Missing Cartesia API key → HUD error `"Add your API key"`.
2. `!hasMicrophone` → HUD error `"Mic access needed"`; calls `requestMicrophone`.
3. `!hasAccessibility` → HUD error `"Accessibility needed"`; calls `requestAccessibility()` at most once per **10 s** (`accessibilityPromptThrottle`) so repeated hotkey presses do not yank System Settings forward every time.
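
The guard order and the 10 s prompt throttle can be sketched as follows. `PreflightFailure`, `preflight`, and `PromptThrottle` are hypothetical names; the HUD strings and the 10 s interval are taken from the list above, and the clock is injected so the behavior is easy to test.

```swift
import Foundation

// HUD copy from the guard list; the enum itself is illustrative.
enum PreflightFailure: String {
    case missingKey = "Add your API key"
    case micDenied = "Mic access needed"
    case accessibilityMissing = "Accessibility needed"
}

/// Checks run in the documented order; the first failing guard wins.
func preflight(hasKey: Bool, hasMicrophone: Bool, hasAccessibility: Bool) -> PreflightFailure? {
    if !hasKey { return .missingKey }
    if !hasMicrophone { return .micDenied }
    if !hasAccessibility { return .accessibilityMissing }
    return nil
}

struct PromptThrottle {
    let minInterval: TimeInterval
    private var lastFired: Date?

    init(minInterval: TimeInterval = 10) { self.minInterval = minInterval }

    /// Returns true when the prompt may fire now, and records the time.
    mutating func shouldFire(now: Date = Date()) -> Bool {
        if let last = lastFired, now.timeIntervalSince(last) < minInterval { return false }
        lastFired = now
        return true
    }
}
```

The HUD error would still be shown on every press; only the call that opens System Settings sits behind `shouldFire`.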

When Accessibility flips on mid-session, `AppCoordinator` re-registers the hotkey so Fn bindings upgrade from passive `NSEvent` monitors to suppressing `CGEventTap`s. Revocation triggers re-registration to downgrade and avoid a dead tap swallowing key presses.

### Duplicate bundle instances

At launch, `detectDuplicateRunningCopies()` warns when multiple processes share `ai.cartesia.InkIt`. Accessibility grants are per-bundle-path; running two copies (e.g. `/Applications/InkIt.app` and a local build) causes confusing trust behavior.

## Silent relaunch after Accessibility grant

macOS can leave `AXIsProcessTrusted()` stale in the running process even after the user toggles Accessibility on. When `confirmAccessibilityGrant()` determines trust is still false more than 1 s after the initial prompt, `relaunch()` runs:

```text
PermissionsService.relaunch()
  ├─ UserDefaults["resumeOnboardingAtPermissions"] = true
  ├─ NSWorkspace.openApplication(createsNewApplicationInstance: true)
  └─ NSApp.terminate(nil)   // old process exits
```

The new instance reads `resumeOnboardingAtPermissions` in `OnboardingRootView`, clears the flag, and opens at the permissions step so polling can read the fresh trust bit. On successful grant detection, `confirmAccessibilityGrant()` clears `axRequestedAt` and removes the resume flag.

<Info>
The relaunch is silent from the user's perspective: no dialog, just a brief process swap. If `openApplication` fails, InkIt falls back to opening Accessibility Settings.
</Info>
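
The decision inside `confirmAccessibilityGrant()` can be sketched as a pure function. `GrantCheck` and `confirmGrant` are hypothetical names, and the handling of a missing `axRequestedAt` (keep waiting rather than relaunch) is an assumption, not something the source states.

```swift
import Foundation

enum GrantCheck: Equatable {
    case granted       // caller clears axRequestedAt and the resume flag
    case keepWaiting   // too early to conclude the trust bit is stale
    case relaunch      // still untrusted more than 1 s after the prompt
}

func confirmGrant(isTrusted: Bool, requestedAt: Date?, now: Date = Date()) -> GrantCheck {
    if isTrusted { return .granted }
    // Assumption: without a recorded prompt time there is nothing to measure against.
    guard let requestedAt = requestedAt else { return .keepWaiting }
    return now.timeIntervalSince(requestedAt) > 1 ? .relaunch : .keepWaiting
}
```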

## Architecture overview

```mermaid
flowchart TB
    subgraph ui [UI surfaces]
        OB[Onboarding PermissionsStep]
        SET[Settings General Permissions]
    end

    subgraph svc [PermissionsService]
        PS[refresh / polling 0.5s]
        RM[requestMicrophone]
        RA[requestAccessibility]
        RL[relaunch]
    end

    subgraph os [macOS TCC]
        MIC[AVCaptureDevice audio TCC]
        AX[AXIsProcessTrusted]
    end

    subgraph coord [AppCoordinator]
        SD[startDictation guards]
        HK[HotkeyManager register]
    end

    OB --> PS
    SET --> PS
    OB --> RM & RA
    SET --> RM & RA
    PS --> MIC & AX
    RM --> MIC
    RA --> AX
    RA --> RL
    SD --> PS
    PS --> HK
```

## Troubleshooting signals

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| **Enable** does nothing for Accessibility | Prior prompt already fired (`needsManual`) | Use **Open System Settings**; toggle InkIt in Accessibility |
| Hotkey works but Globe/Emoji still fires | Accessibility not granted; Fn tap fell back to passive monitor | Grant Accessibility; hotkey re-registers automatically |
| Grant toggled but InkIt still shows disabled | Stale in-process trust bit | Quit and relaunch, or wait for `confirmAccessibilityGrant` / silent relaunch path |
| "Multiple InkIt copies are running" | Two bundle paths both running | Quit duplicate; grant Accessibility to only one copy |
| Mic prompt never appears on first tap | Hardened-runtime TCC quirk | `requestMicrophone` also touches `AVAudioEngine`; retry or use manual Settings path |

## Related pages

<CardGroup>
<Card title="Quickstart" href="/quickstart">
Complete onboarding permissions, API key, and first dictation trial.
</Card>
<Card title="Configure hotkeys and dictation mode" href="/configure-hotkeys-and-mode">
Verify hotkey registration after Accessibility is granted.
</Card>
<Card title="Paste and focus reference" href="/paste-and-focus-reference">
How Accessibility powers Cmd+V paste and editable-field detection.
</Card>
<Card title="Runtime troubleshooting" href="/runtime-troubleshooting">
Duplicate bundle instances, stale Accessibility grants, and debug logging.
</Card>
<Card title="Installation" href="/installation">
Entitlements, no App Sandbox, and bundle identifier `ai.cartesia.InkIt`.
</Card>
</CardGroup>

---

## 07. Hotkey bindings

> HotkeyBinding variants: Carbon RegisterEventHotKey combos, Fn/Globe via CGEventTap with suppression, and bare modifier keys. DictationMode hold vs toggle semantics, isValidShortcut reserved-system checks, and HotkeyManager dedicated tap threads.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/07-hotkey-bindings.md
- Generated: 2026-06-15T22:08:33.824Z

### Source Files

- `InkIt/SettingsStore.swift`
- `InkIt/HotkeyManager.swift`
- `InkIt/AppCoordinator.swift`
- `InkIt/SettingsView.swift`

---
title: "Hotkey bindings"
description: "HotkeyBinding variants: Carbon RegisterEventHotKey combos, Fn/Globe via CGEventTap with suppression, and bare modifier keys. DictationMode hold vs toggle semantics, isValidShortcut reserved-system checks, and HotkeyManager dedicated tap threads."
---

InkIt routes global dictation through `HotkeyBinding` (three registration backends in `HotkeyManager`), `DictationMode` (hold vs toggle gesture semantics in `AppCoordinator`), and persisted `SettingsStore` keys (`hotkeyKind`, `hotkeyKeyCode`, `hotkeyModifiers`, `dictationMode`). The default binding is Fn/Globe (`.fn`), claimed via a suppressing `CGEventTap` when Accessibility is granted.

## HotkeyBinding variants

`HotkeyBinding` is an `Equatable` enum with three cases. Each case maps to a distinct registration path in `HotkeyManager.register(binding:)`.

| Variant | Case | Registration backend | Suppression |
|---------|------|---------------------|-------------|
| Key combo | `.carbon(keyCode:modifiers:)` | Carbon `RegisterEventHotKey` | N/A (Carbon handles delivery) |
| Fn / Globe | `.fn` | `CGEventTap` at `cghidEventTap`, or passive `NSEvent` monitor | Active tap returns `nil` on Fn transitions |
| Bare modifier | `.modifierKey(keyCode:)` | Listen-only `CGEventTap`, or passive `NSEvent` monitor | Never suppressed (modifier must work in combos) |

```swift
enum HotkeyBinding: Equatable {
    case carbon(keyCode: UInt32, modifiers: UInt32)
    case fn
    case modifierKey(keyCode: UInt32)
}
```

### Carbon combos

Carbon bindings use `RegisterEventHotKey` with a fixed `EventHotKeyID` (signature `"INKS"`, id `1`). Press and release arrive as `kEventHotKeyPressed` / `kEventHotKeyReleased` without key repeat, even when another app is frontmost. An `InstallEventHandler` on `kEventClassKeyboard` dispatches `onPress` / `onRelease` to the main queue.

Carbon cannot represent Fn as a modifier, and cannot bind a lone modifier key — those cases use event taps instead.

### Fn / Globe

The Fn key (🌐 Globe on Apple keyboards) is not a standard Carbon modifier. `HotkeyManager` installs a `CGEventTap` on `flagsChanged` events, watches `.maskSecondaryFn`, and on Fn down/up:

1. Fires `onPress` / `onRelease` on the main queue.
2. Returns `nil` to swallow the event so macOS does not fire Globe actions (Emoji picker, Dictation, etc.).

<Warning>
Fn suppression requires Accessibility permission. Without it, `CGEvent.tapCreate` fails and `HotkeyManager` falls back to passive `NSEvent` global/local monitors that observe Fn but cannot suppress system Globe behavior.
</Warning>
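
Detecting Fn down/up from `flagsChanged` events amounts to edge detection on one bit of the event flags. The sketch below works on raw flag values so it runs without CoreGraphics; `FnTracker` and `FnEdge` are hypothetical names, and `0x0080_0000` is the raw value of `CGEventFlags.maskSecondaryFn`.

```swift
let secondaryFnMask: UInt64 = 0x0080_0000  // raw value of CGEventFlags.maskSecondaryFn

enum FnEdge: Equatable { case pressed, released }

struct FnTracker {
    private(set) var isDown = false

    /// Feed the raw flags of each flagsChanged event.
    /// Returns an edge only when the Fn bit changes; nil means "not an Fn transition".
    mutating func handle(flags: UInt64) -> FnEdge? {
        let nowDown = flags & secondaryFnMask != 0
        guard nowDown != isDown else { return nil }
        isDown = nowDown
        return nowDown ? .pressed : .released
    }
}
```

In a suppressing tap callback, a non-nil edge would correspond to dispatching `onPress` or `onRelease` and returning `nil` for the event, while a nil result would pass the event through unchanged.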

### Bare modifier keys

Eight physical modifier keys are supported as standalone hotkeys (left and right are distinct):

| Key | `keyCode` constant |
|-----|-------------------|
| ⌘ Cmd | `kVK_Command`, `kVK_RightCommand` |
| ⌥ Opt | `kVK_Option`, `kVK_RightOption` |
| ⌃ Ctrl | `kVK_Control`, `kVK_RightControl` |
| ⇧ Shift | `kVK_Shift`, `kVK_RightShift` |

The modifier tap is **listen-only** (`options: .listenOnly`): it observes press/release for the specific `keyCode` but always passes events through so the modifier continues to work in normal shortcuts like ⌘+E.

## HotkeyManager architecture

`HotkeyManager` owns exactly one active backend at a time. `register(binding:)` calls `unregister()` first, then switches on the binding variant.

```mermaid
flowchart TB
    subgraph SettingsStore
        HB[hotkey: HotkeyBinding]
    end

    subgraph AppCoordinator
        RH[registerHotkey]
        HP[handleHotkeyPress]
        HR[handleHotkeyRelease]
    end

    subgraph HotkeyManager
        REG[register binding]
        CARBON[Carbon RegisterEventHotKey]
        FN_TAP[Fn CGEventTap suppressing]
        FN_PASS[Fn NSEvent passive monitor]
        MOD_TAP[Modifier listen-only tap]
        MOD_PASS[Modifier NSEvent passive monitor]
    end

    HB --> RH
    RH --> REG
    REG -->|carbon| CARBON
    REG -->|fn| FN_TAP
    FN_TAP -.->|no Accessibility| FN_PASS
    REG -->|modifierKey| MOD_TAP
    MOD_TAP -.->|tapCreate fails| MOD_PASS
    CARBON --> HP
    CARBON --> HR
    FN_TAP --> HP
    FN_TAP --> HR
    FN_PASS --> HP
    FN_PASS --> HR
    MOD_TAP --> HP
    MOD_TAP --> HR
    MOD_PASS --> HP
    MOD_PASS --> HR
```

### Dedicated tap threads

Fn and bare-modifier `CGEventTap` callbacks run on dedicated threads (`com.cartesia.InkIt.FnEventTap`, `com.cartesia.InkIt.ModifierEventTap`), not the main run loop. An active `flagsChanged` tap blocks every modifier-key press until its callback returns; servicing it on the main run loop would freeze Cmd/Shift/… system-wide during main-thread stalls (AX walks, heavy SwiftUI layout).

Each tap thread:

1. Adds the `CFMachPort` run-loop source to its own `CFRunLoop`.
2. Enables the tap.
3. Runs `CFRunLoopRun()` until `unregister()` stops the loop and tears down the source.

The callback itself only toggles state and async-hops to main for `onPress`/`onRelease`, keeping the tap thread responsive.

`FnKeyCapture` (used by `HotkeyRecorder` during shortcut recording) follows the same dedicated-thread pattern (`com.cartesia.InkIt.FnKeyCapture`).

### Callback contract

| Property | Type | Fired when |
|----------|------|------------|
| `onPress` | `(() -> Void)?` | Hotkey down (toggle mode uses it for both start and stop) |
| `onRelease` | `(() -> Void)?` | Hotkey up (only hold mode acts on it) |

Both callbacks are always dispatched to the main queue via `DispatchQueue.main.async`.

## DictationMode semantics

`DictationMode` controls how `AppCoordinator` interprets press and release events.

| Mode | Raw value | Display name | Press behavior | Release behavior |
|------|-----------|--------------|----------------|------------------|
| Hold | `hold` | Hold to talk | `startDictation()` | `stopDictation()` (unless error dwell active) |
| Toggle | `toggle` | Hands-free | If recording → `stopDictation()`; else → `startDictation()` | Ignored |

```mermaid
stateDiagram-v2
    [*] --> idle
    idle --> recording: hold press / toggle press
    recording --> finalizing: hold release / toggle press
    finalizing --> rewriting: STT complete
    rewriting --> pasting: polish done
    pasting --> idle: paste OK
    recording --> idle: empty transcript
    idle --> recording: press during heldInHistory or error
```

In hold mode, `isHotkeyHeld` keeps error notices visible for the entire press; they clear after release with a 1.5 s dwell. In toggle mode, the second press is the stop signal, functionally equivalent to releasing in hold mode.

`SettingsStore.dictationModeVerb` returns `"Press"` for toggle and `"Hold"` for hold. Home, the notch HUD, and onboarding cues all read this single source of truth.

Default: `.hold`.
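
The press/release table can be expressed as one switch over mode and event. This is an illustrative sketch: `HotkeyEvent`, `HotkeyAction`, and `action(for:mode:isRecording:)` are hypothetical names, and the error-dwell exception on hold release is left out.

```swift
enum DictationMode: String {
    case hold, toggle

    /// Verb used in UI cues, per the description of dictationModeVerb.
    var verb: String { self == .toggle ? "Press" : "Hold" }
}

enum HotkeyEvent { case press, release }
enum HotkeyAction: Equatable { case start, stop, ignore }

func action(for event: HotkeyEvent, mode: DictationMode, isRecording: Bool) -> HotkeyAction {
    switch (mode, event) {
    case (.hold, .press): return .start
    case (.hold, .release): return .stop          // error-dwell exception omitted
    case (.toggle, .press): return isRecording ? .stop : .start
    case (.toggle, .release): return .ignore
    }
}
```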

## isValidShortcut reserved-system checks

`HotkeyBinding.isValidShortcut` returns `true` for `.fn` and `.modifierKey` unconditionally. Only `.carbon` combos are validated against macOS-reserved shortcuts.

Rejected patterns:

| Pattern | Example | Reason |
|---------|---------|--------|
| ⌘ + common letter | ⌘C, ⌘V, ⌘Z, ⌘A, … | Standard edit/menu shortcuts |
| ⌘ + Space | ⌘Space | Spotlight |
| ⌃ + Space (no other mods) | ⌃Space | Input method switcher |
| ⌘⌥ + Esc | ⌘⌥Esc | Force Quit |
| ⌘⌃ + Q (no other mods) | ⌘⌃Q | Lock Screen |
| ⌘⇧ + 3/4/5 | ⌘⇧3 | Screenshots |
| ⌘ + Tab (no other mods) | ⌘Tab | App switcher |

On load, if a persisted carbon binding fails `isValidShortcut`, `SettingsStore` falls back to `.fn`.

The `HotkeyRecorder` rejects bare keys (no modifier) and shows a toast: `{keys} is invalid. Please try another.`
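
A reserved-combo check in the spirit of the table above might look like the sketch below. It is not the source's `isValidShortcut`: the function name is hypothetical, Carbon masks and virtual key codes are copied as literals so the code runs without Carbon, and the "common letter" set contains only the four letters the table names because the full list is not shown. Rows marked "no other mods" are treated as exact matches and the rest as "contains these modifiers", which is an interpretation.

```swift
// Carbon modifier masks and virtual key codes as literals (no Carbon import needed).
let cmdKey: UInt32 = 0x0100, shiftKey: UInt32 = 0x0200
let optionKey: UInt32 = 0x0800, controlKey: UInt32 = 0x1000
let kSpace: UInt32 = 0x31, kTab: UInt32 = 0x30, kEscape: UInt32 = 0x35, kQ: UInt32 = 0x0C
let screenshotDigits: Set<UInt32> = [0x14, 0x15, 0x17]      // 3, 4, 5
let commonLetters: Set<UInt32> = [0x08, 0x09, 0x06, 0x00]   // C, V, Z, A only (partial list)

func isReservedCombo(keyCode: UInt32, modifiers: UInt32) -> Bool {
    let mods = modifiers & (cmdKey | shiftKey | optionKey | controlKey)
    let hasCmd = mods & cmdKey != 0
    if hasCmd && commonLetters.contains(keyCode) { return true }            // edit/menu shortcuts
    if hasCmd && keyCode == kSpace { return true }                          // Spotlight
    if mods == controlKey && keyCode == kSpace { return true }              // input method switcher
    if mods & (cmdKey | optionKey) == (cmdKey | optionKey) && keyCode == kEscape { return true }  // Force Quit
    if mods == (cmdKey | controlKey) && keyCode == kQ { return true }       // Lock Screen
    if mods & (cmdKey | shiftKey) == (cmdKey | shiftKey) && screenshotDigits.contains(keyCode) { return true }
    if mods == cmdKey && keyCode == kTab { return true }                    // app switcher
    return false
}
```

Under this reading, ⌃⌥Space passes the check while ⌃Space alone does not.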

## Persistence

Hotkey state is stored in `UserDefaults` via `SettingsStore`:

<ParamField body="hotkeyKind" type="string">
Binding type: `"carbon"`, `"fn"`, or `"modifier"`.
</ParamField>

<ParamField body="hotkeyKeyCode" type="int">
Physical key code. Used for carbon combos and bare modifiers.
</ParamField>

<ParamField body="hotkeyModifiers" type="int">
Carbon modifier mask (`cmdKey`, `optionKey`, `controlKey`, `shiftKey`). Only for `"carbon"` kind.
</ParamField>

<ParamField body="dictationMode" type="string">
`"hold"` or `"toggle"`. Default `"hold"`.
</ParamField>

Default hotkey on first launch: `.fn` (`hotkeyKind` unset or any value other than `"carbon"` / `"modifier"`).

If `hotkeyKind` is `"carbon"` but the stored combo is invalid, the binding falls back to `.fn`. The legacy carbon fallback values are `kVK_Space` with `controlKey | optionKey` (⌃⌥Space).

Changing `settings.hotkey` triggers `saveHotkey()` and re-registration via `AppCoordinator.registerHotkey()`.
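
Reconstructing a binding from the persisted keys could follow the shape below. This is a sketch under several assumptions: `loadBinding` is a hypothetical helper that takes already-read values instead of touching `UserDefaults`, the validity check is injected, the legacy ⌃⌥Space values are used only when a stored carbon field is missing, and a `"modifier"` kind without a key code degrades to `.fn`. None of those details are confirmed by the source.

```swift
enum HotkeyBinding: Equatable {
    case carbon(keyCode: UInt32, modifiers: UInt32)
    case fn
    case modifierKey(keyCode: UInt32)
}

func loadBinding(kind: String?, keyCode: UInt32?, modifiers: UInt32?,
                 isValid: (HotkeyBinding) -> Bool) -> HotkeyBinding {
    switch kind ?? "fn" {
    case "carbon":
        // Assumed use of the legacy values: kVK_Space (0x31), controlKey | optionKey.
        let binding = HotkeyBinding.carbon(keyCode: keyCode ?? 0x31,
                                           modifiers: modifiers ?? (0x1000 | 0x0800))
        return isValid(binding) ? binding : .fn
    case "modifier":
        guard let keyCode = keyCode else { return .fn }  // assumption: degrade to Fn
        return .modifierKey(keyCode: keyCode)
    default:
        return .fn   // unset or unknown kind
    }
}
```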

## Registration lifecycle

`AppCoordinator` owns a private `HotkeyManager` instance and wires callbacks at init:

```swift
hotkey.onPress = { [weak self] in Task { @MainActor in self?.handleHotkeyPress() } }
hotkey.onRelease = { [weak self] in Task { @MainActor in self?.handleHotkeyRelease() } }
```

| Phase | Registration state |
|-------|-------------------|
| Before onboarding completes | Hotkey unregistered; HUD dismissed |
| Onboarding "Try it" trial | `ensureHotkeyRegistration()` — hotkey active, HUD shown |
| After onboarding completes | `registerHotkey()` on every launch when HUD is live |
| Settings hotkey editing | `unregisterHotkey()` while recorder is active; restored on save/cancel |
| Accessibility granted | Re-register so Fn upgrades from passive monitor to suppressing tap |
| Accessibility revoked | Re-register so Fn downgrades to passive monitor (avoids dead tap swallowing keys) |

Pressing the hotkey without Accessibility shows `"Accessibility needed"` in the notch HUD and throttles opening System Settings to once per 10 seconds.

## Recording a new shortcut

`HotkeyRecorder` in Settings (Dictation pane) captures bindings through two local `NSEvent` monitors plus `FnKeyCapture`:

<Steps>
<Step title="Unregister the live hotkey">
`coordinator.unregisterHotkey()` prevents the current binding from firing while recording.
</Step>
<Step title="Capture Carbon combos">
`keyDown` monitor: requires at least one modifier; validates via `isValidShortcut`; Esc cancels.
</Step>
<Step title="Capture Fn">
`flagsChanged` with `.function`, `kVK_Function` keyDown, or `FnKeyCapture` tap — all save `.fn`.
</Step>
<Step title="Capture bare modifiers">
On `flagsChanged`, a lone modifier is held as `modifierCandidate` on press; saved as `.modifierKey(keyCode:)` on release only if no other key was pressed while it was down.
</Step>
<Step title="Persist and re-register">
`settings.hotkey = captured`; `coordinator.registerHotkey()`; toast `"Shortcut saved"`.
</Step>
</Steps>

Clicking outside the recorder, the window resigning key status, or pressing Esc cancels editing and restores the previous registration.
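
The bare-modifier capture rule (candidate on press, saved on release only if nothing else was pressed) can be sketched as a small state machine. `RecorderEvent` and `BareModifierRecorder` are hypothetical names, and treating a second modifier as "another key" is an assumption.

```swift
enum RecorderEvent {
    case modifierDown(keyCode: UInt32)
    case modifierUp(keyCode: UInt32)
    case otherKeyDown
}

struct BareModifierRecorder {
    private var candidate: UInt32?
    private var heldCount = 0

    /// Returns the key code to save as a bare-modifier binding, or nil to keep listening.
    mutating func handle(_ event: RecorderEvent) -> UInt32? {
        switch event {
        case .modifierDown(let keyCode):
            heldCount += 1
            // Only a lone modifier qualifies; a second one cancels the candidate.
            candidate = (heldCount == 1) ? keyCode : nil
            return nil
        case .otherKeyDown:
            candidate = nil   // the modifier was used as part of a combo
            return nil
        case .modifierUp(let keyCode):
            heldCount = max(0, heldCount - 1)
            defer { candidate = nil }
            return candidate == keyCode ? candidate : nil
        }
    }
}
```

Deferring the save until release is what lets the recorder tell a lone ⌥ press apart from the start of a combo such as ⌥S.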

## Display helpers

`HotkeyConversion` provides cross-layer utilities:

| Function | Purpose |
|----------|---------|
| `carbonModifiers(from:)` | `NSEvent.ModifierFlags` → Carbon mask |
| `displayString(keyCode:modifiers:)` | e.g. `⌃⌥S` |
| `displayTokens(for:)` | Keycap tokens for UI, e.g. `["⌃ Ctrl", "⌥ Opt", "S"]` |
| `modifierLabel(for:)` | Bare modifier label, e.g. `⌥ Opt →` for right Option |
| `isModifierKeyCode(_:)` | Whether keyCode is one of the eight bare modifiers |
| `isFunctionKey(_:)` | F1–F19 detection for recorder display |

`SettingsStore.hotkeyDisplayString` and `InkItApp` Home/status UI consume these tokens.

## Failure modes

| Symptom | Likely cause | Code behavior |
|---------|--------------|---------------|
| Fn opens Emoji/Dictation | No Accessibility; passive monitor active | Upgrade path: grant AX → auto re-register |
| Modifier keys freeze briefly | Tap serviced on main run loop (historical bug) | Fixed: dedicated tap threads; DEBUG `MainThreadWatchdog` guards regressions |
| Recorded shortcut not saved | Reserved-system combo | `isValidShortcut == false` → error toast |
| Hotkey silent after editing Settings | Recorder still active / registration not restored | Cancel or save triggers `registerHotkey()` |
| Globe works but dictation does not | Missing Cartesia key or mic permission | Separate guards in `startDictation()` |

<Note>
Carbon `RegisterEventHotKey` failure logs `InkIt: RegisterEventHotKey failed (%d)` but does not surface a user-visible error. Modifier and Fn paths degrade to passive monitors rather than failing entirely.
</Note>

## Related pages

<CardGroup>
<Card title="Configure hotkeys and mode" href="/configure-hotkeys-and-mode">
Record a shortcut in Settings, switch hold vs hands-free, and verify registration after Accessibility is granted.
</Card>
<Card title="Dictation state machine" href="/dictation-state-machine">
How hotkey press/release drives `DictationState` transitions from idle through recording, finalizing, and paste.
</Card>
<Card title="Permissions model" href="/permissions-model">
Accessibility requirement for Fn suppression and paste; microphone requirement before recording starts.
</Card>
<Card title="Settings reference" href="/settings-reference">
Full `UserDefaults` key listing including `hotkeyKind`, `hotkeyKeyCode`, `hotkeyModifiers`, and `dictationMode`.
</Card>
</CardGroup>

---

## 08. Configure Cartesia API key

> Obtain a Cartesia API key, enter it during onboarding or Settings, Keychain storage (account cartesiaAPIKey), advisory validation via CartesiaKeyValidator, and persisted cartesiaKeyInvalid / cartesiaOutOfCredits flags that drive Home service-issue cards.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/08-configure-cartesia-api-key.md
- Generated: 2026-06-15T22:08:42.671Z

### Source Files

- `README.md`
- `InkIt/SettingsStore.swift`
- `InkIt/CartesiaKeyValidator.swift`
- `InkIt/OnboardingView.swift`
- `InkIt/SettingsView.swift`
- `InkIt/CartesiaStreamingClient.swift`

---
title: "Configure Cartesia API key"
description: "Obtain a Cartesia API key, enter it during onboarding or Settings, Keychain storage (account cartesiaAPIKey), advisory validation via CartesiaKeyValidator, and persisted cartesiaKeyInvalid / cartesiaOutOfCredits flags that drive Home service-issue cards."
---

InkIt stores your Cartesia credential in the macOS Keychain under account `cartesiaAPIKey`, reads it through `SettingsStore.cartesiaAPIKey` on every dictation, and passes it to `CartesiaStreamingClient` as the `X-API-Key` header on the Ink-2 STT WebSocket. Advisory validation (`CartesiaKeyValidator`) probes the key at entry time; runtime STT failures set persisted `cartesiaKeyInvalid` or `cartesiaOutOfCredits` flags that drive the Home “Dictation is paused” banner until the next successful transcription.

## Prerequisites

- A Cartesia account with an API key (free tier includes roughly 15,000 words of dictation per month).
- InkIt installed and launched at least once.
- For first-run setup: microphone and Accessibility permissions granted on the onboarding **Permissions** step before the API key step.

<Note>
Cartesia powers transcription only. Optional Polish uses separate per-provider keys (`llm.{provider}`) documented on [Configure Polish](/configure-polish).
</Note>

## Obtain a Cartesia API key

1. Open [play.cartesia.ai/keys](https://play.cartesia.ai/keys) (linked from onboarding and Settings).
2. Sign in or create a Cartesia account.
3. Create or copy an API key. Keys use the `sk_car_…` prefix shown in InkIt placeholders.

<Tip>
The README points to [play.cartesia.ai](https://play.cartesia.ai) for the free tier. InkIt’s in-app links target the `/keys` path directly.
</Tip>

## Enter the key

<Tabs>
<Tab title="Onboarding">

Onboarding step order: Welcome → Permissions → **API key** → Try It → Done.

<Steps>
<Step title="Reach the API key step">

Complete **Permissions** (microphone and Accessibility both granted), then advance to **Turn on the engine**.

</Step>
<Step title="Paste your key">

Enter the key in the masked `SecureField` (placeholder `sk_car_…`). The field binds to `settings.cartesiaAPIKey`; each change persists immediately to Keychain.

</Step>
<Step title="Review validation (optional)">

`CartesiaKeyValidator` debounces keystrokes by 0.6s and shows an inline verdict: **Verified**, **Invalid key**, or **Couldn’t verify**. Validation is advisory — it never blocks **Continue**.

</Step>
<Step title="Continue">

**Continue** enables when the trimmed key is non-empty. Emptying the key re-locks forward steps in the progress dots.

</Step>
</Steps>

</Tab>
<Tab title="Settings">

<Steps>
<Step title="Open Settings">

From Home, open Settings (gear). Go to **Dictation** → **Transcription**, or search for “Cartesia API key”.

</Step>
<Step title="Edit the key">

`CartesiaKeyField` uses `APIKeyField` with the same validator. The field masks at rest and reveals plaintext while focused (`RevealableSecureField`).

</Step>
<Step title="Confirm validation">

The trailing glyph and caption mirror onboarding states. **Invalid key** and **Couldn’t verify — check your connection** appear under the field for error states.

</Step>
</Steps>

</Tab>
</Tabs>

## Storage and persistence

```text
┌─────────────────────┐     didSet      ┌──────────────────────────────────┐
│ OnboardingView      │ ──────────────► │ SettingsStore.cartesiaAPIKey     │
│ CartesiaKeyField    │                 │ @Published                       │
└─────────────────────┘                 └──────────────┬───────────────────┘
                                                       │
                                                       ▼
                                        ┌──────────────────────────────────┐
                                        │ Keychain (account: cartesiaAPIKey) │
                                        │ service: Bundle.main.bundleId    │
                                        │ accessible: AfterFirstUnlock     │
                                        └──────────────────────────────────┘
```

### Keychain account

<ParamField body="cartesiaAPIKey" type="string" required>
Keychain generic-password account name for the Cartesia STT credential. Written on every `cartesiaAPIKey` change; an empty string removes the item.
</ParamField>

| Property | Value |
| --- | --- |
| Keychain account | `cartesiaAPIKey` |
| Keychain service | App bundle identifier (e.g. `com.cartesia.InkIt`) |
| Keychain accessibility | `kSecAttrAccessibleAfterFirstUnlock` (readable once the Mac has been unlocked after boot, including while the screen is locked; excluded from iCloud backup) |
| Swift binding | `SettingsStore.cartesiaAPIKey` |
| Legacy UserDefaults key | `cartesiaAPIKey` (migrated to Keychain on launch, then scrubbed) |
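
The write path in the table above can be sketched with the Security framework. This is a minimal illustration under stated assumptions, not InkIt's actual `Keychain` helper: the function names `secretQuery` and `storeSecret` are hypothetical, and the delete-then-add strategy is a simplification chosen to keep the sketch short.

```swift
import Foundation
import Security

/// Base query identifying one generic-password item (hypothetical helper).
func secretQuery(service: String, account: String) -> [String: Any] {
    return [
        kSecClass as String: kSecClassGenericPassword,
        kSecAttrService as String: service,
        kSecAttrAccount as String: account,
    ]
}

/// Writes `value` for `account`, or removes the item when `value` is empty.
@discardableResult
func storeSecret(_ value: String, service: String, account: String) -> OSStatus {
    let query = secretQuery(service: service, account: account)
    // Assumption: delete-then-add. A production store might call SecItemUpdate instead.
    SecItemDelete(query as CFDictionary)
    guard !value.isEmpty else { return errSecSuccess }

    var attributes = query
    attributes[kSecValueData as String] = Data(value.utf8)
    attributes[kSecAttrAccessible as String] = kSecAttrAccessibleAfterFirstUnlock
    return SecItemAdd(attributes as CFDictionary, nil)
}
```

A call such as `storeSecret(key, service: Bundle.main.bundleIdentifier ?? "com.cartesia.InkIt", account: "cartesiaAPIKey")` would correspond to the row values above; passing an empty string removes the item.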

<Warning>
Ad-hoc “Sign to Run Locally” builds use a UserDefaults fallback (`secretFallback.cartesiaAPIKey`) because Keychain items do not survive unstable code signatures. Release and Developer ID builds use Keychain normally.
</Warning>

## Advisory validation

`APIKeyValidator` is the shared validation engine; `CartesiaKeyValidator` supplies the Cartesia-specific probe.

### Probe request

:::endpoint GET /voices
Credit-free list call used only to confirm authentication. Listing voices costs nothing and requires a valid key. HTTP status is the verdict — unlike the STT WebSocket handshake, which cannot distinguish offline from rejected keys.
:::

<RequestExample>

```http
GET https://api.cartesia.ai/voices?limit=1 HTTP/1.1
X-API-Key: sk_car_…
Cartesia-Version: 2026-03-01
```

</RequestExample>
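
For readers who want to reproduce the probe outside InkIt, the request above might be built as follows. This is a sketch only: `makeVoicesProbe(apiKey:)` is a hypothetical name, and the 8-second timeout is taken from the behavioral details on this page rather than from the validator's source.

```swift
import Foundation

/// Builds the credit-free authentication probe (illustrative helper, not InkIt's API).
func makeVoicesProbe(apiKey: String) -> URLRequest {
    var components = URLComponents(string: "https://api.cartesia.ai/voices")!
    components.queryItems = [URLQueryItem(name: "limit", value: "1")]

    var request = URLRequest(url: components.url!)
    request.httpMethod = "GET"
    request.setValue(apiKey, forHTTPHeaderField: "X-API-Key")
    request.setValue("2026-03-01", forHTTPHeaderField: "Cartesia-Version")
    request.timeoutInterval = 8   // probe timeout documented on this page
    return request
}
```

The request can then be sent with `URLSession.shared.data(for:)`; only the HTTP status code of the response matters for the verdict.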

### Verdict mapping

| HTTP status | `APIKeyValidator.State` | UI label |
| --- | --- | --- |
| 2xx | `verified` | Verified |
| 400, 401, 403 | `invalidKey` | Invalid key |
| Other / transport error | `couldNotVerify` | Couldn’t verify |
| Empty input | `idle` | (hidden) |

Behavioral details:

- **Debounced**: 0.6s after the last keystroke before the network call.
- **Cached**: A settled verdict for the same key is not re-fetched.
- **Non-blocking**: Onboarding and Settings never require `verified` to save or proceed.
- **Timeout**: 8 seconds on the probe request.
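
The verdict table can be read as a pure mapping from probe result to state. The sketch below illustrates that mapping; `ValidationState` and `verdict(forKey:httpStatus:)` are hypothetical stand-ins for `APIKeyValidator.State` and its internal logic, and the debounce and cache are left out.

```swift
import Foundation

// Hypothetical mirror of the states named in the verdict table.
enum ValidationState: Equatable {
    case idle, checking, verified, invalidKey, couldNotVerify
}

/// Maps a probe result to a verdict. `status` is nil when the request
/// failed before any HTTP response arrived (offline, timeout, DNS).
func verdict(forKey key: String, httpStatus status: Int?) -> ValidationState {
    // Empty input never reaches the network; the UI hides the verdict.
    guard !key.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty else {
        return .idle
    }
    guard let status = status else { return .couldNotVerify }
    switch status {
    case 200...299: return .verified
    case 400, 401, 403: return .invalidKey
    default: return .couldNotVerify   // any other status is treated as inconclusive
    }
}
```

Keeping the mapping free of side effects makes the advisory nature explicit: nothing in it can block saving or continuing.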

<Info>
Validation success does not clear `cartesiaKeyInvalid` or `cartesiaOutOfCredits`. Only a successful live transcription clears those flags.
</Info>

## Runtime usage

When dictation starts, `AppCoordinator` guards on a non-empty key, then constructs `CartesiaStreamingClient(apiKey: settings.cartesiaAPIKey)`.

| Condition | Notch message | Persistent flag |
| --- | --- | --- |
| Empty key at hotkey press | `Add your API key` | None |
| Invalid key during STT (401/403) | `Invalid API key` | `cartesiaKeyInvalid = true` |
| Out of credits (402 / `quota_exceeded` / `plan_upgrade_required`) | `Out of credits` | `cartesiaOutOfCredits = true` |
| Offline, 5xx, rate limit | Respective `STTFailure.notchMessage` | None (transient) |

`STTFailure` classification lives in `CartesiaStreamingClient` and covers WebSocket upgrade failures (`urlSession(_:task:didCompleteWithError:)`) and in-session `error` events.
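
The classification in the table above could look roughly like the following. This is a sketch under assumptions: `STTFailureKind` and `classify(httpStatus:errorCode:)` are hypothetical names, and the real `STTFailure` type may carry more cases (for example separate offline and rate-limit variants).

```swift
import Foundation

// Hypothetical stand-in for the failure categories in the table above.
enum STTFailureKind: Equatable {
    case invalidKey      // persists cartesiaKeyInvalid
    case outOfCredits    // persists cartesiaOutOfCredits
    case transient       // notch message only, nothing persisted
}

/// Classifies a failure from an HTTP upgrade status, an in-session error
/// code, or both. Either argument may be nil because a given failure
/// usually carries only one of them.
func classify(httpStatus: Int?, errorCode: String?) -> STTFailureKind {
    let billingCodes: Set<String> = ["quota_exceeded", "plan_upgrade_required"]
    if let code = errorCode, billingCodes.contains(code) {
        return .outOfCredits
    }
    guard let status = httpStatus else { return .transient }
    switch status {
    case 401, 403: return .invalidKey
    case 402: return .outOfCredits
    default: return .transient   // 5xx, rate limit, and other statuses
    }
}
```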

## Service-issue flags and Home banner

Two UserDefaults-backed booleans record user-fixable Cartesia problems:

<ParamField body="cartesiaKeyInvalid" type="boolean">
Set when STT returns `STTFailure.invalidKey` (401/403). Cleared on the next successful non-empty transcription.
</ParamField>

<ParamField body="cartesiaOutOfCredits" type="boolean">
Set when STT returns `STTFailure.outOfCredits` (402 or billing error codes). Cleared on the next successful non-empty transcription.
</ParamField>

`SettingsStore.transcriptionIssue` derives the active problem:

```swift
var transcriptionIssue: ServiceIssue? {
    guard !cartesiaAPIKey.trimmingCharacters(in: .whitespaces).isEmpty else { return nil }
    if cartesiaKeyInvalid { return .keyInvalid }
    if cartesiaOutOfCredits { return .outOfCredits }
    return nil
}
```

When `transcriptionIssue` is non-nil, Home shows a full-width **Dictation is paused** banner (not a transient notch flash):

| Issue | Message | CTA | Action |
| --- | --- | --- | --- |
| `keyInvalid` | Your Cartesia API key is invalid. Update it to start dictating again. | Update your Cartesia key | Opens Settings → Dictation |
| `outOfCredits` | You're out of Cartesia credits. Review your plan… | Review your Cartesia plan | Opens [play.cartesia.ai/subscription](https://play.cartesia.ai/subscription) |

`keyInvalid` takes precedence over `outOfCredits` when both flags are set. No key on file suppresses the banner entirely (treated as setup, not a fault).

```mermaid
stateDiagram-v2
    [*] --> Healthy: key set, no flags
    Healthy --> KeyInvalid: STT 401/403
    Healthy --> OutOfCredits: STT 402 / quota_exceeded
    KeyInvalid --> Healthy: successful transcription
    OutOfCredits --> Healthy: successful transcription
    note right of KeyInvalid
        cartesiaKeyInvalid = true
        Home banner + notch flash
    end note
    note right of OutOfCredits
        cartesiaOutOfCredits = true
        Home banner + notch flash
    end note
```
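
The flag lifecycle in the diagram can be modeled as a small value type. This is an illustrative sketch, not `SettingsStore` itself: `TranscriptionHealth` and its method names are hypothetical, and persistence to UserDefaults is omitted.

```swift
import Foundation

enum ServiceIssue: Equatable { case keyInvalid, outOfCredits }

/// Hypothetical container for the key and the two service-issue flags.
struct TranscriptionHealth {
    var apiKey = ""
    var keyInvalid = false
    var outOfCredits = false

    /// Same precedence as `transcriptionIssue`: no key means no banner,
    /// and an invalid key outranks exhausted credits.
    var issue: ServiceIssue? {
        guard !apiKey.trimmingCharacters(in: .whitespaces).isEmpty else { return nil }
        if keyInvalid { return .keyInvalid }
        if outOfCredits { return .outOfCredits }
        return nil
    }

    mutating func recordFailure(_ issue: ServiceIssue) {
        switch issue {
        case .keyInvalid: keyInvalid = true
        case .outOfCredits: outOfCredits = true
        }
    }

    /// Only a non-empty transcript counts as proof the service is healthy.
    mutating func recordTranscript(_ text: String) {
        guard !text.isEmpty else { return }
        keyInvalid = false
        outOfCredits = false
    }
}
```

Note that nothing in this model reacts to a validation verdict, which mirrors the documented rule that only live transcription clears the flags.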

## Verify configuration

<Steps>
<Step title="Check inline validation">

In onboarding or Settings, confirm **Verified** after pasting the key. **Couldn’t verify** usually means offline — retry when connected.

</Step>
<Step title="Run Try It or dictate">

Complete onboarding **Try It**, or hold/toggle your hotkey and speak into any editable field. A successful session pastes text and clears any prior service-issue flags.

</Step>
<Step title="Confirm Home is clear">

With a valid key and healthy account, the **Dictation is paused** banner should not appear. If it persists after updating the key, dictate once successfully — flags clear only on live STT success, not on validation alone.

</Step>
</Steps>

## Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Notch: `Add your API key` | Empty `cartesiaAPIKey` | Enter key in Settings → Dictation → Transcription |
| Notch: `Invalid API key`; Home banner | Rejected key at STT time | Replace key at [play.cartesia.ai/keys](https://play.cartesia.ai/keys), then dictate successfully |
| Notch: `Out of credits`; Home banner | Billing/quota exhausted | Review plan at [play.cartesia.ai/subscription](https://play.cartesia.ai/subscription) |
| Settings shows Verified but banner remains | Flags not cleared until live success | Complete one successful dictation after fixing the key |
| Validation: Couldn’t verify | Network or non-auth HTTP response | Check connectivity; validation is advisory |
| Key lost after local ad-hoc build | Unstable signature invalidates Keychain | Re-enter key, or use a stably signed build |

For STT failure classification, notch vs Home surfacing, and regression contracts, see [STT troubleshooting](/stt-troubleshooting).

## Related pages

<CardGroup>
<Card title="Quickstart" href="/quickstart">
First dictation walkthrough including onboarding API key entry and Try It.
</Card>
<Card title="Cartesia STT reference" href="/cartesia-stt-reference">
WebSocket contract, Ink-2 parameters, and `STTFailure` classification table.
</Card>
<Card title="Settings reference" href="/settings-reference">
All `SettingsStore` keys including service-issue flags and Keychain accounts.
</Card>
<Card title="STT troubleshooting" href="/stt-troubleshooting">
Diagnose transcription failures, notch messages, and Home card behavior.
</Card>
</CardGroup>

---

## 09. Configure Polish

> Enable optional transcript polish: pick an LLMProvider (Groq recommended), store per-provider keys in Keychain, select rewriteModel, toggle correctionEnabled, and interpret PolishUIState (setup, on, paused, keyBroken). Onboarding Groq-only validation path.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/09-configure-polish.md
- Generated: 2026-06-15T22:09:30.541Z

### Source Files

- `README.md`
- `InkIt/LLMProvider.swift`
- `InkIt/SettingsStore.swift`
- `InkIt/TranscriptRewriter.swift`
- `InkIt/SettingsView.swift`
- `InkIt/OnboardingView.swift`

---
title: "Configure Polish"
description: "Enable optional transcript polish: pick an LLMProvider (Groq recommended), store per-provider keys in Keychain, select rewriteModel, toggle correctionEnabled, and interpret PolishUIState (setup, on, paused, keyBroken). Onboarding Groq-only validation path."
---

Polish is an optional post-STT rewrite pass: after Cartesia Ink-2 returns a raw transcript, `TranscriptRewriter` sends it to your chosen `LLMProvider` to strip fillers, fix ASR slips, and apply light punctuation — without paraphrasing. Configuration lives in **Settings → Polish** (`PolishSettingsView`), backed by `SettingsStore` (`correctionEnabled`, `rewriteProvider`, `rewriteModel`, per-provider Keychain keys). Polish is off by default; dictation works without it.

<Info>
Polish uses a separate API key from Cartesia STT. Transcription requires a Cartesia key; polish requires a key from Groq, Google Gemini, OpenAI, or Anthropic.
</Info>

## Where to configure

| Surface | When to use |
| --- | --- |
| **Settings → Polish** | Primary configuration: provider, model, API key, master toggle |
| **Home nudge** | Shown when `correctionEnabled` is false and `polishNudgeDismissed` is false; **Set up Polish** opens Settings on the Polish pane |
| **Home status card** | Appears when `polishIssue` is non-nil (invalid key or out of credits while polish is enabled) |

First-run onboarding does not include a dedicated Polish step; optional polish is introduced afterwards via the Home nudge or Settings. `GroqKeyValidator` provides Groq-only advisory validation intended for a focused onboarding path (pin Groq, hide the provider picker), but the current flow does not use it. Provider switching and full validation run through `LLMKeyValidator` in Settings.

## Enable polish from Settings

<Steps>
<Step title="Open Settings → Polish">

From the Home gear button or menu bar, open Settings and select **Polish** in the sidebar.

</Step>
<Step title="Pick a provider">

Choose an `LLMProvider` in the **Provider** picker. Groq is marked **(Recommended)** (`LLMProvider.isRecommended`); defaults are `rewriteProvider = .groq` and `rewriteModel = "llama-3.3-70b-versatile"`.

</Step>
<Step title="Enter and verify an API key">

Paste the provider API key into **API key**. Keys persist in Keychain under account `llm.{provider}` (for example `llm.groq`). An empty value removes that Keychain item.

`LLMKeyValidator` debounces keystrokes (0.6s), then sends a `GET` to the provider's models endpoint, built by `LLMProvider.validationRequest(key:)`. Validation is advisory: it never blocks typing or saving.

</Step>
<Step title="Confirm polish is on">

When validation reaches `.verified`, `PolishKeyField` calls `enablePolish(provider:)` if the pane is in **setup** or **keyBroken**. That sets `correctionEnabled = true`, aligns `rewriteModel` to the provider default if needed, and clears `polishKeyInvalid` / `polishOutOfCredits`.

</Step>
<Step title="Dictate and verify">

Hold or toggle your dictation hotkey, speak, and release. With polish on, the notch enters `.rewriting` while `TranscriptRewriter` runs, then pastes the polished text. History rows show a polish outcome indicator when rewrite succeeds.

</Step>
</Steps>

<Tip>
Groq's free tier needs no card (`LLMProvider.keyHint` for Groq: "Free tier, no card needed."). Key URLs: Groq `https://console.groq.com/keys`, Gemini `https://aistudio.google.com/apikey`, OpenAI `https://platform.openai.com/api-keys`, Anthropic `https://console.anthropic.com/settings/keys`.
</Tip>

## PolishUIState

`SettingsStore.polishUIState` drives the Polish settings pane layout. The API key is the gate: no key means `setup`; a stored key unlocks `on`, `paused`, or `keyBroken`.

```mermaid
stateDiagram-v2
    [*] --> setup: no key for rewriteProvider
    setup --> on: verified key + enablePolish
    on --> paused: pausePolish (toggle off)
    paused --> on: toggle on
    on --> keyBroken: rewrite 401/403
    keyBroken --> on: verified key re-entered
```

| State | Condition | UI behavior |
| --- | --- | --- |
| `setup` | `!hasRewriteKey` | Provider picker, model row, key field — no master toggle |
| `on` | Key present, `correctionEnabled`, `!polishKeyInvalid` | Master toggle on; provider/key always editable |
| `paused` | Key present, `!correctionEnabled` | Master toggle off; key retained via `pausePolish()` |
| `keyBroken` | Key present, `correctionEnabled`, `polishKeyInvalid` | Warning banner; master toggle disabled until key is re-entered |
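
The conditions in the table reduce to a three-input decision. The sketch below shows one way to express it; the free function and its parameter labels are hypothetical, since in InkIt the state is a derived property on `SettingsStore`.

```swift
enum PolishUIState: Equatable { case setup, on, paused, keyBroken }

/// Derives the pane state from the three inputs named in the table.
func polishUIState(hasRewriteKey: Bool,
                   correctionEnabled: Bool,
                   polishKeyInvalid: Bool) -> PolishUIState {
    guard hasRewriteKey else { return .setup }        // the key is the gate
    guard correctionEnabled else { return .paused }   // key kept, toggle off
    return polishKeyInvalid ? .keyBroken : .on
}
```

Because `paused` is checked before `polishKeyInvalid`, a deliberately paused pane stays paused even if an old invalid-key flag is still set.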

<ParamField body="correctionEnabled" type="boolean">
Master polish switch, persisted in UserDefaults. When false, `AppCoordinator.correctedTranscript` skips rewrite and pastes raw STT output. Default: false.
</ParamField>

<ParamField body="rewriteProvider" type="LLMProvider">
Selected LLM provider (`groq`, `gemini`, `openai`, `anthropic`). Changing provider clears `polishKeyInvalid` and `polishOutOfCredits` so the new provider starts clean.
</ParamField>

<ParamField body="rewriteModel" type="string">
Model ID sent to the provider chat endpoint. Shown read-only in Settings (one curated model per provider today). Auto-reset to `provider.defaultModel` when the stored model is not in `provider.models`.
</ParamField>

<ParamField body="llmAPIKeys" type="[String: String]">
In-memory map keyed by `LLMProvider.rawValue`. Each non-empty value syncs to Keychain account `llm.{provider}`.
</ParamField>

## Provider and model defaults

| Provider | Display name | Default model | Rewrite timeout |
| --- | --- | --- | --- |
| `groq` | Groq (Recommended) | `llama-3.3-70b-versatile` | 1.0s |
| `gemini` | Google Gemini | `gemini-2.5-flash-lite` | 2.0s |
| `openai` | OpenAI | `gpt-4.1-nano` | 2.0s |
| `anthropic` | Anthropic | `claude-haiku-4-5-20251001` | 2.0s |

Groq, Gemini, and OpenAI use OpenAI-compatible `/chat/completions` endpoints. Anthropic uses the native Messages API (`LLMProvider.isOpenAICompatible` is false for Anthropic).
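
The defaults table can be captured in a compact enum. This is a sketch, not the contents of `LLMProvider.swift`: `defaultModel`, `models`, and `isOpenAICompatible` are names used on this page, while `rewriteTimeout`, `keychainAccount`, and `resolvedModel(stored:)` are hypothetical names introduced for illustration.

```swift
import Foundation

enum LLMProvider: String, CaseIterable {
    case groq, gemini, openai, anthropic

    var defaultModel: String {
        switch self {
        case .groq: return "llama-3.3-70b-versatile"
        case .gemini: return "gemini-2.5-flash-lite"
        case .openai: return "gpt-4.1-nano"
        case .anthropic: return "claude-haiku-4-5-20251001"
        }
    }

    /// One curated model per provider today.
    var models: [String] { [defaultModel] }

    /// Hypothetical name for the per-provider rewrite timeout in the table.
    var rewriteTimeout: TimeInterval { self == .groq ? 1.0 : 2.0 }

    var isOpenAICompatible: Bool { self != .anthropic }

    /// Hypothetical helper producing the `llm.{provider}` Keychain account.
    var keychainAccount: String { "llm.\(rawValue)" }

    /// Falls back to the default when a stored model is not offered by this provider.
    func resolvedModel(stored: String) -> String {
        models.contains(stored) ? stored : defaultModel
    }
}
```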

## Key storage and validation

```text
SettingsStore
├── UserDefaults: correctionEnabled, rewriteProvider, rewriteModel
│                  polishKeyInvalid, polishOutOfCredits, polishNudgeDismissed
└── Keychain (account per provider)
    ├── llm.groq
    ├── llm.gemini
    ├── llm.openai
    └── llm.anthropic
```

On signed release builds, secrets stay in Keychain (`Keychain.usesKeychain`). Ad-hoc local builds fall back to namespaced UserDefaults (`secretFallback.llm.{provider}`) because ad-hoc re-signing orphans Keychain items.

### Validation probes

| Validator | Scope | Probe |
| --- | --- | --- |
| `LLMKeyValidator` | Settings Polish pane | `provider.validationRequest(key:)` → `GET` models list |
| `GroqKeyValidator` | Groq-only onboarding path (infrastructure) | `GET https://api.groq.com/openai/v1/models` with Bearer token |

`APIKeyValidator.State` values: `idle`, `checking`, `verified`, `invalidKey`, `couldNotVerify`. HTTP 2xx → verified; 400/401/403 → invalid key; any other status or a transport error → could not verify.

<Warning>
A verified key in Settings auto-enables polish from **setup** or **keyBroken** only. It does not auto-resume a deliberate **paused** state — turn the master toggle back on manually.
</Warning>

## Runtime behavior

When `correctionEnabled` is true and a key exists for `rewriteProvider`, the dictation pipeline runs polish after STT finalizes:

```mermaid
sequenceDiagram
    participant User
    participant AppCoordinator
    participant TranscriptRewriter
    participant LLM as LLM provider API

    User->>AppCoordinator: hotkey press
    AppCoordinator->>TranscriptRewriter: prewarm() (HEAD to endpoint)
    User->>AppCoordinator: speak, release
    AppCoordinator->>AppCoordinator: STT finalize
    AppCoordinator->>TranscriptRewriter: rewriteWithoutContext(raw)
    TranscriptRewriter->>LLM: POST chat completion
    LLM-->>TranscriptRewriter: polished text
    AppCoordinator->>User: paste polished transcript
```

- **Prewarm**: At hotkey press, `AppCoordinator` creates a `TranscriptRewriter` and calls `prewarm()` so TLS/TCP setup overlaps recording. Skipped for the onboarding Try It box (which never polishes).
- **Fallback**: On any `RewriteFailure`, the raw transcript is pasted. Persistent auth/billing failures set `polishKeyInvalid` (401/403) or `polishOutOfCredits` (402).
- **Success clears flags**: A polished outcome clears both `polishKeyInvalid` and `polishOutOfCredits`.

`enablePolish(provider:)` and `pausePolish()` centralize the provider/model/enabled trio so Settings and search results stay consistent.
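
The fallback and flag rules above can be sketched as a single function that decides what to paste. The shapes here are assumptions: the `RewriteFailure` cases, `PolishFlags`, and `resolvePaste` are hypothetical and only model the behavior described on this page.

```swift
import Foundation

// Hypothetical failure shape; the real RewriteFailure may differ.
enum RewriteFailure: Error, Equatable {
    case http(Int)   // non-2xx response from the provider
    case timeout
    case offline
}

struct PolishFlags: Equatable {
    var keyInvalid = false
    var outOfCredits = false
}

/// Returns the text to paste and updates the persistent flags in place.
func resolvePaste(raw: String,
                  rewrite: Result<String, RewriteFailure>,
                  flags: inout PolishFlags) -> String {
    switch rewrite {
    case .success(let polished):
        flags = PolishFlags()        // a polished outcome clears both flags
        return polished
    case .failure(.http(401)), .failure(.http(403)):
        flags.keyInvalid = true
    case .failure(.http(402)):
        flags.outOfCredits = true
    case .failure:
        break                        // transient: nothing persisted
    }
    return raw                       // every failure falls back to raw STT text
}
```

The single `return raw` at the end makes the guarantee easy to audit: no failure path can drop the user's dictation.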

## Service issues and Home surfacing

`SettingsStore.polishIssue` returns a `ServiceIssue` only when polish is enabled and a key exists:

| Issue | Persisted flag | Home card |
| --- | --- | --- |
| `keyInvalid` | `polishKeyInvalid` | "Polish is paused" — invalid key CTA opens Settings → Polish |
| `outOfCredits` | `polishOutOfCredits` | "Polish is paused" — billing CTA opens `rewriteProvider.billingURL` |

Unlike Cartesia transcription failures (full-width banner), polish problems are calm rail cards because raw text still pastes.

## Groq-only onboarding validation path

`GroqKeyValidator` subclasses `APIKeyValidator` with a fixed Groq `GET /openai/v1/models` probe. Comments in `LLMProvider.swift` describe the design intent: onboarding would pin Groq as the recommended provider, hide the provider picker, and validate with this focused subclass rather than the generic `LLMKeyValidator`.

Current first-run flow (`OnboardingStep`: welcome → permissions → apiKey → tryIt → done) does not wire `GroqKeyValidator`. Post-onboarding polish setup uses the Home nudge (Groq-preselected in Settings defaults) or **Settings → Polish** with the full four-provider picker and `LLMKeyValidator`.

## Verification checklist

<Check>
After configuration, confirm: `settings.polishUIState == .on`, the Polish master toggle is on, and a test dictation shows the notch briefly in rewriting before paste.
</Check>

<Check>
History records `polish: .polished` on success. If rewrite fails transiently (rate limit, timeout, offline), raw text still pastes and the row may show `polish: .failed` with a tooltip reason.
</Check>

<Check>
Toggling polish off sets `correctionEnabled = false` but retains the Keychain key. Re-enable via the master toggle without re-entering the key.
</Check>

## Related pages

<CardGroup>
<Card title="Dictation pipeline" href="/dictation-pipeline">
End-to-end flow from hotkey through STT, optional polish, paste, and history persistence.
</Card>
<Card title="LLM providers reference" href="/llm-providers-reference">
Endpoints, default models, timeouts, validation probes, and RewriteFailure reason codes.
</Card>
<Card title="Settings reference" href="/settings-reference">
All persisted SettingsStore keys including Keychain accounts and derived issue properties.
</Card>
<Card title="Polish troubleshooting" href="/polish-troubleshooting">
Rewrite failures, rate limits, timeout fallbacks, and history-row failure tooltips.
</Card>
</CardGroup>

---

## 10. Configure hotkeys and dictation mode

> Record a global shortcut in Settings, switch between hold-to-talk and hands-free toggle, validate against reserved macOS shortcuts, and verify registration after Accessibility is granted. notchHorizontalPosition and playFeedbackSounds side settings.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/10-configure-hotkeys-and-dictation-mode.md
- Generated: 2026-06-15T22:09:44.785Z

### Source Files

- `InkIt/SettingsView.swift`
- `InkIt/SettingsStore.swift`
- `InkIt/HotkeyManager.swift`
- `InkIt/AppCoordinator.swift`
- `InkIt/FeedbackSoundPlayer.swift`

---
title: "Configure hotkeys and dictation mode"
description: "Record a global shortcut in Settings, switch between hold-to-talk and hands-free toggle, validate against reserved macOS shortcuts, and verify registration after Accessibility is granted. notchHorizontalPosition and playFeedbackSounds side settings."
---

InkIt exposes hotkey and activation controls on **Settings → Dictation**: `dictationMode`, `hotkey`, and `playFeedbackSounds` bind to `SettingsStore` and take effect immediately through `AppCoordinator.registerHotkey()` and the hotkey press/release handlers. `notchHorizontalPosition` is persisted in UserDefaults but has no Settings UI yet and is not read by `NotchHUDController`, which anchors to detected screen geometry.

## Where to configure

Open Settings from the Home gear icon (popover) or the standalone Settings window. The **Dictation** pane groups:

| Control | `SettingsStore` property | UserDefaults / storage |
| --- | --- | --- |
| Activation mode | `dictationMode` | `dictationMode` (`hold` \| `toggle`) |
| Dictation shortcut | `hotkey` | `hotkeyKind`, `hotkeyKeyCode`, `hotkeyModifiers` |
| Feedback sounds | `playFeedbackSounds` | `playFeedbackSounds` |
| Notch HUD horizontal offset (stored only; no Settings control yet) | `notchHorizontalPosition` | `notchHorizontalPosition` |

Search also surfaces these rows (`general.activation`, `general.hotkey`, `general.sound`).

<Steps>
<Step title="Open Dictation settings">

Open **Settings → Dictation**. The pane shows **Activation mode** cards, a **Shortcut** group (hotkey recorder + sound toggle), then microphone and transcription sections.

</Step>
<Step title="Choose activation mode">

Pick one card:

- **Hold to talk** (`hold`, default) — press and hold the shortcut while speaking; release to finalize and paste.
- **Hands-free** (`toggle`) — first press starts recording; second press stops and pastes.

`AppCoordinator` routes press/release differently per mode: hold mode calls `startDictation()` on press and `stopDictation()` on release; toggle mode ignores release and toggles start/stop on each press.

</Step>
<Step title="Record a global shortcut">

Click the hotkey field (pencil icon). The recorder:

1. Calls `coordinator.unregisterHotkey()` so the live binding does not compete with capture.
2. Shows `press new shortcut` and listens for the next qualifying input.
3. On save, writes `settings.hotkey`, calls `coordinator.registerHotkey()`, and shows a **Shortcut saved** toast.
4. On cancel (Esc, click outside, window resigns key), restores the previous binding via `registerHotkey()`.

Press **Esc** or click outside to abandon without saving.

</Step>
<Step title="Grant Accessibility when using Fn or bare modifiers">

Fn/Globe and bare-modifier bindings use a `CGEventTap`. macOS requires **Accessibility** for a suppressing tap (Fn) or a listen-only tap (bare modifiers). After grant, `AppCoordinator` re-registers the hotkey so Fn upgrades from the passive `NSEvent` monitor to the suppressing tap.

Verify under **Settings → General → Permissions** that Accessibility shows **Enabled**, or test the shortcut in another app and confirm the notch HUD appears without macOS Globe/Emoji firing (Fn path).

</Step>
<Step title="Verify in any app">

With onboarding complete and a Cartesia API key set, switch to a text field in another app. Use the configured gesture:

- **Hold mode:** hold shortcut → speak → release → text pastes (or saves to History if no editable field is focused).
- **Toggle mode:** press once → speak → press again → paste.

The Home status hint and the rotating header reflect `dictationModeVerb` (`Press` vs `Hold`) and `hotkeyDisplayString`.

</Step>
</Steps>

## Activation mode semantics

```text
Hold (.hold)                    Toggle (.toggle)
─────────────                   ───────────────
press   → startDictation()      press while idle      → startDictation()
release → stopDictation()       press while recording → stopDictation()
                                release               → ignored
```
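
The routing above can be modeled as a tiny state holder. This sketch is illustrative only: `HotkeyRouter` is a hypothetical type, and a boolean stands in for the `startDictation()` / `stopDictation()` calls made by `AppCoordinator`.

```swift
enum DictationMode { case hold, toggle }

/// Hypothetical model of press/release routing per activation mode.
struct HotkeyRouter {
    var mode: DictationMode
    private(set) var isRecording = false

    init(mode: DictationMode) { self.mode = mode }

    mutating func handlePress() {
        switch mode {
        case .hold: isRecording = true        // start on press
        case .toggle: isRecording.toggle()    // start when idle, stop when recording
        }
    }

    mutating func handleRelease() {
        // Release only matters in hold mode; toggle mode ignores it.
        if mode == .hold { isRecording = false }
    }
}
```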

<ParamField body="dictationMode" type="DictationMode" required>
Persisted activation style. Values: `hold` (default), `toggle`. Drives `AppCoordinator.handleHotkeyPress()` / `handleHotkeyRelease()` and UI copy via `dictationModeVerb`.
</ParamField>

<ResponseField name="dictationModeVerb" type="string">
`"Press"` when `dictationMode == .toggle`; `"Hold"` otherwise. Used in Home hints, onboarding copy, and `TryItPracticeCard`.
</ResponseField>

## Recording shortcuts

`HotkeyRecorder` accepts three `HotkeyBinding` variants:

| Variant | Capture path | Runtime backend |
| --- | --- | --- |
| `.carbon(keyCode, modifiers)` | Key-down with ≥1 modifier (⌘⌥⌃⇧) | Carbon `RegisterEventHotKey` |
| `.fn` | Fn / Globe key-down or `flagsChanged` | `CGEventTap` (suppresses Globe when AX granted) |
| `.modifierKey(keyCode)` | Lone modifier press+release (no other key) | Listen-only `CGEventTap` or passive monitor |

**Carbon combos** require at least one modifier. A bare letter or function key without modifiers is rejected with an error toast.

**Fn** is captured via a dedicated `FnKeyCapture` tap thread (same pattern as `HotkeyManager`) or passive monitors if the tap cannot be created.

**Bare modifiers** (left/right ⌘⌥⌃⇧) record on release only when held alone; pressing another key while the modifier is down clears the candidate so `⌘E` records as a combo, not a lone `⌘`.
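
The lone-modifier rule can be sketched as a small event-driven recorder. This is an assumption-laden illustration rather than `HotkeyRecorder`'s code: the type, the `Event` cases, and the treatment of a second held modifier as disqualifying are all hypothetical.

```swift
/// Hypothetical recorder that reports a modifier only when it was pressed
/// and released with no other key or modifier involved.
struct LoneModifierRecorder {
    enum Event {
        case modifierDown(UInt16)
        case modifierUp(UInt16)
        case keyDown(UInt16)
    }

    private var held: Set<UInt16> = []
    private var candidate: UInt16?

    init() {}

    /// Feeds one event; returns the modifier's key code when a lone
    /// press and release completes, otherwise nil.
    @discardableResult
    mutating func handle(_ event: Event) -> UInt16? {
        switch event {
        case .modifierDown(let code):
            held.insert(code)
            // Assumption: a second modifier turns the gesture into a chord.
            candidate = (held.count == 1) ? code : nil
            return nil
        case .keyDown:
            candidate = nil          // e.g. ⌘E records as a combo, not a lone ⌘
            return nil
        case .modifierUp(let code):
            held.remove(code)
            let recorded = (candidate == code) ? code : nil
            candidate = nil
            return recorded
        }
    }
}
```

Feeding `modifierDown(55)` then `modifierUp(55)` (55 is the left Command key code) yields `55`, while inserting a `keyDown` between the two yields `nil`.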

### Defaults and persistence

- **Fresh install default:** `.fn` (Globe / Fn key).
- **Legacy invalid carbon restore:** if stored `hotkeyKind == "carbon"` fails `isValidShortcut`, load falls back to `.fn`.
- **Display:** `hotkeyDisplayString` and keycap tokens via `HotkeyConversion.displayTokens(for:)`.

## Reserved shortcut validation

Only `.carbon` bindings are validated. Fn and bare modifiers always pass.

`HotkeyBinding.isValidShortcut` rejects combos that clash with macOS system shortcuts:

| Rejected combo | Reason |
| --- | --- |
| `⌘` + A, C, F, L, N, O, P, Q, R, S, T, V, W, X, Z | Common app shortcuts |
| `⌘` + Space | Spotlight |
| `⌃` + Space (no other modifiers) | Input source switching |
| `⌘⌥` + Esc | Force Quit |
| `⌘⌃` + Q | Lock Screen |
| `⌘⇧` + 3, 4, 5 | Screenshot shortcuts |
| `⌘` + Tab (with `⌘` as the only modifier) | App switcher |

Invalid capture shows: `{keys} is invalid. Please try another.` (toast, `.error` style). Valid capture shows **Shortcut saved**.
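
A validator covering the table above might look like the sketch below. It is not `HotkeyBinding.isValidShortcut`: it uses key names instead of virtual key codes, the `Modifiers` option set is hypothetical, and it assumes every rule matches the listed modifiers exactly, which the table states explicitly only for some rows.

```swift
struct Modifiers: OptionSet {
    let rawValue: Int
    static let command = Modifiers(rawValue: 1 << 0)
    static let option  = Modifiers(rawValue: 1 << 1)
    static let control = Modifiers(rawValue: 1 << 2)
    static let shift   = Modifiers(rawValue: 1 << 3)
}

/// Returns false for combos the table lists as reserved, and for bare keys.
func isValidShortcut(key: String, modifiers: Modifiers) -> Bool {
    guard !modifiers.isEmpty else { return false }   // combos need at least one modifier
    let k = key.lowercased()
    let commonCommandKeys: Set<String> = [
        "a", "c", "f", "l", "n", "o", "p", "q", "r", "s", "t", "v", "w", "x", "z",
    ]
    if modifiers == [.command] {
        // Common app shortcuts, Spotlight, and the app switcher.
        return !(commonCommandKeys.contains(k) || k == "space" || k == "tab")
    }
    if modifiers == [.control] { return k != "space" }             // input source switching
    if modifiers == [.command, .option] { return k != "escape" }   // Force Quit
    if modifiers == [.command, .control] { return k != "q" }       // Lock Screen
    if modifiers == [.command, .shift] { return !["3", "4", "5"].contains(k) }  // screenshots
    return true
}
```

As the warning below notes for the real check, passing this function says nothing about whether registration succeeds at runtime.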

<Warning>
Reserved checks are advisory for Carbon combos only. They do not guarantee macOS will register every combo — `RegisterEventHotKey` can still fail at runtime (logged as `InkIt: RegisterEventHotKey failed`).
</Warning>

## Hotkey registration and Accessibility

Hotkeys register only after onboarding completes (`hasCompletedOnboarding`), except during the onboarding **Try It** trial (`beginOnboardingTrial` / `endOnboardingTrial`).

```text
AppCoordinator.refreshHUD()
  ├─ onboarding incomplete → unregisterHotkey(), no HUD
  └─ onboarding complete   → NotchHUDController, registerHotkey()
         hotkey.register(binding: settings.hotkey)
```

### Fn / Globe binding

| Accessibility state | Fn backend | Side effect |
| --- | --- | --- |
| Granted | Suppressing `CGEventTap` | Globe / Emoji / Dictation suppressed |
| Not granted | Passive `NSEvent` monitor | Fn observed; system Globe may still fire |

`permissions.$hasAccessibility` triggers `registerHotkey()` when trust flips on (upgrade tap) or off (downgrade to passive monitor after tap death).

### Dictation without Accessibility

Pressing the hotkey while Accessibility is missing shows the notch error **Accessibility needed**, opens System Settings (throttled to once per 10s), and does not start dictation. The microphone permission and API key are checked first.
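
The guard order and the 10-second throttle can be sketched as follows. All names here are hypothetical, and the relative order of the microphone and API key checks is an assumption; the page only states that both run before the Accessibility check.

```swift
import Foundation

enum PreflightBlock: Equatable {
    case microphoneNeeded, apiKeyMissing, accessibilityNeeded
}

/// Returns the first blocking problem, or nil when dictation may start.
func preflight(hasMicrophone: Bool, apiKey: String, hasAccessibility: Bool) -> PreflightBlock? {
    if !hasMicrophone { return .microphoneNeeded }
    if apiKey.trimmingCharacters(in: .whitespaces).isEmpty { return .apiKeyMissing }
    if !hasAccessibility { return .accessibilityNeeded }
    return nil
}

/// Allows an action at most once per `interval` seconds.
struct Throttle {
    let interval: TimeInterval
    private var lastFired: Date?

    init(interval: TimeInterval) { self.interval = interval }

    mutating func shouldFire(now: Date = Date()) -> Bool {
        if let last = lastFired, now.timeIntervalSince(last) < interval {
            return false
        }
        lastFired = now
        return true
    }
}
```

With `Throttle(interval: 10)`, repeated hotkey presses inside the window would still show the notch error but would not reopen System Settings each time.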

<Note>
Accessibility is also required for paste (`PasteService` Cmd+V synthesis) and focus detection. Hotkey registration and paste share the same permission gate in practice.
</Note>

## Feedback sounds

<ParamField body="playFeedbackSounds" type="boolean">
Audible cues on hotkey press (dictation start) and release/stop (dictation finalize). Default: `true` when the key has never been set.
</ParamField>

When enabled, `AppCoordinator` calls:

- `FeedbackSoundPlayer.shared.playStart()` in `startDictation()`
- `FeedbackSoundPlayer.shared.playStop()` in `stopDictation()`

Sounds are bundled `cue-start.aiff` and `cue-stop.aiff`, played at volume `0.4` through cached `NSSound` instances so rapid press/release does not clobber playback.

Toggle: **Play sound on press and release** in the **Shortcut** group.

## notchHorizontalPosition

<ParamField body="notchHorizontalPosition" type="double">
Normalized horizontal HUD anchor on the active screen: `0.0` = left edge, `1.0` = right edge. Default `0.38` (slightly left of center). Clamped to `[0.04, 0.96]` on write.
</ParamField>
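
The clamp-on-write rule amounts to one line of arithmetic. The function name below is hypothetical; the bounds and default come from the field description above.

```swift
/// Clamps a normalized horizontal anchor to the documented range.
func clampedNotchPosition(_ value: Double) -> Double {
    return min(max(value, 0.04), 0.96)
}

let defaultNotchPosition = 0.38   // documented default, already inside the range
```

For example, a requested value of `0.0` would be stored as `0.04`, and `1.5` as `0.96`.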

This key is persisted and published on `SettingsStore`, but **no Settings UI control** exists yet. `NotchHUDController.reposition()` currently centers the HUD from `NotchGeometry.detect(on:)` (physical notch center or screen midpoint on non-notch displays), not from `notchHorizontalPosition`.

To set manually (advanced):

```bash
# Read the bundle ID from the installed app; fall back to the example ID if the lookup fails.
BUNDLE_ID="$(/usr/libexec/PlistBuddy -c 'Print :CFBundleIdentifier' /Applications/InkIt.app/Contents/Info.plist 2>/dev/null || echo com.cartesia.InkIt)"
defaults write "$BUNDLE_ID" notchHorizontalPosition -float 0.38
```

Or edit `~/Library/Preferences/ai.cartesia.InkIt.plist` directly. Restart InkIt after changing UserDefaults, because the value is not live-bound. Since `NotchHUDController` does not yet read this key, the change has no visible effect on HUD placement for now.

<Info>
On Macs without a camera notch, the HUD uses a floating capsule below the menu bar rather than merging with a physical notch. See runtime troubleshooting for display-specific behavior.
</Info>

## Verification checklist

| Signal | Expected when configured correctly |
| --- | --- |
| Settings hotkey field | Keycap tokens match your binding (e.g. `🌐 fn`, `⌃ Ctrl` + `⌥ Opt` + `S`) |
| Home status hint | `{Press\|Hold}` + keycaps + `to dictate` / `to start and stop` |
| Notch HUD on press | Live waveform island while `state == .recording` |
| Feedback sound | Rising cue on press, falling cue on stop (if toggle enabled) |
| Fn + Accessibility | Globe picker does not appear on Fn press |
| Toast on save | **Shortcut saved** after valid capture |
| General → Accessibility | **Enabled** for Fn suppression and paste |

## Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Shortcut never triggers | Onboarding incomplete, or hotkey unregistered during Settings edit | Complete onboarding; close hotkey recorder (saves or cancels) |
| `Accessibility needed` on press | `AXIsProcessTrusted()` false | Settings → General → Permissions → Enable Accessibility; relaunch if stale |
| Globe / Emoji opens on Fn | Passive Fn monitor (no AX) | Grant Accessibility; verify re-registration |
| Invalid shortcut toast | Reserved macOS combo | Pick a different modifier+key pair |
| No start/stop sounds | `playFeedbackSounds == false` | Enable **Play sound on press and release** |
| Shortcut works in InkIt only | Duplicate InkIt bundles with split AX grants | Quit duplicate copies; grant AX to one bundle path |
| `RegisterEventHotKey failed` in Console | Combo taken by another app or OS | Choose another shortcut |

## Related pages

<CardGroup>
<Card title="Hotkey bindings" href="/hotkey-bindings">
Carbon, Fn/Globe, and bare-modifier backends, `HotkeyManager` tap threads, and `isValidShortcut` implementation detail.
</Card>
<Card title="Permissions model" href="/permissions-model">
Microphone and Accessibility tri-state, polling, TCC prompts, and silent relaunch after Accessibility grant.
</Card>
<Card title="Dictation state machine" href="/dictation-state-machine">
How hotkey press/release drives `DictationState` transitions through recording, finalizing, and paste.
</Card>
<Card title="Settings reference" href="/settings-reference">
Full `SettingsStore` key table including `hotkeyKind`, `dictationMode`, `playFeedbackSounds`, and `notchHorizontalPosition`.
</Card>
<Card title="Runtime troubleshooting" href="/runtime-troubleshooting">
Duplicate bundle instances, Bluetooth mic delay, debug logging, and notch HUD behavior on non-notch displays.
</Card>
</CardGroup>

---

## 11. Input device selection

> Pin a preferred microphone by CoreAudio UID in Settings, fall back to system default when unplugged, Bluetooth A2DP→HFP switch delay and audioReady HUD cue, and the readyLevelThreshold / readyFallbackDelay backstop in AudioCaptureService.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/11-input-device-selection.md
- Generated: 2026-06-15T22:09:42.845Z

### Source Files

- `InkIt/AudioDeviceManager.swift`
- `InkIt/AudioCaptureService.swift`
- `InkIt/AudioPCMConverter.swift`
- `InkIt/SettingsStore.swift`
- `InkIt/SettingsView.swift`

---
title: "Input device selection"
description: "Pin a preferred microphone by CoreAudio UID in Settings, fall back to system default when unplugged, Bluetooth A2DP→HFP switch delay and audioReady HUD cue, and the readyLevelThreshold / readyFallbackDelay backstop in AudioCaptureService."
---

InkIt pins dictation input by CoreAudio device UID (`kAudioDevicePropertyDeviceUID`), not by the transient `AudioDeviceID`. Settings persists the UID in UserDefaults. At each hotkey press, `AudioCaptureService` resolves the UID to the current `AudioDeviceID` and applies it with `AVAudioEngine.inputNode.auAudioUnit.setDeviceID(_:)` before reading the input format and streaming mono 16 kHz PCM to Cartesia.

## Why pin a microphone

macOS routes the system default input dynamically. Bluetooth headsets and AirPods often become the default when connected, switching from stereo output (A2DP) to a narrowband hands-free mic profile (HFP) only when capture starts. That hijack can degrade transcription without any visible setting change.

Pinning decouples InkIt's capture device from macOS routing: you can keep dictating through a wired or built-in mic even when Bluetooth gear is connected.

<Note>
Pinning stores a preference, not an exclusive lock. InkIt does not change the system default input device in macOS Sound settings.
</Note>

## Choose an input device in Settings

The picker lives in **Settings → Dictation → Microphone → Input device**. Search for "mic", "input", or "bluetooth" to surface the same control inline.

<Steps>
<Step title="Open the microphone picker">
Open Settings from the Home gear icon, select the **Dictation** pane, and find the **Microphone** section.
</Step>
<Step title="Select a device">
Choose **System default** to follow macOS, or pick a specific input from the list. Bluetooth devices appear with a `(Bluetooth)` suffix.
</Step>
<Step title="Read the advisory caption">
An orange caption appears when the pinned device is disconnected (fallback active) or when a Bluetooth mic is selected (accuracy warning).
</Step>
<Step title="Verify on next dictation">
Hold or toggle your dictation shortcut. The notch HUD shows a pulsing dot until the mic is live, then switches to the live waveform — that transition is the cue to start speaking.
</Step>
</Steps>

| Picker value | Stored UID | Capture behavior |
| --- | --- | --- |
| System default | `""` (empty string) | Routes to `kAudioHardwarePropertyDefaultInputDevice` on every take |
| Named device | CoreAudio UID string | Routes to that device when attached |
| Pinned but unplugged | Stale UID (still stored) | Falls back to system default; Settings shows an orange warning |

## Persistence

<ParamField body="preferredInputDeviceUID" type="string" default='""'>
UserDefaults key written by `SettingsStore`. Empty string means follow the macOS default. Non-empty values are stable CoreAudio UIDs that survive reboot and replug; the transient `AudioDeviceID` is never saved.
</ParamField>

The picker binds directly to `settings.preferredInputDeviceUID`. Changes take effect on the next dictation — there is no separate Apply step.

## Device enumeration

Two layers cooperate: a static CoreAudio query usable off the main actor, and a main-actor manager that keeps the Settings list fresh.

```text
  Settings UI                    Capture (per take)
  ─────────────                  ──────────────────
  AudioDeviceManager             AudioDevices (static)
    │ refresh on attach/detach     │ inputDevices()
    │ + default-input change       │ deviceID(forUID:)
    ▼                              │ defaultInputDeviceID()
  MicrophonePickerRow              ▼
    Picker ← preferredInputDeviceUID   AudioCaptureService.start()
                                         setDeviceID(pinned ?? default)
```

`AudioDevices.inputDevices()` returns every hardware device with at least one input stream, reading UID and name from CoreAudio and flagging `isBluetooth` when the transport type is `kAudioDeviceTransportTypeBluetooth` or `kAudioDeviceTransportTypeBluetoothLE`.

`AudioDeviceManager` registers property listeners on:

- `kAudioHardwarePropertyDevices` — device attach and detach
- `kAudioHardwarePropertyDefaultInputDevice` — system default changes

The picker refreshes automatically while Settings is open; each mount starts and stops its own manager instance.

## Capture-time routing

On every `startDictation`, `AppCoordinator` copies the stored preference into the capture service and starts the engine:

```swift
audio.preferredDeviceUID = settings.preferredInputDeviceUID
try audio.start { data in client?.sendAudio(data) }
```

Inside `AudioCaptureService.start()`:

1. Resolve `pinnedID = AudioDevices.deviceID(forUID: preferredDeviceUID)` — returns `nil` when the UID is empty or the device is unplugged.
2. Call `setDeviceID(pinnedID ?? defaultInputDeviceID())` while the engine is stopped and **before** reading `inputFormat(forBus: 0)`.
3. Build an `AudioPCMConverter` from the active device's native format to mono `pcm_s16le` at 16 kHz.
4. Install an input tap, start the engine, and arm the readiness backstop.

<Warning>
`AVAudioEngine` is reused across takes. When the preference is cleared (System default), InkIt explicitly resets to the current default on each `start()`. Without that reset, a previously pinned device would stick on the engine instance.
</Warning>
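
The resolution rule in steps 1 and 2 can be sketched in plain Swift. This is a simplified illustration under stated assumptions: `InputDevice` and `resolveCaptureDeviceID` are hypothetical names, and the real code queries CoreAudio rather than taking an array.

```swift
// Hypothetical value type standing in for a CoreAudio input device.
struct InputDevice {
    let id: UInt32   // transient AudioDeviceID
    let uid: String  // stable CoreAudio UID
}

// Returns the device ID to hand to setDeviceID(_:):
// the pinned device when attached, otherwise the current system default.
func resolveCaptureDeviceID(preferredUID: String,
                            attached: [InputDevice],
                            defaultID: UInt32) -> UInt32 {
    // An empty UID means "System default", so no lookup is attempted.
    let pinnedID: UInt32? = preferredUID.isEmpty
        ? nil
        : attached.first(where: { $0.uid == preferredUID })?.id
    // Always return a concrete ID so a reused engine never keeps a stale pin.
    return pinnedID ?? defaultID
}
```

Returning a concrete ID on every call mirrors the explicit reset described in the warning: the engine is told which device to use on each take, even when no pin is set.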

If `start()` throws, `AppCoordinator` surfaces `Audio start failed: …` in the notch HUD and cancels the STT session.

## Bluetooth A2DP→HFP and the audioReady HUD cue

Bluetooth headsets spend roughly 200–500 ms switching from output (A2DP) to the hands-free mic profile (HFP) after capture begins. During that window the hardware emits digital silence — speech in the gap is lost at the hardware level, not in InkIt's buffer.

InkIt addresses this with a two-stage readiness signal:

| Stage | UI | Meaning |
| --- | --- | --- |
| `audioReady == false` | `HUDPreparingDot` — single softly pulsing dot | Mic profile still coming up; wait before speaking |
| `audioReady == true` | `HUDWaveform` driven by `inputLevel` | Device is delivering real audio; safe to speak |

`AppCoordinator.audioReady` resets to `false` at the start of each recording. `AudioCaptureService.onReady` fires exactly once per take on the main queue; the coordinator sets `audioReady = true`.

```mermaid
sequenceDiagram
    participant User
    participant Coordinator as AppCoordinator
    participant Capture as AudioCaptureService
    participant HUD as NotchHUD

    User->>Coordinator: Hotkey press
    Coordinator->>Coordinator: audioReady = false
    Coordinator->>Capture: preferredDeviceUID + start()
    HUD->>HUD: HUDPreparingDot (pulsing)
    Note over Capture: Bluetooth A2DP→HFP switch<br/>~200–500 ms digital silence
    alt Input level > readyLevelThreshold
        Capture->>Coordinator: onReady()
    else readyFallbackDelay elapses
        Capture->>Coordinator: onReady() (timer backstop)
    end
    Coordinator->>HUD: audioReady = true
    HUD->>HUD: HUDWaveform (live level)
    User->>User: Start speaking
```

The pulsing-dot→waveform transition in the notch is intentional: it tells you when to begin talking, especially on Bluetooth.

<Info>
Settings warns when a Bluetooth device is pinned: *"Bluetooth mics use a narrowband profile that can lower transcription accuracy. A wired or built-in mic usually works better."* Pinning a wired mic prevents AirPods from hijacking capture even when they remain the macOS default.
</Info>

## Readiness backstop constants

Readiness is detected inside the input tap, and `onReady` is delivered on the main queue. Each buffer's peak level is log-compressed to a normalized 0…1 float (with a −50 dB floor). When the level exceeds `readyLevelThreshold`, `signalReadyIfNeeded()` fires `onReady` and cancels the fallback timer.

<ParamField body="readyLevelThreshold" type="Float" default="0.03">
Normalized peak level above which the device is treated as genuinely delivering audio, as opposed to the digital silence a Bluetooth mic emits during profile switch. Sits just above the meter noise floor.
</ParamField>
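
One way to picture the level check is the sketch below. It assumes a specific mapping (peak amplitude to dBFS, then a linear rescale of the range from −50 dB to 0 dB onto 0…1); the actual compression curve in `AudioCaptureService` may differ, and `normalizedLevel` and `isDeliveringAudio` are hypothetical names.

```swift
import Foundation

// Assumed mapping: peak amplitude -> dBFS -> linear 0...1 above a -50 dB floor.
let floorDB: Float = -50
let readyLevelThreshold: Float = 0.03

func normalizedLevel(peak: Float) -> Float {
    // Digital silence (peak 0) would give -inf dB, so short-circuit to 0.
    guard peak > 0 else { return 0 }
    let db = 20 * log10(min(peak, 1))
    // Clamp to the floor, then rescale [-50, 0] dB onto [0, 1].
    return (max(db, floorDB) - floorDB) / -floorDB
}

func isDeliveringAudio(samples: [Float]) -> Bool {
    // Peak of the absolute sample values in this buffer.
    let peak = samples.reduce(Float(0)) { max($0, abs($1)) }
    return normalizedLevel(peak: peak) > readyLevelThreshold
}
```

Under this assumed mapping, a threshold of `0.03` corresponds to roughly −48.5 dBFS, which is why an all-zero buffer from a Bluetooth device mid-switch never crosses it.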

<ParamField body="readyFallbackDelay" type="TimeInterval" default="0.6">
Seconds after `start()` before readiness is signaled unconditionally. Covers the worst-case Bluetooth profile switch so the HUD never stays stuck on the preparing cue in a silent room where no buffer crosses the threshold.
</ParamField>

`signalReadyIfNeeded()` is idempotent: whichever path wins (signal or timer), the fallback `DispatchWorkItem` is cancelled and `hasSignaledReady` prevents duplicate callbacks.
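
A once-only gate with a timer backstop might look like the following. It is a sketch, not InkIt's implementation: `ReadinessGate` is a hypothetical class, and it assumes all calls are confined to the main queue so no locking is needed.

```swift
import Foundation

// Hypothetical once-only gate; names mirror the text but the body is illustrative.
final class ReadinessGate {
    private var hasSignaledReady = false
    private var fallback: DispatchWorkItem?
    private let onReady: () -> Void

    init(onReady: @escaping () -> Void) { self.onReady = onReady }

    // Arm the timer backstop; call once per take, on the main queue.
    func arm(fallbackDelay: TimeInterval = 0.6) {
        hasSignaledReady = false
        let item = DispatchWorkItem { [weak self] in self?.signalReadyIfNeeded() }
        fallback = item
        DispatchQueue.main.asyncAfter(deadline: .now() + fallbackDelay, execute: item)
    }

    // Safe to call from both the level check and the timer: only the first call fires.
    func signalReadyIfNeeded() {
        guard !hasSignaledReady else { return }
        hasSignaledReady = true
        fallback?.cancel()
        fallback = nil
        onReady()
    }
}
```

Routing both paths through one method keeps the "exactly once per take" guarantee in a single place, so neither caller needs to know which path won.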

## Unplugged fallback

When a pinned UID no longer matches any attached input device:

- Capture silently routes to the system default — dictation continues.
- Settings shows: *"Pinned mic isn't connected — using the system default until it's back"*
- The stored UID is **not** cleared; reconnecting the device restores the pin automatically.

## Troubleshooting

| Symptom | Likely cause | What to check |
| --- | --- | --- |
| First word clipped on AirPods | Spoke during A2DP→HFP switch | Wait for waveform (not pulsing dot) before speaking |
| Transcription quality dropped after connecting headphones | Bluetooth narrowband HFP profile | Pin built-in or wired mic in Settings |
| Picker shows warning but dictation works | Pinned device unplugged | Reconnect device or switch to System default |
| HUD stuck on preparing dot > 1 s | Audio start failure or permission issue | Microphone permission; check for `Audio start failed` error |
| Wrong mic despite pin | Stale engine state (rare) | Preference applies every take; restart InkIt if behavior persists |

Enable **Settings → General → Advanced → Debug logging** and inspect `~/Library/Logs/InkIt-debug.log` for `startDictation` traces when diagnosing capture routing.

## Related pages

<CardGroup>
<Card title="Dictation pipeline" href="/dictation-pipeline">
Hotkey press through 16 kHz PCM capture, Cartesia STT streaming, and paste — where pinned-device audio enters the pipeline.
</Card>
<Card title="Dictation state machine" href="/dictation-state-machine">
`DictationState` lifecycle and how `audioReady` couples to the recording HUD during `.recording`.
</Card>
<Card title="Settings reference" href="/settings-reference">
Full `SettingsStore` key inventory including `preferredInputDeviceUID`.
</Card>
<Card title="Permissions model" href="/permissions-model">
Microphone permission requirements before `AudioCaptureService.start()` can succeed.
</Card>
<Card title="Runtime troubleshooting" href="/runtime-troubleshooting">
Bluetooth mic profile delay, debug logging, and other non-API runtime issues.
</Card>
</CardGroup>

---

## 12. Settings reference

> All persisted SettingsStore keys: UserDefaults fields (appearance, hotkey, dictationMode, notch position, service-issue flags), Keychain accounts (cartesiaAPIKey, llm.{provider}), SMAppService launchAtLogin, and derived properties transcriptionIssue / polishIssue.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/12-settings-reference.md
- Generated: 2026-06-15T22:09:54.324Z

### Source Files

- `InkIt/SettingsStore.swift`
- `InkIt/LLMProvider.swift`
- `InkIt/DebugLog.swift`
- `InkIt/SettingsView.swift`
- `DESIGN_SYSTEM.md`

---
title: Settings reference
description: All persisted SettingsStore keys — UserDefaults fields, Keychain accounts, SMAppService launch-at-login, and derived service-issue properties.
slug: settings-reference
---

`SettingsStore` is InkIt's single source of truth for user preferences and API credentials. It is a `final class` singleton (`SettingsStore.shared`) exposed as an `@EnvironmentObject` throughout the app. Properties are `@Published` so SwiftUI views react to changes; each write persists immediately to the appropriate backing store.

## Storage architecture

InkIt splits settings across three backends by sensitivity and system integration:

```mermaid
flowchart LR
  SS[SettingsStore]
  UD[UserDefaults.standard]
  KC[Keychain generic passwords]
  SM[SMAppService.mainApp]
  SS --> UD
  SS --> KC
  SS --> SM
```

| Backend | What lives here | Why |
| --- | --- | --- |
| **UserDefaults** | Non-secret preferences, hotkey encoding, service-issue flags | Fast reads; plist at `~/Library/Preferences/ai.cartesia.InkIt.plist` |
| **Keychain** | `cartesiaAPIKey` and per-provider LLM keys | Secrets never touch plaintext UserDefaults on signed builds |
| **SMAppService** | Launch-at-login registration | System Login Items is the source of truth — no separate persisted flag |

API keys use the Keychain **service** `Bundle.main.bundleIdentifier` (`ai.cartesia.InkIt`) with **account** names documented below. On ad-hoc or unsigned local builds, `Keychain.usesKeychain` is `false` and secrets fall back to UserDefaults keys prefixed `secretFallback.` (for example `secretFallback.cartesiaAPIKey`).
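
The routing between the two secret stores can be sketched as a pure function. `SecretLocation` and `secretLocation(account:usesKeychain:bundleID:)` are hypothetical names for illustration, and applying the `secretFallback.` prefix to the `llm.{provider}` accounts in the same way is an assumption.

```swift
// Hypothetical description of where a secret is stored; not InkIt's Keychain type.
enum SecretLocation: Equatable {
    case keychain(service: String, account: String)
    case userDefaults(key: String)
}

func secretLocation(account: String,
                    usesKeychain: Bool,
                    bundleID: String = "ai.cartesia.InkIt") -> SecretLocation {
    // Signed builds: generic-password item keyed by service + account.
    if usesKeychain {
        return .keychain(service: bundleID, account: account)
    }
    // Ad-hoc or unsigned builds: plaintext fallback under a prefixed key.
    return .userDefaults(key: "secretFallback." + account)
}
```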

## UserDefaults keys

All keys below are written to `UserDefaults.standard` unless noted. Boolean keys default to `false` when absent unless a custom default is listed.

### Appearance and app chrome

<ParamField body="appearancePreference" type="string">
One of `system`, `light`, or `dark`. Maps to `AppearancePreference`. Applied app-wide via `NSApp.appearance` on change and at launch. Default on first run: `light`. The always-dark notch HUD ignores this setting.
</ParamField>

<ParamField body="hasCompletedOnboarding" type="bool">
Whether the user finished the onboarding flow (permissions, Cartesia key, Try It). Default: `false`. The notch HUD stays hidden until this is `true`.
</ParamField>

### Dictation behavior

<ParamField body="dictationMode" type="string">
`hold` (press-and-hold, release to paste) or `toggle` (hands-free: press to start, press again to paste). Maps to `DictationMode`. Default: `hold`. Read by `AppCoordinator` hotkey handlers.
</ParamField>

<ParamField body="hotkeyKind" type="string">
Discriminator for the global dictation shortcut: `fn`, `carbon`, or `modifier`. Written by `saveHotkey()` whenever `hotkey` changes.
</ParamField>

<ParamField body="hotkeyKeyCode" type="int">
Carbon virtual key code (`UInt32` stored as `Int`). Used when `hotkeyKind` is `carbon` or `modifier`. For `carbon`, defaults to Space if missing; for `modifier`, defaults to Control.
</ParamField>

<ParamField body="hotkeyModifiers" type="int">
Carbon modifier mask (`UInt32` stored as `Int`). Only used when `hotkeyKind` is `carbon`. Default combo if missing: Control + Option + Space.
</ParamField>

<ParamField body="playFeedbackSounds" type="bool">
Whether InkIt plays audio cues on hotkey press and release. Default: `true` when the key is absent.
</ParamField>

<ParamField body="preferredInputDeviceUID" type="string">
CoreAudio UID of the pinned microphone, or `""` to follow the system default. Passed to `AudioCaptureService` at record time; stale UIDs fall back safely when the device is unplugged.
</ParamField>

### Polish (LLM rewrite)

<ParamField body="correctionEnabled" type="bool">
Master toggle for transcript polish. When `false`, raw STT output is pasted. Default: `false`. Pausing via `pausePolish()` turns polish off but keeps the provider API key on file.
</ParamField>

<ParamField body="rewriteProvider" type="string">
Selected `LLMProvider` raw value: `groq`, `gemini`, `openai`, or `anthropic`. Default: `groq`. Changing provider clears `polishKeyInvalid` and `polishOutOfCredits`.
</ParamField>

<ParamField body="rewriteModel" type="string">
Model ID for the active provider. Default: the provider's `defaultModel` (for example `llama-3.3-70b-versatile` for Groq). Auto-corrected at init if invalid for the selected provider.
</ParamField>

<ParamField body="polishNudgeDismissed" type="bool">
Whether the Home "Polish your dictation" nudge was dismissed. Sticky across launches. The nudge only shows when polish is off.
</ParamField>

### Notch HUD

<ParamField body="notchHorizontalPosition" type="double">
Normalized horizontal position of the notch HUD on the active screen, `0.0` (left) to `1.0` (right). Clamped to `[0.04, 0.96]` on write. Default: `0.38` (slightly left of center). Persisted for future positioning control; the current `NotchHUDController` anchors to detected notch geometry rather than reading this value.
</ParamField>

### Service-issue flags

These booleans persist account problems that drive Home status UI until cleared by a successful operation. Transient failures (offline, 5xx, rate limit) are **not** persisted — they surface only in the notch HUD and history log.

<ParamField body="cartesiaKeyInvalid" type="bool">
Set when STT fails with an invalid or expired Cartesia key (401/403). Cleared on the next successful transcription. Drives the "Dictation is paused — invalid key" Home banner via `transcriptionIssue`.
</ParamField>

<ParamField body="cartesiaOutOfCredits" type="bool">
Set when STT fails for billing reasons (402 / quota exceeded). Cleared on the next successful transcription.
</ParamField>

<ParamField body="polishKeyInvalid" type="bool">
Set when a polish rewrite fails with 401/403. Cleared on successful rewrite, provider change, or `enablePolish()`. Drives Polish settings `keyBroken` state and `polishIssue`.
</ParamField>

<ParamField body="polishOutOfCredits" type="bool">
Set when polish fails for billing (402). Cleared on successful rewrite or provider change.
</ParamField>
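
The split between persisted and transient failures can be sketched as below. The status-code mapping follows the descriptions above (401/403, 402); treating every other status, including 429 and 5xx, as transient is an assumption, and `FailureKind` and `ServiceFlags` are hypothetical names rather than InkIt's `STTFailure` or `RewriteFailure` types.

```swift
// Hypothetical classification; case names are illustrative, not InkIt's STTFailure.
enum FailureKind: Equatable {
    case invalidKey     // persisted as a *KeyInvalid flag
    case outOfCredits   // persisted as a *OutOfCredits flag
    case transient      // shown in the notch HUD only
}

func classify(httpStatus: Int) -> FailureKind {
    switch httpStatus {
    case 401, 403: return .invalidKey
    case 402:      return .outOfCredits
    default:       return .transient   // e.g. 429 rate limit, 5xx
    }
}

struct ServiceFlags {
    var keyInvalid = false
    var outOfCredits = false

    // Persist only account problems; transient failures leave flags untouched.
    mutating func record(_ kind: FailureKind) {
        switch kind {
        case .invalidKey:   keyInvalid = true
        case .outOfCredits: outOfCredits = true
        case .transient:    break
        }
    }

    // A successful operation clears both flags.
    mutating func recordSuccess() {
        keyInvalid = false
        outOfCredits = false
    }
}
```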

### Advanced

<ParamField body="debugLoggingEnabled" type="bool">
Enables developer trace logging to `~/Library/Logs/InkIt-debug.log` and unified logging. Default: `false`. Shared key constant: `DebugLog.isEnabledKey`. Traces include raw transcripts and on-screen context — opt in only while debugging.
</ParamField>

### Legacy keys (migrated, then scrubbed)

On first launch after upgrade, `SettingsStore` migrates these plaintext UserDefaults entries into Keychain and removes them:

| Legacy key | Migrated to |
| --- | --- |
| `cartesiaAPIKey` | Keychain account `cartesiaAPIKey` |
| `llmAPIKeys` (dictionary) | Keychain accounts `llm.{provider}` |
| `anthropicAPIKey` | Keychain account `llm.anthropic` |

## Hotkey binding encoding

The in-memory `hotkey` property is a `HotkeyBinding` enum. Persistence splits it across `hotkeyKind`, `hotkeyKeyCode`, and `hotkeyModifiers`:

| `hotkeyKind` | In-memory variant | Persisted fields | Default / fallback |
| --- | --- | --- | --- |
| `fn` | `.fn` | kind only | **App default**: Fn/Globe via `CGEventTap` |
| `carbon` | `.carbon(keyCode, modifiers)` | kind + keyCode + modifiers | Control+Option+Space when the stored key code or modifiers are missing |
| `modifier` | `.modifierKey(keyCode)` | kind + keyCode | Left Control when the stored key code is missing |

Carbon combos are validated by `isValidShortcut`, which rejects reserved system shortcuts such as ⌘C, ⌘Space, and the screenshot keys. A stored combo that fails validation falls back to `.fn` at init.
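
The split encoding can be illustrated with a round-trip sketch. The enum below re-creates the three variants for illustration only; `StoredHotkey`, `encode`, and `decode` are hypothetical names, the `isValidShortcut` check is omitted, and the numeric defaults are the standard Carbon values (`kVK_Space` = 49, `kVK_Control` = 59, `controlKey | optionKey` = `0x1800`).

```swift
// Illustrative re-creation of the encoding; the real HotkeyBinding may differ.
enum HotkeyBinding: Equatable {
    case fn
    case carbon(keyCode: UInt32, modifiers: UInt32)
    case modifierKey(keyCode: UInt32)
}

// Persisted triple: hotkeyKind, hotkeyKeyCode, hotkeyModifiers (nil = not written).
struct StoredHotkey: Equatable {
    var kind: String
    var keyCode: Int?
    var modifiers: Int?
}

func encode(_ binding: HotkeyBinding) -> StoredHotkey {
    switch binding {
    case .fn:
        return StoredHotkey(kind: "fn", keyCode: nil, modifiers: nil)
    case let .carbon(keyCode, modifiers):
        return StoredHotkey(kind: "carbon", keyCode: Int(keyCode), modifiers: Int(modifiers))
    case let .modifierKey(keyCode):
        return StoredHotkey(kind: "modifier", keyCode: Int(keyCode), modifiers: nil)
    }
}

func decode(_ stored: StoredHotkey) -> HotkeyBinding {
    switch stored.kind {
    case "carbon":
        // Missing fields fall back to Control+Option+Space.
        return .carbon(keyCode: UInt32(stored.keyCode ?? 49),
                       modifiers: UInt32(stored.modifiers ?? 0x1800))
    case "modifier":
        // Missing key code falls back to Control.
        return .modifierKey(keyCode: UInt32(stored.keyCode ?? 59))
    default:
        return .fn   // fn is the default binding
    }
}
```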

## Keychain accounts

Secrets are generic-password items. Empty strings remove the item.

<ResponseField name="cartesiaAPIKey" type="string">
Cartesia STT API key. Bound to `SettingsStore.cartesiaAPIKey`. Used by `CartesiaStreamingClient` for WebSocket auth.
</ResponseField>

<ResponseField name="llm.groq" type="string">
Groq API key for polish when `rewriteProvider` is `groq`.
</ResponseField>

<ResponseField name="llm.gemini" type="string">
Google Gemini API key.
</ResponseField>

<ResponseField name="llm.openai" type="string">
OpenAI API key.
</ResponseField>

<ResponseField name="llm.anthropic" type="string">
Anthropic API key.
</ResponseField>

The in-memory `llmAPIKeys` dictionary is keyed by `LLMProvider.rawValue`. Use `apiKey(for:)` and `setAPIKey(_:for:)` for per-provider access. Writing any entry syncs all provider slots to Keychain (empty values delete their items).

### Keychain persistence across rebuilds

Keychain items are bound to the app's code signature. **Signed release builds** (Developer ID, not ad-hoc) retain keys across updates. **Ad-hoc "Sign to Run Locally"** Debug builds re-sign each compile, which orphans Keychain items — those builds use the `secretFallback.*` UserDefaults path instead.

## Launch at login (`SMAppService`)

<ResponseField name="launchAtLogin" type="bool">
Mirrors `SMAppService.mainApp.status == .enabled`. **Not stored in UserDefaults.** Toggling registers or unregisters the login item; `syncLaunchAtLoginFromSystem()` reconciles when Settings opens (the user may change Login Items in System Settings). On registration failure, the toggle reverts to the actual system state.
</ResponseField>

## Derived properties (not persisted)

These are computed from persisted state and drive UI logic. They are not written to disk.

### `transcriptionIssue` and `polishIssue`

```mermaid
flowchart TD
  subgraph transcription
    CK[cartesiaAPIKey non-empty?]
    KI[cartesiaKeyInvalid]
    OC[cartesiaOutOfCredits]
    CK -->|no| TN[nil]
    CK -->|yes| KI
    KI -->|true| TKI[keyInvalid]
    KI -->|false| OC
    OC -->|true| TOC[outOfCredits]
    OC -->|false| TH[nil healthy]
  end
  subgraph polish
    CE[correctionEnabled + hasRewriteKey]
    PK[polishKeyInvalid]
    PO[polishOutOfCredits]
    CE -->|no| PN[nil]
    CE -->|yes| PK
    PK -->|true| PKI[keyInvalid]
    PK -->|false| PO
    PO -->|true| POC[outOfCredits]
    PO -->|false| PH[nil healthy]
  end
```

Both return `SettingsStore.ServiceIssue?` — either `.keyInvalid` or `.outOfCredits`, or `nil` when healthy.
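
The flowchart translates into two short computed properties. The sketch below is an illustrative model, not the `SettingsStore` source: `IssueInputs` is a hypothetical container whose fields are named after the persisted keys.

```swift
// Illustrative model of the derivation; field names follow the persisted keys.
enum ServiceIssue: Equatable { case keyInvalid, outOfCredits }

struct IssueInputs {
    var cartesiaAPIKey = ""
    var cartesiaKeyInvalid = false
    var cartesiaOutOfCredits = false
    var correctionEnabled = false
    var hasRewriteKey = false
    var polishKeyInvalid = false
    var polishOutOfCredits = false

    var transcriptionIssue: ServiceIssue? {
        // No key yet is an onboarding state, not a fault.
        guard !cartesiaAPIKey.isEmpty else { return nil }
        if cartesiaKeyInvalid { return .keyInvalid }
        if cartesiaOutOfCredits { return .outOfCredits }
        return nil
    }

    var polishIssue: ServiceIssue? {
        // Suppressed while polish is off or has no key for the active provider.
        guard correctionEnabled && hasRewriteKey else { return nil }
        if polishKeyInvalid { return .keyInvalid }
        if polishOutOfCredits { return .outOfCredits }
        return nil
    }
}
```

Checking `keyInvalid` before `outOfCredits` matches the order of the flowchart, so an invalid key takes precedence when both flags are set.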

| Property | Suppressed when | Home UI |
| --- | --- | --- |
| `transcriptionIssue` | No Cartesia key set (onboarding, not a fault) | Full-width **Dictation is paused** banner |
| `polishIssue` | Polish off or no rewrite key | Calm **Polish is paused** rail card |

**Setters:** `AppCoordinator.handleSTTFailure` sets Cartesia flags for `.invalidKey` / `.outOfCredits`. Polish flags are set in the `onClosed` handler when `RewriteFailure` maps to `.invalidKey` or `.outOfCredits`.

**Clearers:** Cartesia flags clear on any successful STT transcript. Polish flags clear on `.polished` outcome, provider change, or `enablePolish()`.

### Other derived helpers

| Property | Purpose |
| --- | --- |
| `hasRewriteKey` | Whether the active `rewriteProvider` has a non-empty Keychain key |
| `polishUIState` | Polish settings pane state: `setup`, `on`, `paused`, or `keyBroken` |
| `hotkeyDisplayString` | Human-readable shortcut for UI (for example `🌐 fn`, `⌃⌥Space`) |
| `dictationModeVerb` | `"Press"` for toggle mode, `"Hold"` for hold mode — single source for gesture copy |

### `PolishUIState` truth table

| State | Condition |
| --- | --- |
| `setup` | No rewrite key for the selected provider |
| `paused` | Key exists but `correctionEnabled == false` |
| `keyBroken` | Key exists, polish enabled, `polishKeyInvalid == true` |
| `on` | Key exists, polish enabled, key healthy |
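
Read top to bottom, the truth table is a chain of early exits. The sketch below encodes it as a free function for illustration; the function name and parameter list are assumptions, and only the conditions listed in the table are modeled.

```swift
// Illustrative encoding of the truth table above.
enum PolishUIState: Equatable { case setup, on, paused, keyBroken }

func polishUIState(hasRewriteKey: Bool,
                   correctionEnabled: Bool,
                   polishKeyInvalid: Bool) -> PolishUIState {
    // No key for the selected provider: nothing else matters.
    guard hasRewriteKey else { return .setup }
    // Key on file but polish switched off.
    guard correctionEnabled else { return .paused }
    // Polish is on; the key is either rejected or healthy.
    return polishKeyInvalid ? .keyBroken : .on
}
```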

## Settings UI mapping

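Settings is organized into three panes. The table maps each user-facing property to its control; keys without a control (for example `notchHorizontalPosition` and the service-issue flags) are omitted: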

| Pane | Property | Control |
| --- | --- | --- |
| **General** | `appearance` | Appearance card picker |
| **General** | `launchAtLogin` | Launch at login toggle |
| **General** | `debugLoggingEnabled` | Debug logging toggle |
| **Dictation** | `dictationMode` | Activation mode cards |
| **Dictation** | `hotkey` | Hotkey recorder |
| **Dictation** | `playFeedbackSounds` | Sound on press/release toggle |
| **Dictation** | `preferredInputDeviceUID` | Microphone picker |
| **Dictation** | `cartesiaAPIKey` | Cartesia API key field |
| **Polish** | `correctionEnabled` | Polish transcripts toggle |
| **Polish** | `rewriteProvider` | Provider picker |
| **Polish** | `rewriteModel` | Read-only model row |
| **Polish** | `llmAPIKeys[provider]` | Per-provider API key field |

Permissions (microphone, accessibility) are **not** stored in `SettingsStore` — they are runtime OS grants polled by `PermissionsService`.

## Programmatic access

`AppCoordinator` holds `let settings = SettingsStore.shared`. Any feature that needs preferences should read from this singleton rather than `UserDefaults` directly, so migrations and Keychain routing stay centralized.

```swift
// Read the stored polish key for one provider (Groq here)
let key = SettingsStore.shared.apiKey(for: .groq)

// Check whether Home should show the "Dictation is paused" banner
if let issue = SettingsStore.shared.transcriptionIssue { /* … */ }
```

## Related pages

<Card href="/configure-cartesia-api-key" title="Configure Cartesia API key" icon="key">
Keychain storage, validation, and how `cartesiaKeyInvalid` / `cartesiaOutOfCredits` drive Home cards.
</Card>

<Card href="/configure-polish" title="Configure Polish" icon="wand-and-stars">
Provider selection, per-provider Keychain keys, `correctionEnabled`, and `PolishUIState`.
</Card>

<Card href="/configure-hotkeys-and-mode" title="Configure hotkeys and dictation mode" icon="keyboard">
Hotkey recording, `DictationMode`, `playFeedbackSounds`, and `notchHorizontalPosition`.
</Card>

<Card href="/hotkey-bindings" title="Hotkey bindings" icon="command">
`HotkeyBinding` variants, Carbon vs Fn vs bare modifier, and `isValidShortcut` rules.
</Card>

<Card href="/input-device-selection" title="Input device selection" icon="mic">
`preferredInputDeviceUID` pinning, fallback behavior, and Bluetooth caveats.
</Card>

<Card href="/stt-troubleshooting" title="STT troubleshooting" icon="exclamationmark-triangle">
When Cartesia failures set persisted flags vs transient notch errors.
</Card>

<Card href="/polish-troubleshooting" title="Polish troubleshooting" icon="exclamationmark-triangle">
`polishKeyInvalid`, `polishOutOfCredits`, and raw-transcript fallback behavior.
</Card>

<Card href="/runtime-troubleshooting" title="Runtime troubleshooting" icon="wrench">
Debug logging, Keychain ad-hoc fallback, and notch HUD positioning.
</Card>

---

## 13. Cartesia STT reference

> CartesiaStreamingClient WebSocket contract: wss://api.cartesia.ai/stt/turns/websocket, model ink-2, encoding pcm_s16le, sample_rate 16000, cartesia_version 2026-03-01, server events, client close message, pending audio buffer, and STTFailure classification table with notchMessage strings.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/13-cartesia-stt-reference.md
- Generated: 2026-06-15T22:10:17.736Z

### Source Files

- `InkIt/CartesiaStreamingClient.swift`
- `InkIt/AudioCaptureService.swift`
- `InkIt/CartesiaKeyValidator.swift`
- `InkItTests/STTFailureRoutingTests.swift`
- `InkIt/AppCoordinator.swift`

---
title: "Cartesia STT reference"
description: "CartesiaStreamingClient WebSocket contract: wss://api.cartesia.ai/stt/turns/websocket, model ink-2, encoding pcm_s16le, sample_rate 16000, cartesia_version 2026-03-01, server events, client close message, pending audio buffer, and STTFailure classification table with notchMessage strings."
---

`CartesiaStreamingClient` opens a streaming WebSocket to Cartesia Ink-2 STT on each dictation take. `AppCoordinator` constructs the client with the Keychain-stored Cartesia API key, calls `connect()` at hotkey press, pipes 16 kHz mono PCM from `AudioCaptureService` via `sendAudio(_:)`, and calls `finalizeAndClose()` on release. Transcript updates flow to the notch HUD through `onTranscriptUpdate`; classified failures surface through `onError` as `STTFailure` with `notchMessage` copy.

## WebSocket endpoint

:::endpoint WSS /stt/turns/websocket
Streaming speech-to-text over WebSocket. InkIt targets the Cartesia turns API at `wss://api.cartesia.ai/stt/turns/websocket`.
:::

<ParamField body="model" type="string" required>
Fixed to `ink-2`.
</ParamField>

<ParamField body="encoding" type="string" required>
Fixed to `pcm_s16le` — signed 16-bit little-endian PCM.
</ParamField>

<ParamField body="sample_rate" type="integer" required>
Fixed to `16000`.
</ParamField>

<ParamField body="cartesia_version" type="string" required>
Fixed to `2026-03-01`. Also sent as the `Cartesia-Version` request header.
</ParamField>

### Request headers

| Header | Value |
| --- | --- |
| `X-API-Key` | Cartesia API key from Keychain (`cartesiaAPIKey`) |
| `Cartesia-Version` | `2026-03-01` |

<RequestExample>

```http
GET wss://api.cartesia.ai/stt/turns/websocket?model=ink-2&encoding=pcm_s16le&sample_rate=16000&cartesia_version=2026-03-01 HTTP/1.1
X-API-Key: sk_car_…
Cartesia-Version: 2026-03-01
```

</RequestExample>
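
Assembling the handshake from the fixed parameters could look like the sketch below. `makeSTTHandshake(apiKey:)` is a hypothetical helper, not a `CartesiaStreamingClient` method; it only builds the URL and header dictionary and does not open a connection.

```swift
import Foundation

// Builds the handshake URL and headers from the fixed parameters listed above.
func makeSTTHandshake(apiKey: String) -> (url: URL, headers: [String: String])? {
    var components = URLComponents()
    components.scheme = "wss"
    components.host = "api.cartesia.ai"
    components.path = "/stt/turns/websocket"
    components.queryItems = [
        URLQueryItem(name: "model", value: "ink-2"),
        URLQueryItem(name: "encoding", value: "pcm_s16le"),
        URLQueryItem(name: "sample_rate", value: "16000"),
        URLQueryItem(name: "cartesia_version", value: "2026-03-01"),
    ]
    guard let url = components.url else { return nil }
    // The version travels both as a query item and as a header.
    let headers = ["X-API-Key": apiKey, "Cartesia-Version": "2026-03-01"]
    return (url, headers)
}
```

To use the result, copy each header onto a `URLRequest` with `setValue(_:forHTTPHeaderField:)` and pass the request to `URLSession.webSocketTask(with:)`.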

<Note>
A rejected WebSocket upgrade (invalid key, out of credits, rate limited) is delivered through `urlSession(_:task:didCompleteWithError:)`, a `URLSessionTaskDelegate` method that `URLSessionWebSocketDelegate` inherits, not through the receive loop. The HTTP status on `task.response` drives `STTFailure` classification.
</Note>

## Audio payload

InkIt does not send the microphone's native format directly. `AudioCaptureService` taps the input device, and `AudioPCMConverter` resamples to mono `pcm_s16le` at 16 kHz before each chunk reaches the WebSocket.

| Property | Value |
| --- | --- |
| Channels | 1 (mono) |
| Encoding | `pcm_s16le` |
| Sample rate | 16 000 Hz |
| Frame format | Raw binary WebSocket frames |
| Tap buffer size | 1024 frames (device-native rate) |

Each converted chunk is sent as a `.data` frame via `sendAudio(_:)`. Send failures classify through `STTFailure.classify(transportError:response:)`.
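
To make the wire format concrete, the sketch below packs floating-point samples into `pcm_s16le` bytes. It is an illustration under stated assumptions: the input is already mono at 16 kHz in the range -1.0...1.0, and `pcmS16LE(from:)` is a hypothetical helper. How `AudioPCMConverter` performs resampling and conversion internally is not described on this page.

```swift
import Foundation

// Sketch only: assumes `samples` are already mono at 16 kHz in -1.0...1.0.
// `pcmS16LE(from:)` is a hypothetical helper, not part of InkIt.
func pcmS16LE(from samples: [Float]) -> Data {
    var data = Data(capacity: samples.count * MemoryLayout<Int16>.size)
    for sample in samples {
        // Treat NaN as silence, then clamp so the Int16 conversion cannot trap.
        let safe: Float = sample.isNaN ? 0 : max(-1, min(1, sample))
        let value = Int16((safe * Float(Int16.max)).rounded())
        // Append the two bytes in little-endian order.
        withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
    }
    return data
}
```

Each sample becomes two bytes, so one second of audio at 16 kHz mono is 32,000 bytes on the socket.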

## Server events

Cartesia emits JSON text frames. `CartesiaStreamingClient.handleMessage(_:)` parses the `type` field and updates local transcript state.

| Event | InkIt handling |
| --- | --- |
| `connected` | Marks session ready, flushes `pendingAudio`, sends deferred close if hotkey already released |
| `turn.start` | Ignored (no state change) |
| `turn.update` | Replaces `currentTurn` with cumulative in-progress transcript; fires `onTranscriptUpdate` |
| `turn.eager_end` | Same as `turn.update` — cumulative transcript for the in-progress turn |
| `turn.resume` | Ignored — user continued a previously eager-ended turn |
| `turn.end` | Appends final turn text to `completedTurns`, clears `currentTurn`; if `awaitingClose`, completes session |
| `error` | Classifies via `status_code` and `error_code`, routes through `reportFailureOrCollapse` |

<ResponseExample>

```json
{"type":"turn.update","transcript":"hello wor"}
```

</ResponseExample>

<ResponseExample>

```json
{"type":"turn.end","transcript":"hello world"}
```

</ResponseExample>

<ResponseExample>

```json
{"type":"error","status_code":402,"error_code":"quota_exceeded","message":"…"}
```

</ResponseExample>

<Info>
Cartesia emits already-final words per turn. InkIt does not expose partial word-level tokens. The live HUD transcript is `completedTurns` joined with the latest `currentTurn`, space-separated and trimmed.
</Info>
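
The event handling and HUD join described above can be sketched as a small state struct. The type names `ServerEvent` and `TranscriptState` are hypothetical, and dropping empty segments before joining is an assumption made so the result needs no extra spacing logic.

```swift
import Foundation

// Hypothetical decoding target: only the fields shown in the examples above.
struct ServerEvent: Decodable {
    let type: String
    let transcript: String?
}

// Hypothetical stand-in for the client's transcript bookkeeping.
struct TranscriptState {
    var completedTurns: [String] = []
    var currentTurn = ""

    mutating func apply(_ event: ServerEvent) {
        switch event.type {
        case "turn.update", "turn.eager_end":
            // Cumulative text for the in-progress turn replaces the old value.
            currentTurn = event.transcript ?? ""
        case "turn.end":
            if let text = event.transcript, !text.isEmpty {
                completedTurns.append(text)
            }
            currentTurn = ""
        default:
            break // connected, turn.start, turn.resume, error: not handled here
        }
    }

    // Completed turns plus the latest in-progress turn, space-separated and trimmed.
    var liveTranscript: String {
        (completedTurns + [currentTurn])
            .filter { !$0.isEmpty }
            .joined(separator: " ")
            .trimmingCharacters(in: .whitespacesAndNewlines)
    }
}
```

Because `turn.update` carries the cumulative text for the turn, replacing `currentTurn` (rather than appending to it) keeps the HUD from showing duplicated words.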

## Client close message

When the user releases the hotkey, `AppCoordinator.stopDictation()` stops audio capture and calls `finalizeAndClose()`. Per the Cartesia STT protocol, the client sends a JSON close frame:

<RequestExample>

```json
{"type":"close"}
```

</RequestExample>

After close is requested:

1. The server processes all buffered audio and emits a final `turn.end` carrying the last word.
2. `awaitingClose` is set so the client completes on that `turn.end` rather than racing the socket close.
3. A fallback timer fires if the server never finishes: **3.0 s** when transcript content exists, **2.0 s** when it does not (silent or too-short press).

If `finalizeAndClose()` runs before the `connected` event, `pendingClose` is set and the close frame is deferred until `handleConnected()` has flushed the buffered audio. The buffered audio must be sent before the close frame so the server still transcribes it.

```mermaid
sequenceDiagram
    participant AC as AppCoordinator
    participant CSC as CartesiaStreamingClient
    participant ACS as AudioCaptureService
    participant S as Cartesia STT

    AC->>CSC: connect()
    AC->>ACS: start(onChunk:)
    ACS-->>CSC: sendAudio(pcm chunks)
    Note over CSC: pendingAudio until connected
    S-->>CSC: {"type":"connected"}
    CSC->>S: flush pendingAudio
    S-->>CSC: turn.update / turn.end
    CSC-->>AC: onTranscriptUpdate
    AC->>ACS: stop()
    AC->>CSC: finalizeAndClose()
    CSC->>S: {"type":"close"}
    S-->>CSC: final turn.end
    CSC-->>AC: onClosed(transcript)
```

## Pending audio buffer

URLSession can queue early `send()` calls, but Cartesia may discard binary frames received before the session is fully initialized. InkIt buffers audio client-side until `connected` arrives.

| State | Behavior |
| --- | --- |
| `isConnected == false` | Chunks append to `pendingAudio` |
| `connected` event | `handleConnected()` sends all buffered chunks in order, then sets `isConnected = true` |
| `finalizeAndClose()` before `connected` | Sets `pendingClose`; close frame sent after flush |
| After `connected` | `sendAudio` sends directly on the WebSocket task |

Concurrent `sendAudio` callers are gated by `stateLock` so post-handshake sends cannot race ahead of the flush.
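
The buffering rules in the table can be sketched as a lock-guarded gate. This is a simplified model, not the real client: `PendingAudioGate` and its injected `send` / `sendClose` closures are hypothetical, and the closures must not call back into the gate because `NSLock` is not reentrant.

```swift
import Foundation

// Hypothetical model of the pending-audio rules; the real client owns a WebSocket task.
final class PendingAudioGate {
    private let stateLock = NSLock()
    private var isConnected = false
    private var pendingAudio: [Data] = []
    private var pendingClose = false
    private let send: (Data) -> Void
    private let sendClose: () -> Void

    init(send: @escaping (Data) -> Void, sendClose: @escaping () -> Void) {
        self.send = send
        self.sendClose = sendClose
    }

    func sendAudio(_ chunk: Data) {
        stateLock.lock(); defer { stateLock.unlock() }
        if isConnected { send(chunk) } else { pendingAudio.append(chunk) }
    }

    func handleConnected() {
        stateLock.lock(); defer { stateLock.unlock() }
        // Flush in arrival order while holding the lock, so later sends cannot overtake.
        pendingAudio.forEach(send)
        pendingAudio.removeAll()
        isConnected = true
        if pendingClose {
            pendingClose = false
            sendClose()
        }
    }

    func finalizeAndClose() {
        stateLock.lock(); defer { stateLock.unlock() }
        if isConnected { sendClose() } else { pendingClose = true }
    }
}
```

Holding the lock across the flush is the design choice that matters here: a chunk arriving mid-flush waits, so frame order on the socket matches capture order.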

## Session completion paths

`finishClose(reason:reportClosed:)` is atomic — only one completion path fires `onClosed`. `CloseReason` values are recorded in `SessionMetrics` (stored in UserDefaults as `CartesiaCloseMetrics`).

| CloseReason | When |
| --- | --- |
| `finalTurnReceived` | Final `turn.end` after `{"type":"close"}` (happy path) |
| `serverClosed` | Server closed socket before post-close `turn.end` |
| `graceTimerExpired` | Fallback timer fired before server finished |
| `serverError` | Server sent `{"type":"error"}` |
| `silentNoAudio` | Unexplained ending or post-close 500 with no transcript content |
| `receiveFailed` | Receive loop or upgrade failure |
| `externalCancel` | `cancel()` from audio start failure or `setError` |

On error paths, `finishClose` is called with `reportClosed: false` so `onError` does not race with an empty `onClosed` that would reset coordinator state to `.idle` and wipe the notch notice.
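
A once-only completion guard of this kind can be sketched as follows. `SessionCloser` is a hypothetical type, and the extra `transcript` parameter is an assumption added to keep the example self-contained; the documented signature is `finishClose(reason:reportClosed:)`.

```swift
import Foundation

enum CloseReason: String {
    case finalTurnReceived, serverClosed, graceTimerExpired, serverError
    case silentNoAudio, receiveFailed, externalCancel
}

// Hypothetical sketch: the first caller wins, later callers are ignored.
final class SessionCloser {
    private let lock = NSLock()
    private var finished = false
    private(set) var recordedReason: CloseReason?
    var onClosed: ((String) -> Void)?

    @discardableResult
    func finishClose(reason: CloseReason, reportClosed: Bool, transcript: String = "") -> Bool {
        lock.lock()
        if finished {
            lock.unlock()
            return false
        }
        finished = true
        recordedReason = reason
        lock.unlock()
        // Invoke the callback outside the lock; error paths pass reportClosed: false.
        if reportClosed { onClosed?(transcript) }
        return true
    }
}
```

If the grace timer and the final `turn.end` arrive at nearly the same moment, only the first call records a reason and reports; the second returns `false` and does nothing.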

## STTFailure classification

`STTFailure` classifies Cartesia `error` events (`status_code` + `error_code`) and transport-level `URLError` / HTTP upgrade failures. It drives the notch island message and, for two cases, persistent Home service-issue flags.

### Classification table

| `STTFailure` | `error_code` signals | HTTP `status_code` signals | Transport `URLError` | `notchMessage` |
| --- | --- | --- | --- | --- |
| `offline` | — | — | `.notConnectedToInternet`, `.networkConnectionLost`, `.cannotConnectToHost`, `.cannotFindHost`, `.dnsLookupFailed`, `.dataNotAllowed` | No internet |
| `serverError` | — | ≥ 500 | `.timedOut` | Server error |
| `rateLimited` | `concurrency_limited` | 429 | — | Too many requests |
| `outOfCredits` | `quota_exceeded`, `plan_upgrade_required` | 402 | — | Out of credits |
| `invalidKey` | — | 401, 403 | — | Invalid API key |
| `unknown` | anything else | unclassified (e.g. 400) | other codes; POSIX `ENOTCONN`/`EPIPE`/`ECONNRESET` | Couldn't transcribe |
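
The table can be read as two small classification functions. The sketch below is an approximation: the function names are hypothetical, and checking `error_code` before `status_code` is an assumption, since this page does not state which signal takes precedence. POSIX-level errors are left out.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

enum STTFailure: Equatable {
    case offline, serverError, rateLimited, outOfCredits, invalidKey, unknown
}

// Server `error` events and rejected upgrades: error_code first (assumed), then status.
func classify(statusCode: Int?, errorCode: String?) -> STTFailure {
    switch errorCode ?? "" {
    case "concurrency_limited": return .rateLimited
    case "quota_exceeded", "plan_upgrade_required": return .outOfCredits
    default: break
    }
    guard let status = statusCode else { return .unknown }
    switch status {
    case 401, 403: return .invalidKey
    case 402: return .outOfCredits
    case 429: return .rateLimited
    case 500...: return .serverError
    default: return .unknown
    }
}

// Transport-level failures reported as URLError.
func classify(transportError: URLError) -> STTFailure {
    switch transportError.code {
    case .notConnectedToInternet, .networkConnectionLost, .cannotConnectToHost,
         .cannotFindHost, .dnsLookupFailed, .dataNotAllowed:
        return .offline
    case .timedOut:
        return .serverError
    default:
        return .unknown
    }
}
```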

### Failure routing (`reportFailureOrCollapse`)

Only **named** failures surface an error notch. `.unknown` never shows an error — it delivers whatever transcript exists or collapses silently.

| Condition | Outcome |
| --- | --- |
| Named failure (`!= .unknown`), except the post-close `.serverError` case below | `onError(failure)` → notch shows `notchMessage`; session ends without `onClosed` |
| `.unknown` with transcript content | `onClosed(transcript)` — graceful goodbye |
| `.unknown` with no content | `onClosed("")` → coordinator returns to `.idle`, no error |
| `.serverError` after close requested, no content | Silent collapse (rapid tap-and-release; server returns 500 on ~zero-audio close) |
| `.serverError` mid-hold, no content | Surfaces `Server error` — user may still be speaking |

`STTFailureRoutingTests` locks these contracts: benign POSIX disconnects must classify as `.unknown`, named transport errors stay named, and post-close 500 on silent taps must not alarm the user.
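
The routing table reduces to a short decision function. This is a sketch of the documented outcomes only: `RoutingOutcome` and `route(_:transcript:closeRequested:)` are hypothetical names, and the real `reportFailureOrCollapse` also records a `CloseReason`.

```swift
import Foundation

enum STTFailure: Equatable {
    case offline, serverError, rateLimited, outOfCredits, invalidKey, unknown
}

enum RoutingOutcome: Equatable {
    case reportError(STTFailure) // onError fires; onClosed does not
    case closed(String)          // onClosed fires with this transcript
}

// Hypothetical pure-function model of the routing table above.
func route(_ failure: STTFailure, transcript: String, closeRequested: Bool) -> RoutingOutcome {
    let text = transcript.trimmingCharacters(in: .whitespacesAndNewlines)
    // Unknown failures never alarm: deliver what exists, or collapse with "".
    if failure == .unknown { return .closed(text) }
    // A 500 after close with nothing transcribed is treated as a silent tap.
    if failure == .serverError && closeRequested && text.isEmpty { return .closed("") }
    return .reportError(failure)
}
```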

## Coordinator integration

`AppCoordinator` wires the client callbacks during `startDictation()`:

| Callback | Coordinator action |
| --- | --- |
| `onTranscriptUpdate` | Updates `liveTranscript` for notch HUD (suppressed during onboarding trial) |
| `onError` | `handleSTTFailure` → `setError(failure.notchMessage)` |
| `onClosed` | Trims transcript; empty → `.idle`; non-empty → polish → paste or History |

`handleSTTFailure` persists fixable flags:

| Failure | Settings flag |
| --- | --- |
| `invalidKey` | `cartesiaKeyInvalid = true` |
| `outOfCredits` | `cartesiaOutOfCredits = true` |

A successful `onClosed` with non-empty text clears both flags. Errors show immediately mid-hold (via `setError`) and self-clear 1.5 s after hotkey release.

<Warning>
`CartesiaKeyValidator` probes `GET https://api.cartesia.ai/voices?limit=1` over plain HTTPS to distinguish invalid keys from offline machines. That advisory check is separate from the live WebSocket handshake, where a rejected key and an offline host can look similar without the probe.
</Warning>

## Related pages

<CardGroup>
<Card title="Dictation pipeline" href="/dictation-pipeline">
End-to-end flow from hotkey through audio capture, STT, polish, paste, and history.
</Card>
<Card title="Configure Cartesia API key" href="/configure-cartesia-api-key">
Obtain and store the API key that authenticates the WebSocket.
</Card>
<Card title="STT troubleshooting" href="/stt-troubleshooting">
Diagnose transcription failures, notch surfacing, and regression contracts.
</Card>
<Card title="Input device selection" href="/input-device-selection">
Microphone pinning and the audio-ready HUD cue before PCM reaches the socket.
</Card>
</CardGroup>

---

## 14. LLM providers reference

> LLMProvider enum: Groq, Gemini, OpenAI, Anthropic endpoints, default models, rewriteTimeout ceilings, validationRequest probes, keyURL/billingURL, OpenAI-compatible vs Anthropic Messages API shapes, and RewriteFailure reason codes.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/14-llm-providers-reference.md
- Generated: 2026-06-15T22:10:50.438Z

### Source Files

- `InkIt/LLMProvider.swift`
- `InkIt/TranscriptRewriter.swift`
- `InkIt/CartesiaKeyValidator.swift`
- `InkIt/SettingsStore.swift`
- `InkIt/TranscriptHistoryStore.swift`

---

InkIt’s optional Polish step routes raw STT transcripts through `TranscriptRewriter`, which posts to whichever `LLMProvider` is selected in `SettingsStore.rewriteProvider`. The `LLMProvider` enum in `LLMProvider.swift` centralizes chat endpoints, curated models, per-provider timeouts, key/billing URLs, and credit-free validation probes; `TranscriptRewriter` handles the HTTP shapes and classifies failures into `RewriteFailure` cases that downstream code maps to history tooltips and persisted service-issue flags.

## Provider enum

`LLMProvider` is a `String`-backed enum with four cases: `groq`, `gemini`, `openai`, and `anthropic`. It conforms to `CaseIterable`, `Identifiable`, and `Hashable`; `id` is the `rawValue`.

| Case | `displayName` | `keyPlaceholder` | `isRecommended` | `isOpenAICompatible` |
|------|---------------|------------------|-----------------|----------------------|
| `groq` | Groq | `gsk_…` | `true` | `true` |
| `gemini` | Google Gemini | `AIza…` | `false` | `true` |
| `openai` | OpenAI | `sk-…` | `false` | `true` |
| `anthropic` | Anthropic | `sk-ant-…` | `false` | `false` |

Groq is the default `rewriteProvider` when no value is stored. Settings shows a “(Recommended)” badge on Groq via `isRecommended`.

<Info>
Each provider exposes a curated `models` list, currently a single model per provider, with the recommended default first. `defaultModel` returns `models.first!`. If the stored `rewriteModel` is not in the provider’s list, Settings resets it to `defaultModel`.
</Info>

### Default models

| Provider | `defaultModel` |
|----------|----------------|
| Groq | `llama-3.3-70b-versatile` |
| Gemini | `gemini-2.5-flash-lite` |
| OpenAI | `gpt-4.1-nano` |
| Anthropic | `claude-haiku-4-5-20251001` |

## Chat endpoints

Polish uses provider-specific chat URLs. Anthropic speaks the native Messages API; all others use OpenAI-compatible `/chat/completions`.

| Provider | `endpoint` |
|----------|------------|
| Groq | `https://api.groq.com/openai/v1/chat/completions` |
| Gemini | `https://generativelanguage.googleapis.com/v1beta/openai/chat/completions` |
| OpenAI | `https://api.openai.com/v1/chat/completions` |
| Anthropic | `https://api.anthropic.com/v1/messages` |

:::endpoint POST /chat/completions (Groq, Gemini, OpenAI)
OpenAI-compatible polish request issued by `TranscriptRewriter.call`.

**Headers**
- `Content-Type: application/json`
- `Authorization: Bearer {apiKey}`

**Body fields**

<ParamField body="model" type="string" required>
Provider model ID from `rewriteModel` (e.g. `llama-3.3-70b-versatile`).
</ParamField>

<ParamField body="max_tokens" type="integer" required>
Computed as `min(1500, estimatedInputTokens * 3 + 80)` where `estimatedInputTokens = max(48, transcript.count / 3)`.
</ParamField>

<ParamField body="temperature" type="number" required>
Always `0`.
</ParamField>

<ParamField body="messages" type="array" required>
Two messages: a `system` role with flattened instruction text, and a `user` role wrapping the transcript in `<transcript>…</transcript>`.
</ParamField>

**Response extraction**
`choices[0].message.content` as a string.

:::
:::endpoint POST /v1/messages (Anthropic)
Native Messages API polish request.

**Headers**
- `Content-Type: application/json`
- `x-api-key: {apiKey}`
- `anthropic-version: 2023-06-01`

**Body fields**

<ParamField body="model" type="string" required>
Anthropic model ID (e.g. `claude-haiku-4-5-20251001`).
</ParamField>

<ParamField body="max_tokens" type="integer" required>
Same formula as OpenAI-compatible providers.
</ParamField>

<ParamField body="temperature" type="number" required>
Always `0`.
</ParamField>

<ParamField body="system" type="array" required>
Anthropic-style system blocks: `[{ "type": "text", "text": "…" }]`.
</ParamField>

<ParamField body="messages" type="array" required>
Single `user` message with `<transcript>…</transcript>` content.
</ParamField>

**Response extraction**
Concatenates all `content` blocks where `type == "text"`.

:::
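
The `max_tokens` formula shared by both request shapes works out as below. The function name is hypothetical, and using `String.count` (Swift `Character` count) with integer division is an assumption about how `transcript.count / 3` is evaluated.

```swift
// Hypothetical helper implementing the documented formula:
// min(1500, estimatedInputTokens * 3 + 80), estimatedInputTokens = max(48, count / 3).
func maxTokens(forTranscript transcript: String) -> Int {
    let estimatedInputTokens = max(48, transcript.count / 3) // integer division
    return min(1500, estimatedInputTokens * 3 + 80)
}
```

Under these assumptions the budget has a floor of 224 (any transcript shorter than 144 characters), grows roughly one output token per input character after that, and is capped at 1500.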

## Rewrite timeouts

`rewriteTimeout` is the hard ceiling before `TranscriptRewriter` abandons polish and the caller falls back to the raw transcript. The value is applied to both `timeoutIntervalForRequest` and `timeoutIntervalForResource` on the rewriter’s `URLSession`, and to each request’s `timeoutInterval`.

| Provider | `rewriteTimeout` |
|----------|------------------|
| Groq | `1.0` s |
| Gemini, OpenAI, Anthropic | `2.0` s |

Groq’s 1.0 s ceiling is sized above observed p99 latency for the default Llama model; other providers get more headroom. On timeout, `RewriteFailure.timedOut` is returned and the raw transcript is pasted.
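
Applying one ceiling at all three levels might look like the sketch below. `ProviderSketch`, `makeRewriteSession`, and `makePolishRequest` are hypothetical names, and the choice of an ephemeral session configuration is an assumption; only the timeout values and the three properties come from this page.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Hypothetical mirror of the per-provider ceilings in the table above.
enum ProviderSketch {
    case groq, gemini, openai, anthropic
    var rewriteTimeout: TimeInterval { self == .groq ? 1.0 : 2.0 }
}

// Session-level ceilings: per-request idle timeout and whole-resource timeout.
func makeRewriteSession(for provider: ProviderSketch) -> URLSession {
    let config = URLSessionConfiguration.ephemeral // assumption, not confirmed by the source
    config.timeoutIntervalForRequest = provider.rewriteTimeout
    config.timeoutIntervalForResource = provider.rewriteTimeout
    return URLSession(configuration: config)
}

// Request-level ceiling on the POST itself.
func makePolishRequest(url: URL, provider: ProviderSketch) -> URLRequest {
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.timeoutInterval = provider.rewriteTimeout
    return request
}
```

Setting `timeoutIntervalForResource` as well as the request timeout bounds the total wait, not just the gap between received bytes.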

<Note>
`TranscriptRewriter.prewarm()` issues a `HEAD` to `provider.endpoint` with a separate 2.5 s timeout. The response is discarded; the goal is to warm DNS/TCP/TLS on the same `URLSession` before the hot-path `POST`.
</Note>

## Key and billing URLs

| Provider | `keyURL` | `billingURL` | `keyHint` |
|----------|----------|--------------|-----------|
| Groq | `https://console.groq.com/keys` | `https://console.groq.com/settings/billing` | Free tier, no card needed. |
| Gemini | `https://aistudio.google.com/apikey` | `https://aistudio.google.com/` | Free tier from Google AI Studio. |
| OpenAI | `https://platform.openai.com/api-keys` | `https://platform.openai.com/settings/organization/billing` | Uses your existing OpenAI account. |
| Anthropic | `https://console.anthropic.com/settings/keys` | `https://console.anthropic.com/settings/billing` | Uses your existing Anthropic account. |

Settings links the key field to `keyURL`. When Home surfaces a polish out-of-credits issue, the card action opens `billingURL` for the active provider.

## Key storage and settings keys

Per-provider API keys live in the macOS Keychain under account `llm.{provider.rawValue}` (e.g. `llm.groq`). `SettingsStore.llmAPIKeys` is a `[String: String]` dictionary keyed by `LLMProvider.rawValue`.

| Setting | UserDefaults key | Default |
|---------|------------------|---------|
| Active provider | `rewriteProvider` | `groq` |
| Active model | `rewriteModel` | Groq default model |
| Polish enabled | `correctionEnabled` | `false` |
| Key rejected (401/403) | `polishKeyInvalid` | `false` |
| Billing exhausted (402) | `polishOutOfCredits` | `false` |

Switching `rewriteProvider` clears `polishKeyInvalid` and `polishOutOfCredits` so verdicts do not carry across providers.

## Validation probes

InkIt validates polish keys with credit-free `GET /models` probes — no tokens spent.

### `validationRequest(key:)`

`LLMProvider.validationRequest` builds the probe for the selected provider:

| Provider | Validation URL | Auth |
|----------|----------------|------|
| Groq | `https://api.groq.com/openai/v1/models` | `Authorization: Bearer {key}` |
| Gemini | `https://generativelanguage.googleapis.com/v1beta/openai/models` | `Authorization: Bearer {key}` |
| OpenAI | `https://api.openai.com/v1/models` | `Authorization: Bearer {key}` |
| Anthropic | `https://api.anthropic.com/v1/models` | `x-api-key: {key}`, `anthropic-version: 2023-06-01` |

All probes use `GET`, `timeoutInterval = 8`, and are advisory — they never block dictation.
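
A probe builder matching the table could be sketched as follows. `ProviderSketch` is a hypothetical stand-in for `LLMProvider`; the URLs, headers, method, and 8-second timeout are the ones listed above.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Hypothetical stand-in for LLMProvider, limited to the validation probe.
enum ProviderSketch: String, CaseIterable {
    case groq, gemini, openai, anthropic

    var validationURL: URL {
        switch self {
        case .groq: return URL(string: "https://api.groq.com/openai/v1/models")!
        case .gemini: return URL(string: "https://generativelanguage.googleapis.com/v1beta/openai/models")!
        case .openai: return URL(string: "https://api.openai.com/v1/models")!
        case .anthropic: return URL(string: "https://api.anthropic.com/v1/models")!
        }
    }

    func validationRequest(key: String) -> URLRequest {
        var request = URLRequest(url: validationURL)
        request.httpMethod = "GET"
        request.timeoutInterval = 8
        switch self {
        case .anthropic:
            // Anthropic uses its own key header plus a version header.
            request.setValue(key, forHTTPHeaderField: "x-api-key")
            request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")
        case .groq, .gemini, .openai:
            request.setValue("Bearer \(key)", forHTTPHeaderField: "Authorization")
        }
        return request
    }
}
```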

### Validator classes

`APIKeyValidator` debounces keystrokes by 0.6 s, ignores stale completions, and caches verdicts per key. It maps each probe result to a state:

| HTTP status | `APIKeyValidator.State` |
|-------------|-------------------------|
| 2xx | `verified` |
| 400, 401, 403 | `invalidKey` (Gemini returns 400 for `API_KEY_INVALID`) |
| Other / transport error | `couldNotVerify` |

- **`LLMKeyValidator`** — used in Settings `PolishKeyField`; calls `provider.validationRequest` and repoints when the user switches providers.
- **`GroqKeyValidator`** — Groq-only subclass hard-wired to `GET https://api.groq.com/openai/v1/models`; intended for a focused onboarding polish path while Settings uses the provider-aware validator.

A key that verifies while Polish is in the setup or key-broken state triggers `SettingsStore.enablePolish(provider:)`.
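
The status-to-state table in this section can be sketched as a pure function. `ValidatorState` and `validatorState(httpStatus:)` are hypothetical names; a `nil` status stands for a transport error where no HTTP response arrived.

```swift
// Hypothetical mirror of APIKeyValidator.State for illustration.
enum ValidatorState: Equatable {
    case verified, invalidKey, couldNotVerify
}

// nil means the probe failed before any HTTP response (transport error).
func validatorState(httpStatus: Int?) -> ValidatorState {
    guard let status = httpStatus else { return .couldNotVerify }
    switch status {
    case 200...299: return .verified
    case 400, 401, 403: return .invalidKey // Gemini reports bad keys as 400
    default: return .couldNotVerify
    }
}
```

Mapping everything else to `couldNotVerify` keeps the probe advisory: a 429 or 500 from the provider says nothing about whether the key is valid.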

## Rewrite pipeline

```mermaid
sequenceDiagram
    participant AC as AppCoordinator
    participant TR as TranscriptRewriter
    participant LP as LLMProvider endpoint
    participant HS as TranscriptHistoryStore

    AC->>TR: prewarm() on hotkey press
    TR->>LP: HEAD (discarded)
    AC->>TR: rewriteWithoutContext(transcript)
    TR->>LP: POST chat/messages
    alt success
        LP-->>TR: model text
        TR-->>AC: .success(cleaned)
        AC->>HS: polish=.polished, original=raw
    else failure
        LP-->>TR: HTTP/URLError
        TR-->>AC: .failure(RewriteFailure)
        AC->>HS: polish=.failed, text=raw, failure=PolishFailure
    end
```

`rewriteWithoutContext` guards against empty API keys (`.invalidKey`) and empty transcripts (`.unknown`), wraps input in `<transcript>…</transcript>`, and runs a length sanity check: responses longer than `max(120, transcript.count * 2.5 + 40)` are rejected as `.unknown`.
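
The length sanity check can be expressed in a few lines. The helper name is hypothetical, and measuring both strings with `String.count` is an assumption.

```swift
// Hypothetical helper: true when the model output is within the documented bound,
// max(120, transcript.count * 2.5 + 40).
func passesLengthSanityCheck(response: String, transcript: String) -> Bool {
    let limit = max(120.0, Double(transcript.count) * 2.5 + 40.0)
    return Double(response.count) <= limit
}
```

For a 100-character transcript the limit is 290 characters, so a cleaned-up sentence passes while a model that starts answering the transcript instead of cleaning it is likely to be rejected.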

## `RewriteFailure` reason codes

`RewriteFailure` is the error enum returned by `TranscriptRewriter`. On any failure, `AppCoordinator` pastes the raw transcript and records a `TranscriptHistoryStore.PolishFailure` for the history row tooltip.

| `RewriteFailure` | Trigger | HTTP / network source |
|------------------|---------|----------------------|
| `rateLimited(retryAt:)` | Provider throttling | HTTP 429; `retryAt` from `Retry-After` header (seconds or HTTP date) |
| `offline` | No reachable network | `URLError`: `.notConnectedToInternet`, `.cannotConnectToHost`, `.cannotFindHost`, `.networkConnectionLost`, `.dataNotAllowed`, `.dnsLookupFailed` |
| `timedOut` | Request exceeded timeout | HTTP 408, 504; `URLError.timedOut` (request or session timeout) |
| `invalidKey` | Auth rejected | HTTP 401, 403; empty API key before request |
| `outOfCredits` | Billing limit | HTTP 402 |
| `serverError` | Provider fault | HTTP 5xx other than 504 |
| `unknown` | Unclassified | JSON parse failure, empty model output, length sanity reject, other `URLError` codes |
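
The HTTP rows of this table, including the two `Retry-After` forms, can be sketched as follows. The function names are hypothetical, the enum is a local copy for illustration, and the date format string assumes the common IMF-fixdate form (for example `Wed, 21 Oct 2015 07:28:00 GMT`); other legacy HTTP date formats are not handled.

```swift
import Foundation

enum RewriteFailure: Equatable {
    case rateLimited(retryAt: Date?)
    case offline, timedOut, invalidKey, outOfCredits, serverError, unknown
}

// Retry-After is either delta-seconds or an HTTP date.
func parseRetryAfter(_ value: String?, now: Date = Date()) -> Date? {
    guard let value = value?.trimmingCharacters(in: .whitespaces), !value.isEmpty else {
        return nil
    }
    if let seconds = TimeInterval(value) {
        return now.addingTimeInterval(seconds)
    }
    let formatter = DateFormatter()
    formatter.locale = Locale(identifier: "en_US_POSIX")
    formatter.timeZone = TimeZone(secondsFromGMT: 0)
    formatter.dateFormat = "EEE, dd MMM yyyy HH:mm:ss 'GMT'"
    return formatter.date(from: value)
}

// HTTP status classification; 408 and 504 are checked before the generic 5xx range.
func classifyRewrite(status: Int, retryAfter: String?, now: Date = Date()) -> RewriteFailure {
    switch status {
    case 429: return .rateLimited(retryAt: parseRetryAfter(retryAfter, now: now))
    case 401, 403: return .invalidKey
    case 402: return .outOfCredits
    case 408, 504: return .timedOut
    case 500...599: return .serverError
    default: return .unknown
    }
}
```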

### Persistence side effects

Only two failure reasons set persisted Home-card flags via `AppCoordinator.correctedTranscript`:

| `RewriteFailure` | UserDefaults flag | Home `ServiceIssue` |
|------------------|-------------------|---------------------|
| `invalidKey` | `polishKeyInvalid = true` | `.keyInvalid` |
| `outOfCredits` | `polishOutOfCredits = true` | `.outOfCredits` |

A successful rewrite clears both flags. Transient failures (rate limit, offline, timeout, server error) surface in the notch and the history tooltip but do not set persisted flags.

### History mapping

`AppCoordinator.polishResult` maps each `RewriteFailure` 1:1 to `PolishFailureReason` (same case names). `TranscriptHistoryStore.PolishFailure` adds `provider` (`displayName`) and optional `retryAt` for rate limits. History rows show an amber warning pill; hover text comes from `HistoryRow.failureMessage`, e.g.:

- Rate limit: `"{provider} rate limit — raw text pasted. {retry hint}"`
- Invalid key: `"Invalid {provider} API key — raw text pasted. Fix it in Settings."`
- Out of credits: `"Out of {provider} credits — raw text pasted. Review your {provider} plan to re-enable Polish."`

<Warning>
Polish failures never block paste. The user always receives at least the raw STT transcript; `PolishUIState.keyBroken` only reflects persisted auth failures, not transient errors.
</Warning>

## Request examples

<RequestExample>

```http title="OpenAI-compatible (Groq)"
POST https://api.groq.com/openai/v1/chat/completions
Authorization: Bearer gsk_…
Content-Type: application/json

{
  "model": "llama-3.3-70b-versatile",
  "max_tokens": 230,
  "temperature": 0,
  "messages": [
    { "role": "system", "content": "You are a transcription cleaner…" },
    { "role": "user", "content": "<transcript>\nhey um can you send the draft by friday\n</transcript>" }
  ]
}
```

</RequestExample>

<RequestExample>

```http title="Anthropic Messages API"
POST https://api.anthropic.com/v1/messages
x-api-key: sk-ant-…
anthropic-version: 2023-06-01
Content-Type: application/json

{
  "model": "claude-haiku-4-5-20251001",
  "max_tokens": 230,
  "temperature": 0,
  "system": [{ "type": "text", "text": "You are a transcription cleaner…" }],
  "messages": [
    { "role": "user", "content": "<transcript>\nhey um can you send the draft by friday\n</transcript>" }
  ]
}
```

</RequestExample>

<RequestExample>

```http title="Validation probe (any OpenAI-compatible provider)"
GET https://api.groq.com/openai/v1/models
Authorization: Bearer gsk_…
```

</RequestExample>

## Related pages

<CardGroup>
<Card title="Configure Polish" href="/configure-polish">
Enable polish, pick a provider, enter keys, and interpret PolishUIState (setup, on, paused, keyBroken).
</Card>
<Card title="Dictation pipeline" href="/dictation-pipeline">
End-to-end flow from hotkey through STT, optional TranscriptRewriter polish, and paste.
</Card>
<Card title="Settings reference" href="/settings-reference">
All persisted SettingsStore keys including rewriteProvider, rewriteModel, and Keychain accounts.
</Card>
<Card title="Polish troubleshooting" href="/polish-troubleshooting">
Failure modes, rate-limit Retry-After handling, timeout fallbacks, and prewarm behavior.
</Card>
</CardGroup>

---

## 15. Paste and focus reference

> PasteService clipboard session tagging, Cmd+V synthesis, timing constants, FocusedEditable AX budget walk, Chromium enableWebAccessibility prewarm, heldInHistory when no editable field, and TargetAppSnapshot paste-target resolution chain.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/15-paste-and-focus-reference.md
- Generated: 2026-06-15T22:10:44.831Z

### Source Files

- `InkIt/PasteService.swift`
- `InkIt/ContextSnapshot.swift`
- `InkIt/AppCoordinator.swift`
- `InkIt/AXTreeDumper.swift`
- `InkItTests/AXBudgetTests.swift`

---

InkIt inserts dictated text by swapping `NSPasteboard.general`, synthesizing **Cmd+V** through a HID event tap, and restoring the user's prior clipboard off the critical path. Paste only runs after `FocusedEditable.current()` confirms an editable control is focused at hotkey release; otherwise the transcript is saved to History and the UI briefly enters `heldInHistory`.

## End-to-end paste path

At **hotkey press**, `AppCoordinator.startDictation()` resolves a press-time paste target, snapshots it as `TargetAppSnapshot`, and prewarms Chromium accessibility on that PID. At **hotkey release**, STT finalizes, optional polish runs, then `FocusedEditable` re-probes the live focus. If editable, `PasteService.paste(text:targetApp:completion:)` runs; if not, `TranscriptHistoryStore.add` persists the text and `showHeldInHistoryNotice()` surfaces a 2.5 s confirmation.

```mermaid
sequenceDiagram
    participant User
    participant AppCoordinator
    participant FocusedEditable
    participant PasteService
    participant TargetApp

    User->>AppCoordinator: hotkey press
    AppCoordinator->>AppCoordinator: resolve pasteTargetApp + TargetAppSnapshot
    AppCoordinator->>FocusedEditable: enableWebAccessibility(pid)
    User->>AppCoordinator: hotkey release
    AppCoordinator->>AppCoordinator: STT finalize + optional polish
    AppCoordinator->>FocusedEditable: current() [AX budget 1.5s]
    alt editable focus
        AppCoordinator->>PasteService: paste(text, focus.app ?? capturedTarget)
        PasteService->>TargetApp: clipboard swap + Cmd+V
        PasteService-->>AppCoordinator: completion(true)
        AppCoordinator->>AppCoordinator: history.add, state → idle
    else no editable focus
        AppCoordinator->>AppCoordinator: history.add (pasteMs: 0)
        AppCoordinator->>AppCoordinator: heldInHistory (2.5s)
    end
```

<Info>
Press-time target resolution is a hint for activation and logging. Release-time `FocusedEditable` is authoritative for whether paste happens and which app receives Cmd+V.
</Info>

## TargetAppSnapshot and paste-target resolution

`TargetAppSnapshot` captures bundle ID, localized name, and PID at press. It is used for debug logging and polish context tags; InkIt does not read the target app's Accessibility tree for content.

<ResponseField name="TargetAppSnapshot" type="struct">
  <ResponseField name="bundleIdentifier" type="String?" />
  <ResponseField name="localizedName" type="String" />
  <ResponseField name="processIdentifier" type="pid_t" />
</ResponseField>

### Press-time resolution chain

`AppCoordinator` resolves `pasteTargetApp` in `startDictation()`:

| Step | Condition | Result |
|------|-----------|--------|
| 1 | `NSWorkspace.shared.frontmostApplication` is not InkIt | Use frontmost app |
| 2 | InkIt is frontmost | Fall back to `lastExternalApp` |
| 3 | `lastExternalApp` still nil (no prior activation event) | `seedLastExternalApp()` picks first regular non-InkIt running app |

`lastExternalApp` updates on `NSWorkspace.didActivateApplicationNotification`, excluding InkIt's own bundle ID.

`contextTargetSnapshot = TargetAppSnapshot.capture(from: pasteTargetApp)` runs immediately after resolution. Both `pasteTargetApp` and `contextTargetSnapshot` are **captured into the `onClosed` closure** at press time so mid-flight errors or stale callbacks cannot wipe the recording-start target.
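
The three-step resolution chain can be modeled without AppKit as a pure function. Everything here is a hypothetical stand-in: `AppRef` replaces `NSRunningApplication`, the inputs replace the `NSWorkspace` queries, and the bundle identifier passed by the caller is illustrative.

```swift
// Hypothetical stand-in for NSRunningApplication, reduced to the snapshot fields.
struct AppRef: Equatable {
    let bundleIdentifier: String?
    let localizedName: String
    let processIdentifier: Int32
}

// Pure-function model of the press-time chain in the table above.
func resolvePasteTarget(
    frontmost: AppRef?,
    lastExternalApp: AppRef?,
    runningRegularApps: [AppRef],
    ownBundleID: String
) -> AppRef? {
    // Step 1: the frontmost app, unless it is the dictation app itself.
    if let app = frontmost, app.bundleIdentifier != ownBundleID { return app }
    // Step 2: the last externally activated app.
    if let app = lastExternalApp { return app }
    // Step 3: seed from the first regular running app that is not the dictation app.
    return runningRegularApps.first { $0.bundleIdentifier != ownBundleID }
}
```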

### Release-time verified target

After polish, `FocusedEditable.current()` returns `Result(isEditable:app:)`. On paste:

```swift
paste.paste(text: correction.text, targetApp: focus.app ?? capturedTargetApp)
```

The release-verified `focus.app` wins; `capturedTargetApp` is the fallback when the AX probe returns a PID but no `NSRunningApplication`.

## PasteService

`PasteService` is the sole text-insertion primitive. It does not use AX value injection — only clipboard swap plus synthetic paste.

### Clipboard session tagging

Before Cmd+V, `PasteService`:

1. Snapshots existing `NSPasteboardItem` data (preserving non-string types).
2. Clears the pasteboard and writes the dictated string.
3. Tags the write with a per-paste UUID on `com.cartesia.InkIt.PasteSession`.
4. Marks the entry transient via `org.nspasteboard.TransientType` so clipboard-history apps skip dictated text.

Restore runs **0.4 s after Cmd+V** (off the critical path). It proceeds only when both the string content and session UUID still match; if the user or another app wrote to the clipboard in between, InkIt leaves their data intact.
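
The tagging and the restore guard can be sketched with `NSPasteboard`. This is a simplified illustration, not `PasteService` itself: the function names are hypothetical, the snapshot and restore of the previous items is omitted, and writing empty data for the transient marker follows the usual nspasteboard.org convention (presence of the type is the signal), which is assumed here.

```swift
import AppKit

let sessionType = NSPasteboard.PasteboardType(rawValue: "com.cartesia.InkIt.PasteSession")
let transientType = NSPasteboard.PasteboardType(rawValue: "org.nspasteboard.TransientType")

// Writes the dictated text plus a per-paste session tag; returns the tag.
func writeTagged(_ text: String, to pasteboard: NSPasteboard) -> String? {
    let session = UUID().uuidString
    pasteboard.clearContents()
    guard pasteboard.setString(text, forType: .string) else { return nil }
    pasteboard.setString(session, forType: sessionType)
    // Marker type only; clipboard managers that honor it skip this entry.
    pasteboard.setData(Data(), forType: transientType)
    return session
}

// Restore should proceed only if both the text and the session tag are unchanged.
func stillOwnsClipboard(_ pasteboard: NSPasteboard, text: String, session: String) -> Bool {
    pasteboard.string(forType: .string) == text
        && pasteboard.string(forType: sessionType) == session
}
```

Checking the session tag in addition to the string covers the case where the user copies the same text again in the restore window: the content matches, but the tag is gone.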

### Cmd+V synthesis

`synthesizeCmdV()` builds paired key-down/key-up `CGEvent`s for `kVK_ANSI_V` with `.maskCommand`, sourced from `.hidSystemState`, and posts to `.cghidEventTap`. Accessibility permission is required for this path (dictation already gates on AX at start).
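
A minimal sketch of that synthesis with Core Graphics follows. The function names are hypothetical and the key code is written as a literal (`0x09`, the value of `kVK_ANSI_V`) to avoid importing Carbon. Posting has a visible side effect and requires Accessibility permission, so event construction is kept separate from posting.

```swift
import CoreGraphics

let vKeyCode: CGKeyCode = 0x09 // kVK_ANSI_V

// Builds the key-down / key-up pair with the Command modifier set.
func makeCmdVEvents() -> (down: CGEvent, up: CGEvent)? {
    guard let source = CGEventSource(stateID: .hidSystemState),
          let down = CGEvent(keyboardEventSource: source, virtualKey: vKeyCode, keyDown: true),
          let up = CGEvent(keyboardEventSource: source, virtualKey: vKeyCode, keyDown: false)
    else { return nil }
    down.flags = .maskCommand
    up.flags = .maskCommand
    return (down, up)
}

// Posts the pair at the HID event tap; returns false if the events could not be built.
func postCmdV() -> Bool {
    guard let events = makeCmdVEvents() else { return false }
    events.down.post(tap: .cghidEventTap)
    events.up.post(tap: .cghidEventTap)
    return true
}
```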

### Timing constants

| Constant | Value | When paid |
|----------|-------|-----------|
| `clipboardSettleDelay` | 80 ms | Always, before Cmd+V |
| `activationFocusDelay` | 120 ms | Only when target is not already frontmost |
| `clipboardRestoreDelay` | 400 ms | After Cmd+V, before restore |

**Pre-delay** before Cmd+V:

- Target already frontmost (or `targetApp == nil`): `0.08s`
- Target needs `activate()`: `0.08s + 0.12s = 0.20s`

`activate()` is skipped when the target is already active or terminated. InkIt's recorder is a non-activating panel, so the common case pays only the 80 ms settle delay.
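
The pre-delay arithmetic can be written out directly. `PasteTiming` and `preDelay(targetNeedsActivation:)` are hypothetical names; the values are the constants from the table.

```swift
import Foundation

// Hypothetical container for the documented constants, in seconds.
enum PasteTiming {
    static let clipboardSettleDelay: TimeInterval = 0.08
    static let activationFocusDelay: TimeInterval = 0.12
    static let clipboardRestoreDelay: TimeInterval = 0.4
}

// Delay paid before Cmd+V: settle always, activation wait only when needed.
func preDelay(targetNeedsActivation: Bool) -> TimeInterval {
    PasteTiming.clipboardSettleDelay
        + (targetNeedsActivation ? PasteTiming.activationFocusDelay : 0)
}
```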

`completion(true)` fires immediately after Cmd+V — that marks the end of perceived paste latency. Clipboard restore does not block the user-visible result.

<ParamField body="paste(text:targetApp:completion:)" type="method">
  <ParamField body="text" type="String" required>
    Final transcript (polished or raw) to insert.
  </ParamField>
  <ParamField body="targetApp" type="NSRunningApplication?" required={false}>
    App to activate before paste. `nil` means paste into whatever is frontmost with no activation wait.
  </ParamField>
  <ParamField body="completion" type="(Bool) -> Void" required>
    Called on the main thread after Cmd+V. `false` only when `setString` fails; restore failures are silent.
  </ParamField>
</ParamField>

## FocusedEditable and AX budget

`FocusedEditable` answers one question at release: **is an editable control focused right now?** Press-time target resolution can be stale if the user clicked a non-editable surface or focus moved during dictation.

### AX.run primitive

All Accessibility walks route through `AX.run(budget:_:)`, which:

- Spawns a detached `Task` at `.userInitiated` priority (never main thread).
- Passes a `deadline` exactly `budget` seconds from invocation.
- Returns whatever the closure produced when the deadline expires.

`FocusedEditable.current()` uses a **1.5 s** budget. The closure must check `deadline` inside its own loops; `Task` cancellation cannot preempt synchronous AX IPC already in flight.

`AXBudgetTests` locks this contract: off-main execution, deadline placement, and partial results when time expires.
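
The shape of such a budgeted runner can be sketched with Swift concurrency. `runWithBudget` is a hypothetical stand-in for `AX.run(budget:_:)`, whose real signature is not shown on this page. The key point it illustrates is that the closure is cooperative: it receives a deadline and must check it itself.

```swift
import Foundation

// Hypothetical stand-in for AX.run(budget:_:): runs work off the calling context
// and hands it a deadline that the work is expected to honor.
func runWithBudget<T: Sendable>(
    budget: TimeInterval,
    _ work: @escaping @Sendable (_ deadline: Date) -> T
) async -> T {
    let deadline = Date().addingTimeInterval(budget)
    return await Task.detached(priority: .userInitiated) {
        work(deadline)
    }.value
}

// Usage: the loop stops at the deadline and returns the partial count.
let visited = await runWithBudget(budget: 0.05) { deadline in
    var count = 0
    while count < 1_000_000, Date() < deadline {
        count += 1
    }
    return count
}
print("visited \(visited) nodes within budget")
```

Nothing in this sketch interrupts the closure; if the work ignores `deadline`, the call simply takes longer, which is why the deadline check belongs inside every loop of the walk.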

### Editability probe resolution chain

Inside `resolve(deadline:)`:

| Stage | Mechanism | Purpose |
|-------|-----------|---------|
| Fast path | System-wide `kAXFocusedUIElementAttribute` → `isEditable` | Native fields, most Electron inputs |
| App descent | `AXUIElementCreateApplication(pid)` → `kAXFocusedUIElementAttribute` chain (`maxHops: 6`) | Chromium stops at web-area container |
| Subtree scan | Focused window BFS, `maxNodes: 1500`, match `kAXFocused` + editable | Slack Lexical-style composers |
| Container descent | Descend from system-wide focused element | Last resort |

The primary `isEditable` signal is a settable `kAXValueAttribute`. When that check fails, the probe falls back to role matching: `kAXTextFieldRole`, `kAXTextAreaRole`, or `kAXComboBoxRole`.
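
The subtree-scan stage is a breadth-first search bounded by both a node cap and the deadline. The sketch below runs on a hypothetical in-memory `Node` tree rather than real `AXUIElement`s, so it shows only the bounding logic, not the Accessibility calls.

```swift
import Foundation

// Hypothetical in-memory stand-in for an accessibility element.
final class Node {
    let isFocused: Bool
    let isEditable: Bool
    let children: [Node]
    init(isFocused: Bool = false, isEditable: Bool = false, children: [Node] = []) {
        self.isFocused = isFocused
        self.isEditable = isEditable
        self.children = children
    }
}

// BFS that gives up after maxNodes visits or once the deadline passes.
func findFocusedEditable(root: Node, maxNodes: Int = 1500, deadline: Date) -> Node? {
    var queue = [root]
    var index = 0 // number of nodes visited so far
    while index < queue.count, index < maxNodes, Date() < deadline {
        let node = queue[index]
        index += 1
        if node.isFocused && node.isEditable { return node }
        queue.append(contentsOf: node.children)
    }
    return nil
}
```

Returning `nil` when a bound is hit is the conservative outcome: the caller treats it as "not editable" and the transcript is held rather than pasted blindly.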

If `AXIsProcessTrusted()` is false at release (an edge case where permission was revoked mid-session), `current()` returns `isEditable: false, app: nil`, so the transcript goes to History rather than gambling on Cmd+V.

<Warning>
Synchronous AX on the main thread stalls the UI and the Fn event tap, which can freeze modifier keys system-wide. Do not hand-roll AX walks outside `AX.run`.
</Warning>

## Chromium enableWebAccessibility prewarm

Chromium and Electron apps build their accessible tree lazily. Until marked accessibility-active, the system-wide focus query often stops at a web-area shell and the real textbox is invisible.

At **hotkey press**, when `pasteTargetApp` has a PID:

```swift
FocusedEditable.enableWebAccessibility(pid: targetPid)
```

This sets on the application element (off main thread, idempotent):

- `AXManualAccessibility` → `true`
- `AXEnhancedUserInterface` → `true`

Native apps ignore these harmlessly. At **release**, `resolve` re-asserts the same attributes on the app element for the focus-moved-mid-dictation case.
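A minimal sketch of setting the two attributes with the public Accessibility C API is shown below. The function name `enableWebAccessibilitySketch` is hypothetical and this is not InkIt's code; it assumes the caller already dispatches it off the main thread and that the process has Accessibility permission.

```swift
import ApplicationServices
import Foundation

// Hypothetical helper: asks a Chromium or Electron app to build its AX tree.
// Returns the raw AXError for each attribute so callers can log or ignore them.
@discardableResult
func enableWebAccessibilitySketch(pid: pid_t) -> [AXError] {
    let app = AXUIElementCreateApplication(pid)
    let attributes = ["AXManualAccessibility", "AXEnhancedUserInterface"]
    return attributes.map { name in
        // Setting the same value twice is harmless, which keeps the call idempotent.
        AXUIElementSetAttributeValue(app, name as CFString, kCFBooleanTrue)
    }
}
```

Errors are returned rather than thrown because apps that do not support these attributes simply reject them, and that case needs no recovery.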

Prewarming at press overlaps tree construction with speaking time, so the release-time probe is less likely to return a false negative and divert the transcript to `heldInHistory`.

## heldInHistory when no editable field

When `focus.isEditable` is false after polish:

1. `TranscriptHistoryStore.add` persists the text with `pasteMs: 0` in its latency record.
2. `showHeldInHistoryNotice()` sets `state = .heldInHistory`.
3. After **2.5 s**, state returns to `.idle` if still in `heldInHistory`.

HUD and menu bar feedback:

| Surface | Signal |
|---------|--------|
| Notch HUD | Notice pill: "Saved to History" |
| Menu bar label | `⬇ Ink` |
| Menu bar icon | `tray.and.arrow.down` |
| `statusText` | "Saved to History" |

The transcript is copyable from History. Pressing the hotkey again from `heldInHistory` starts a new dictation (same as from `idle` or `error`).

<Note>
`heldInHistory` is not an error state. It is a deliberate hold when Cmd+V would land in the wrong app or nowhere.
</Note>

## Latency accounting

`TranscriptHistoryStore.Latency` records per-stage milliseconds from hotkey release:

| Field | heldInHistory | Successful paste |
|-------|---------------|------------------|
| `transcribeMs` | release → final transcript | same |
| `polishMs` | transcript → polish finish | same |
| `pasteMs` | `0` | polish finish → paste completion |

`totalMs` (user-facing) is `transcribeMs + polishMs` only; `pasteMs` is diagnostic.
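The accounting rule can be expressed as a small value type. `LatencySketch` is a hypothetical mirror of `TranscriptHistoryStore.Latency`; the `Int` field types are an assumption.

```swift
// Hypothetical mirror of TranscriptHistoryStore.Latency; field types are assumed.
struct LatencySketch {
    var transcribeMs: Int
    var polishMs: Int
    var pasteMs: Int   // diagnostic only; 0 on the heldInHistory path

    // User-facing total deliberately excludes pasteMs.
    var totalMs: Int { transcribeMs + polishMs }
}
```

Because `pasteMs` is excluded, a held transcript and a pasted one with the same transcribe and polish times report the same `totalMs`.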

## Debug tooling

`AXTreeDumper` (Settings Diagnostics) walks the resolved target app's AX tree and writes it to `~/Library/Logs/InkIt-debug.log`. Its limits (`nodeLimit: 5000`, `wallClockLimit: 5.0s`) are separate from those of the paste-time `FocusedEditable` probe, and it is not on the dictation hot path.

## Failure modes

| Symptom | Cause | Behavior |
|---------|-------|------------|
| Text in History, not at cursor | No editable focus at release | `heldInHistory`, copy from History |
| "Paste failed" notch error | `setString` returned false | `setError`, no history row on that path |
| Paste into wrong app | Would occur without release probe | Prevented by `FocusedEditable` gate |
| Chromium app always held | Tree not built before probe | Press-time `enableWebAccessibility` prewarm |
| Slow AX tree | Deep DOM | Budget-bounded walk; the 1.5 s budget is generous for real apps |
| Clipboard not restored | User copied during 400 ms window | Session UUID mismatch; user's clipboard kept |

## Related pages

<CardGroup>
  <Card title="Dictation pipeline" href="/dictation-pipeline">
    End-to-end flow from hotkey through STT, polish, focus check, paste, and history persistence.
  </Card>
  <Card title="Dictation state machine" href="/dictation-state-machine">
    `DictationState` lifecycle including `pasting`, `heldInHistory`, and transition triggers.
  </Card>
  <Card title="Permissions model" href="/permissions-model">
    Accessibility requirement for Cmd+V synthesis and AX focus probes.
  </Card>
  <Card title="Quickstart" href="/quickstart">
    First dictation verification and recovery when no editable field is focused.
  </Card>
  <Card title="Runtime troubleshooting" href="/runtime-troubleshooting">
    Duplicate bundle instances, debug logging, and non-API paste/focus issues.
  </Card>
  <Card title="Testing and CI" href="/testing-and-ci">
    `AXBudgetTests` regression contracts for the shared AX traversal primitive.
  </Card>
</CardGroup>

---

## 16. STT troubleshooting

> Diagnose Cartesia transcription failures: STTFailure cases (offline, serverError, rateLimited, outOfCredits, invalidKey, unknown), notch vs Home card surfacing, graceful collapse on empty or benign disconnect, and STTFailureRoutingTests regression contracts.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/16-stt-troubleshooting.md
- Generated: 2026-06-15T22:10:52.929Z

### Source Files

- `InkIt/CartesiaStreamingClient.swift`
- `InkIt/AppCoordinator.swift`
- `InkItTests/STTFailureRoutingTests.swift`
- `InkIt/SettingsStore.swift`
- `InkIt/NotchHUD.swift`

InkIt classifies every Cartesia STT session failure through `STTFailure` in `CartesiaStreamingClient`, routes named failures to the notch HUD via `AppCoordinator.handleSTTFailure`, and persists only key and credit problems to the Home banner. Transient blips stay momentary; account problems stay sticky until the next successful dictation.

## Failure taxonomy

`STTFailure` is the single enum that drives both the notch copy (`notchMessage`) and, for two cases, persisted Home state. Classification happens in two paths: Cartesia `{"type":"error"}` events (`status_code` + optional `error_code`) and transport-level errors (HTTP upgrade response or `URLError`).

| Case | Typical signals | Notch message | Home persistence |
|------|-----------------|---------------|------------------|
| `offline` | `URLError`: `.notConnectedToInternet`, `.networkConnectionLost`, `.cannotConnectToHost`, `.cannotFindHost`, `.dnsLookupFailed`, `.dataNotAllowed` | No internet | None |
| `serverError` | HTTP 5xx; `URLError.timedOut` | Server error | None |
| `rateLimited` | HTTP 429; `error_code` `concurrency_limited` | Too many requests | None |
| `outOfCredits` | HTTP 402; `error_code` `quota_exceeded` or `plan_upgrade_required` | Out of credits | `cartesiaOutOfCredits` |
| `invalidKey` | HTTP 401 or 403 | Invalid API key | `cartesiaKeyInvalid` |
| `unknown` | Unclassified 400; POSIX `ENOTCONN`/`EPIPE`/`ECONNRESET`; other unmapped errors | Couldn't transcribe | None |

<ParamField body="error_code" type="string">
Cartesia error-event field. `quota_exceeded` and `plan_upgrade_required` map to `outOfCredits`; `concurrency_limited` maps to `rateLimited`. Checked before HTTP status.
</ParamField>

<ParamField body="status_code" type="int">
HTTP status on Cartesia error events or a failed WebSocket upgrade (`urlSession(_:task:didCompleteWithError:)`). Used when `error_code` does not match a known bucket.
</ParamField>

Rejected WebSocket upgrades (invalid key, rate limit, out of credits before the receive loop starts) are handled in `urlSession(_:task:didCompleteWithError:)` so they are not swallowed silently.
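The classification order in the table can be sketched as two pure functions. `STTFailureSketch` and both `classify` overloads are hypothetical names for illustration; only the mappings themselves come from the table above.

```swift
import Foundation

// Hypothetical mirror of STTFailure for illustration.
enum STTFailureSketch: Equatable {
    case offline, serverError, rateLimited, outOfCredits, invalidKey, unknown
}

// Error events: error_code is checked first, then the HTTP status.
func classify(statusCode: Int?, errorCode: String?) -> STTFailureSketch {
    switch errorCode ?? "" {
    case "quota_exceeded", "plan_upgrade_required": return .outOfCredits
    case "concurrency_limited": return .rateLimited
    default: break
    }
    guard let status = statusCode else { return .unknown }
    switch status {
    case 401, 403: return .invalidKey
    case 402: return .outOfCredits
    case 429: return .rateLimited
    case 500...599: return .serverError
    default: return .unknown
    }
}

// Transport errors: connectivity codes map to offline, timeouts to serverError.
func classify(urlErrorCode code: URLError.Code) -> STTFailureSketch {
    switch code {
    case .notConnectedToInternet, .networkConnectionLost, .cannotConnectToHost,
         .cannotFindHost, .dnsLookupFailed, .dataNotAllowed:
        return .offline
    case .timedOut:
        return .serverError
    default:
        return .unknown
    }
}
```

Checking `error_code` first means an event such as status 400 with `concurrency_limited` still lands in `rateLimited` instead of `unknown`.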

## Where failures surface

InkIt uses two UI channels with different lifetimes.

### Notch HUD (momentary)

When `CartesiaStreamingClient.onError` fires, `AppCoordinator.handleSTTFailure` calls `setError(failure.notchMessage)`, which:

- Sets `DictationState` to `.error(message)`
- Stops audio and cancels the WebSocket (`client.cancel()`)
- Renders `NotchHUD` as `.errorNotice` — `InkIt ⚠ <notchMessage>` with a red warning glyph
- Self-clears to `.idle` after **1.5 s** post-release (`armErrorDismiss`); while the hotkey is held (`isHotkeyHeld`), the error stays visible until release

<Warning>
On the error path, `finishClose` runs with `reportClosed: false` so an empty `onClosed` callback does not reset state to `.idle` and wipe the notch notice before the user reads it.
</Warning>

### Home banner (persistent)

Only `invalidKey` and `outOfCredits` set persisted flags:

<ParamField body="cartesiaKeyInvalid" type="bool">
UserDefaults key `cartesiaKeyInvalid`. Set on `STTFailure.invalidKey`. Cleared when `onClosed` delivers a non-empty transcript.
</ParamField>

<ParamField body="cartesiaOutOfCredits" type="bool">
UserDefaults key `cartesiaOutOfCredits`. Set on `STTFailure.outOfCredits`. Cleared on the next successful transcription.
</ParamField>

`SettingsStore.transcriptionIssue` derives from these flags (suppressed when no Cartesia key is set — that is onboarding, not a fault). When non-nil, Home shows a full-width **Dictation is paused** banner with CTA:

| `ServiceIssue` | Message | CTA action |
|----------------|---------|------------|
| `keyInvalid` | Your Cartesia API key is invalid… | Opens Settings → Dictation pane |
| `outOfCredits` | You're out of Cartesia credits… | Opens `https://play.cartesia.ai/subscription` |

Transient failures (`offline`, `serverError`, `rateLimited`, `unknown` when surfaced) appear only in the notch. They do not set Home flags.

```mermaid
stateDiagram-v2
    [*] --> Classify: error or transport failure
    Classify --> NamedFailure: offline, serverError, rateLimited, outOfCredits, invalidKey
    Classify --> Unknown: unclassified / benign disconnect

    NamedFailure --> SurfaceNotch: onError → setError(notchMessage)
    SurfaceNotch --> PersistHome: invalidKey or outOfCredits only
    PersistHome --> IdleAfterDismiss: armErrorDismiss 1.5s

    Unknown --> HasContent: transcript in hand
    Unknown --> NoContent: empty session
    HasContent --> Deliver: onClosed with words, no error
    NoContent --> Collapse: onClosed empty → idle, no notch error
```

## Graceful collapse

`reportFailureOrCollapse` is the core decision function. Only **named** failures show an error; `.unknown` never does.

**Priority order:**

1. **Named failure** → `onError` + `finishClose(reportClosed: false)`. User sees the notch message.
2. **`.unknown` with transcript content** → `onClosed` delivers words (graceful goodbye after a normal server close or benign POSIX disconnect).
3. **`.unknown` with no content** → silent collapse: `onClosed("")` → `AppCoordinator` sets `.idle` with no error.

**Post-close 500 carve-out:** If `failure == .serverError`, `awaitingClose == true`, and there is no transcript content, the session collapses silently (rapid tap-and-release where the server returns 500 instead of an empty turn). A mid-hold 5xx with no content still surfaces; a post-close 5xx with words in hand still surfaces.

**Empty transcript at coordinator:** When `onClosed` delivers an empty string after trimming, the coordinator resets to `.idle` immediately — no polish, no paste, no error state.

<Note>
A too-short or silent press can make Cartesia reject the session with HTTP 400 (→ `.unknown`, collapse) or, on instant close, HTTP 500 (→ post-close carve-out, collapse). Neither should alarm the user.
</Note>
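The priority order and the post-close carve-out can be condensed into one decision function. This is a sketch under stated assumptions: `FailureKind`, `CloseDecision`, and `decide` are hypothetical names, and treating whitespace-only text as "no content" is an assumption borrowed from the coordinator's trimming behavior.

```swift
import Foundation

// Hypothetical types for illustration; not InkIt's declarations.
enum FailureKind: Equatable {
    case offline, serverError, rateLimited, outOfCredits, invalidKey, unknown
}

enum CloseDecision: Equatable {
    case surfaceError(FailureKind)   // onError, then finishClose(reportClosed: false)
    case deliver(String)             // onClosed with words
    case collapse                    // onClosed("") and back to idle
}

func decide(failure: FailureKind, transcript: String, awaitingClose: Bool) -> CloseDecision {
    // Assumption: whitespace-only text counts as no content.
    let hasContent = !transcript.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty

    // .unknown never surfaces an error: deliver words if any, otherwise collapse.
    if failure == .unknown {
        return hasContent ? .deliver(transcript) : .collapse
    }
    // Post-close 500 carve-out: empty session that was already closing.
    if failure == .serverError && awaitingClose && !hasContent {
        return .collapse
    }
    // Every other named failure reaches the user.
    return .surfaceError(failure)
}
```

Modeling the outcome as an enum keeps the three paths mutually exclusive, which matches the rule that `onClosed` must not fire once an error has been surfaced.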

## Symptom → cause → fix

<Steps>
<Step title="Notch shows No internet">
Check network connectivity. `STTFailure.offline` maps from `URLError` codes that indicate no route to `api.cartesia.ai`. Retry dictation when online. No Home flag is set.
</Step>

<Step title="Notch shows Server error or Too many requests">
Transient Cartesia or network issue. `serverError` covers 5xx and timeouts; `rateLimited` covers 429 and `concurrency_limited`. Wait and retry. If persistent, check Cartesia status. No Home persistence.
</Step>

<Step title="Notch shows Invalid API key and Home shows Dictation is paused">
`cartesiaKeyInvalid` is set. Open Settings → Dictation, re-enter the key (Keychain account `cartesiaAPIKey`). See [Configure Cartesia API key](/configure-cartesia-api-key). Flag clears on the next successful dictation.
</Step>

<Step title="Notch shows Out of credits and Home banner persists">
`cartesiaOutOfCredits` is set. Review billing at Cartesia subscription. Flag clears when a non-empty transcript returns.
</Step>

<Step title="Notch shows Couldn't transcribe">
Named `unknown` failure with no graceful-collapse path — e.g. unclassified 400 **after** partial content arrived. Check `~/Library/Logs/InkIt-debug.log` with `debugLoggingEnabled` on for `STT error event:` and `reportFailureOrCollapse:` lines.
</Step>

<Step title="Release hotkey and nothing happens (no error)">
Expected for silent / too-short press: `.unknown` with no content, or post-close 500 carve-out. `onClosed` fires with `""` and state returns to `.idle`. Not a bug unless speech was clearly captured (check live transcript during hold).
</Step>
</Steps>

## Debug trace

Enable **Debug logging** in Settings (`debugLoggingEnabled` → `~/Library/Logs/InkIt-debug.log`). Relevant log prefixes:

| Log pattern | Meaning |
|-------------|---------|
| `STT error event: status=… code=…` | Cartesia `{"type":"error"}` before classification |
| `reportFailureOrCollapse: failure=… decision=…` | Collapse vs surface decision |
| `receive failure: domain=… code=… http=…` | Transport error in receive loop |
| `WS close: reason=… elapsed=…` | `SessionMetrics` close reason (`CloseReason`) |
| `setError: …` | Notch error message applied |

`SessionMetrics` stores up to 500 `CloseMetric` entries in UserDefaults (`CartesiaCloseMetrics`). Close reasons include `finalTurnReceived`, `serverClosed`, `graceTimerExpired`, `serverError`, `silentNoAudio`, `receiveFailed`, and `externalCancel`.

## Regression contracts (`STTFailureRoutingTests`)

The `STTFailureRoutingTests` test class locks behavior that must not regress:

| Test | Contract |
|------|----------|
| `testUnknownWithContentDeliversTranscriptAndNeverErrors` | Benign disconnect with words → `onClosed` delivers transcript, `onError` nil |
| `testUnknownWithNoContentCollapsesSilently` | Silent press → `onClosed("")`, no `onError` |
| `testNamedFailureSurfacesError` | All named cases surface via `onError`; `onClosed` must **not** fire (would wipe notice) |
| `testServerErrorAfterCloseWithNoContentCollapsesSilently` | Post-close 500 on empty session → silent collapse |
| `testServerErrorMidHoldWithNoContentStillSurfaces` | 5xx while still holding → `serverError` reaches user |
| `testServerErrorAfterCloseWithContentStillSurfaces` | Post-close 500 with partial transcript → still surfaces |
| `testBenignSocketDisconnectsClassifyAsUnknown` | POSIX `ENOTCONN` (57), `EPIPE` (32), `ECONNRESET` (54) → `.unknown` (forgiveness path) |
| `testRealTransportErrorsStayNamed` | Real `URLError` offline/timeout stay named |

Run locally:

```sh
xcodegen generate
xcodebuild -project InkIt.xcodeproj -scheme InkIt -configuration Debug \
  -destination 'platform=macOS' -only-testing:InkItTests/STTFailureRoutingTests test
```

<Check>
CI runs the full `InkItTests` target including `STTFailureRoutingTests` on every PR.
</Check>

## End-to-end failure flow

```mermaid
sequenceDiagram
    participant User
    participant Coordinator as AppCoordinator
    participant Client as CartesiaStreamingClient
    participant Cartesia as Cartesia STT API
    participant Notch as NotchHUD
    participant Home as Home banner

    User->>Coordinator: Release hotkey
    Coordinator->>Client: finalizeAndClose()
    Cartesia-->>Client: error event or disconnect
    Client->>Client: reportFailureOrCollapse(STTFailure)

    alt Named failure
        Client->>Coordinator: onError(failure)
        Coordinator->>Coordinator: handleSTTFailure
        Coordinator->>Home: Set cartesiaKeyInvalid / cartesiaOutOfCredits if applicable
        Coordinator->>Notch: setError(notchMessage)
        Note over Client: finishClose(reportClosed: false)
    else Unknown, no content
        Client->>Coordinator: onClosed("")
        Coordinator->>Coordinator: state = .idle
    else Unknown, has content
        Client->>Coordinator: onClosed(transcript)
        Coordinator->>Coordinator: polish → paste pipeline
        Coordinator->>Home: Clear cartesiaKeyInvalid / cartesiaOutOfCredits
    end
```

## Related pages

<CardGroup>
<Card title="Cartesia STT reference" href="/cartesia-stt-reference">
WebSocket contract, server events, and the full `STTFailure` classification table with `notchMessage` strings.
</Card>
<Card title="Configure Cartesia API key" href="/configure-cartesia-api-key">
Keychain storage, validation, and how `cartesiaKeyInvalid` / `cartesiaOutOfCredits` drive Home cards.
</Card>
<Card title="Dictation state machine" href="/dictation-state-machine">
`DictationState` transitions including `.error`, `.finalizing`, and empty-transcript collapse to `.idle`.
</Card>
<Card title="Settings reference" href="/settings-reference">
`transcriptionIssue`, persisted service-issue flags, and `debugLoggingEnabled`.
</Card>
<Card title="Testing and CI" href="/testing-and-ci">
Running `STTFailureRoutingTests` and the full CI test workflow.
</Card>
<Card title="Runtime troubleshooting" href="/runtime-troubleshooting">
Non-API issues: mic delay, duplicate app instances, debug log location, notch on non-notch displays.
</Card>
</CardGroup>

---

## 17. Polish troubleshooting

> Polish failure modes: RewriteFailure and PolishFailureReason, rate-limit Retry-After, timeout fallbacks to raw transcript, polishKeyInvalid and polishOutOfCredits persistence, history-row failure tooltips, and prewarm connection behavior.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/17-polish-troubleshooting.md
- Generated: 2026-06-15T22:11:45.264Z

### Source Files

- `InkIt/TranscriptRewriter.swift`
- `InkIt/AppCoordinator.swift`
- `InkIt/TranscriptHistoryStore.swift`
- `InkIt/SettingsStore.swift`
- `InkIt/LLMProvider.swift`

Optional LLM polish runs in `AppCoordinator.correctedTranscript` after Cartesia STT returns a final transcript. `TranscriptRewriter` classifies provider errors into `RewriteFailure`; the coordinator maps those to persisted `PolishFailureReason` values, always pastes the raw transcript on failure, and records the outcome in `TranscriptHistoryStore`. Only invalid-key and out-of-credits failures set durable `UserDefaults` flags that drive Home and Settings UI.

<Note>
Polish failures never block dictation. Unlike Cartesia STT errors, they do not call `setError` or flash the notch HUD. The user always receives at least the raw ASR text.
</Note>

## Failure flow

```mermaid
sequenceDiagram
    participant User
    participant AppCoordinator
    participant TranscriptRewriter
    participant Provider as LLM provider
    participant History as TranscriptHistoryStore

    User->>AppCoordinator: Release hotkey
    AppCoordinator->>AppCoordinator: STT final transcript
    AppCoordinator->>TranscriptRewriter: rewriteWithoutContext(raw)
    TranscriptRewriter->>Provider: POST /chat/completions or /messages
    alt HTTP 2xx + valid response
        Provider-->>TranscriptRewriter: polished text
        TranscriptRewriter-->>AppCoordinator: .success
        AppCoordinator->>AppCoordinator: Clear polishKeyInvalid / polishOutOfCredits
        AppCoordinator->>History: polish=.polished, original=raw
    else Any failure
        Provider-->>TranscriptRewriter: error status or timeout
        TranscriptRewriter-->>AppCoordinator: .failure(RewriteFailure)
        AppCoordinator->>AppCoordinator: Map to PolishFailureReason
        opt invalidKey or outOfCredits
            AppCoordinator->>AppCoordinator: Set polishKeyInvalid or polishOutOfCredits
        end
        AppCoordinator->>History: polish=.failed, text=raw, failure=details
        AppCoordinator->>User: Paste raw transcript
    end
```

While polish runs, `DictationState` is `.rewriting` and the notch HUD shows **Polishing…**. On failure the state advances to `.pasting` with the raw text — there is no `.error` transition for polish alone.

## `RewriteFailure` and `PolishFailureReason`

`TranscriptRewriter` returns `Result<String, RewriteFailure>`. `AppCoordinator.polishResult` maps each case 1:1 into `TranscriptHistoryStore.PolishFailureReason` and wraps provider metadata in `PolishFailure`.

| `RewriteFailure` | Trigger | `PolishFailureReason` | Persists Home flag? | User surfacing |
|---|---|---|---|---|
| `rateLimited(retryAt:)` | HTTP 429 | `rateLimited` | No | History tooltip with live retry hint |
| `offline` | `URLError` (no route, DNS, connection lost) | `offline` | No | History tooltip |
| `timedOut` | `URLError.timedOut`, HTTP 408/504, or exceeds `rewriteTimeout` | `timedOut` | No | History tooltip |
| `invalidKey` | HTTP 401/403, or empty API key before request | `invalidKey` | Yes → `polishKeyInvalid` | Home card + Settings `keyBroken` |
| `outOfCredits` | HTTP 402 | `outOfCredits` | Yes → `polishOutOfCredits` | Home card |
| `serverError` | HTTP 5xx other than 504 | `serverError` | No | History tooltip |
| `unknown` | JSON parse failure, empty response, length sanity reject | `unknown` | No | History tooltip |

The length sanity check rejects polish output when `cleaned.count > max(120, Int(Double(transcript.count) * 2.5) + 40)`, treating runaway model output as `unknown`.
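The bound is easiest to read with numbers plugged in. The helper below is a sketch; `passesLengthSanity` is a hypothetical name, while the formula is the one quoted above.

```swift
// Hypothetical helper wrapping the documented bound.
func passesLengthSanity(cleaned: String, transcript: String) -> Bool {
    // Floor of 120 characters, otherwise 2.5x the transcript length plus 40.
    let limit = max(120, Int(Double(transcript.count) * 2.5) + 40)
    return cleaned.count <= limit
}
```

For a 20-character transcript the scaled term is 90, so the 120-character floor applies; for a 100-character transcript the limit is 290.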

### HTTP and network classification

`TranscriptRewriter.failure(forStatus:headers:)` maps status codes and parses `Retry-After`:

- **429** → `rateLimited(retryAt:)` — `retryAt` from header
- **401, 403** → `invalidKey`
- **402** → `outOfCredits`
- **408, 504** → `timedOut`
- **500–599** (except 504, matched above) → `serverError`
- **Other** → `unknown`

`failure(forURLError:)` maps `URLError.timedOut` to `timedOut` and connectivity codes (`notConnectedToInternet`, `cannotConnectToHost`, `cannotFindHost`, `networkConnectionLost`, `dataNotAllowed`, `dnsLookupFailed`) to `offline`.

## Rate-limit `Retry-After`

On HTTP 429, `TranscriptRewriter.retryAt(from:)` parses the `Retry-After` header in two forms:

1. **Delta-seconds** — numeric string added to `Date()`
2. **HTTP-date** — `EEE, dd MMM yyyy HH:mm:ss zzz` (POSIX locale)

The parsed absolute time is stored on `PolishFailure.retryAt` and survives SwiftData persistence (`TranscriptRecordTests.testFailedRecordRoundTripsFailureDetails`).
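Parsing both header forms can be sketched as below. `parseRetryAfter` is a hypothetical name and not the body of `retryAt(from:)`; the injectable `now` parameter, the rejection of negative values, and the fixed GMT time zone are assumptions added for testability.

```swift
import Foundation

// Hypothetical parser for the two Retry-After forms.
func parseRetryAfter(_ value: String, now: Date = Date()) -> Date? {
    let trimmed = value.trimmingCharacters(in: .whitespaces)

    // Form 1: delta-seconds, added to the current time.
    if let seconds = TimeInterval(trimmed), seconds.isFinite, seconds >= 0 {
        return now.addingTimeInterval(seconds)
    }

    // Form 2: HTTP-date, parsed with a fixed POSIX locale so the result
    // does not depend on the user's region settings.
    let formatter = DateFormatter()
    formatter.locale = Locale(identifier: "en_US_POSIX")
    formatter.timeZone = TimeZone(identifier: "GMT")
    formatter.dateFormat = "EEE, dd MMM yyyy HH:mm:ss zzz"
    return formatter.date(from: trimmed)
}
```

Storing an absolute `Date` rather than the raw delta is what lets a tooltip compute a live countdown later, long after the response arrived.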

`TranscriptHistoryRow.failureMessage` computes a live countdown at tooltip open time:

| Condition | Tooltip suffix |
|---|---|
| `retryAt` is nil | "Retry soon or switch provider." |
| `retryAt` ≤ 5 s away | "Try again now or switch provider." |
| ~1 minute away | "Try again in ~1 min or switch provider." |
| Longer | "Try again in ~N min or switch provider." |

Full rate-limit message pattern: `{Provider} rate limit — raw text pasted. {retry hint}`.

## Timeout fallbacks

Two independent timeout layers apply:

<ParamField body="LLMProvider.rewriteTimeout" type="TimeInterval">
Per-provider ceiling on `URLSessionConfiguration.timeoutIntervalForRequest` and `timeoutIntervalForResource`, also passed as the request `timeoutInterval`.
</ParamField>

| Provider | `rewriteTimeout` |
|---|---|
| `groq` | 1.0 s |
| `gemini`, `openai`, `anthropic` | 2.0 s |

Groq's 1.0 s ceiling is sized above observed p99 latency; a tighter value would downgrade healthy rewrites to raw text.

When a timeout fires — via `URLError.timedOut`, HTTP 408/504, or the session ceiling — `TranscriptRewriter` returns `.timedOut`. The coordinator pastes the raw transcript, records `polish=.failed` with `reason: .timedOut`, and does not set persistent flags. The latency popover labels the polish stage **Polish attempt** (not **Polish**) when `polishFailed` is true, reflecting time spent without a successful rewrite.

<Warning>
A timed-out polish still costs wall-clock time in `Latency.polishMs`. The total shown to the user includes transcribe + polish attempt, not paste.
</Warning>

## Persistent flags: `polishKeyInvalid` and `polishOutOfCredits`

Both flags live in `UserDefaults` and survive relaunch.

<ParamField body="polishKeyInvalid" type="Bool">
Set when the last polish attempt returned `invalidKey`. Cleared on the next successful polish, when `rewriteProvider` changes, or when `enablePolish(provider:)` runs.
</ParamField>

<ParamField body="polishOutOfCredits" type="Bool">
Set when the last polish attempt returned `outOfCredits`. Cleared on the next successful polish or provider change.
</ParamField>

`SettingsStore.polishIssue` surfaces a Home rail card only when **all** of the following hold:

- `correctionEnabled` is true
- `hasRewriteKey` is true (non-empty key for `rewriteProvider`)
- One of the flags above is set

Transient failures (rate limit, offline, timeout, server error, unknown) intentionally do **not** set these flags — they appear only in history.

### Settings and Home UI

| Flag | `polishUIState` | Home card | Settings behavior |
|---|---|---|---|
| `polishKeyInvalid` | `.keyBroken` | "Polish is paused" — invalid key | Amber card; master toggle disabled until key re-entered |
| `polishOutOfCredits` | `.on` (unchanged) | "Polish is paused" — out of credits | Polish stays enabled; CTA opens `rewriteProvider.billingURL` |

Re-entering a verified key in Settings (`LLMKeyValidator` → `.verified`) calls `enablePolish(provider:)`, which clears both flags and turns polish on from `.setup` or `.keyBroken`.

## History-row failure tooltips

Failed polish entries are stored with:

- `polish: .failed`
- `text`: raw ASR transcript (what was pasted)
- `original: nil` (no before/after diff)
- `failure`: `PolishFailure { reason, provider, retryAt? }`

The history row shows an amber `exclamationmark.triangle.fill` chip on hover. Clicking it opens a popover with `TranscriptHistoryRow.failureMessage(failure)`:

| `PolishFailureReason` | Message pattern |
|---|---|
| `rateLimited` | `{Provider} rate limit — raw text pasted. {retry hint}` |
| `offline` | "No internet — raw text pasted. Reconnect and re-dictate." |
| `timedOut` | "Polish timed out — raw text pasted. Re-dictate to retry." |
| `invalidKey` | "Invalid {Provider} API key — raw text pasted. Fix it in Settings." |
| `outOfCredits` | "Out of {Provider} credits — raw text pasted. Review your {Provider} plan to re-enable Polish." |
| `serverError` | "{Provider} server error — raw text pasted. Try again shortly." |
| `unknown` | "Polish failed — raw text pasted. Re-dictate to retry." |
| `failure` is nil (legacy) | "Polish failed — raw text pasted. Re-dictate to retry." |

Legacy entries without a stored `polish` field fall back to showing a polished indicator only when `original != nil`.

## Prewarm connection behavior

At hotkey press (`startDictation`), when polish is eligible, `AppCoordinator` builds a `TranscriptRewriter` and calls `prewarm()` before audio capture begins:

**Prewarm runs when:**

- Not routing to the onboarding Try It box (`routeToOnboardingBox == false`)
- `settings.correctionEnabled` is true
- API key for `settings.rewriteProvider` is non-empty

**Prewarm is skipped when:**

- Onboarding trial (never polishes)
- Polish disabled
- No provider key configured

`TranscriptRewriter.prewarm()` issues a `HEAD` request to `provider.endpoint` with a 2.5 s timeout. The response body and status are discarded — even 404/405 warms DNS + TCP + TLS. The **same** `URLSession` instance on that `TranscriptRewriter` is later reused for the real `POST`, so the pooled connection lands on the polish hot path.

The warmed instance is stored in `warmRewriter` and consumed once in `correctedTranscript`. If prewarm was skipped (e.g. settings changed mid-hold), a fresh `TranscriptRewriter` is constructed at release. `warmRewriter` is reset to `nil` on every `startDictation`.

<Info>
Prewarm overlaps connection setup with the time the user is still speaking, removing roughly one round trip from the post-release polish stage. It does not send the rewrite prompt or spend tokens.
</Info>
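A prewarm request of this shape can be sketched with `URLSession`. The function names are hypothetical and this is not the body of `TranscriptRewriter.prewarm()`; the 2.5 s timeout and the `HEAD` method are the values stated above.

```swift
import Foundation

// Hypothetical builder for the warm-up request.
func makePrewarmRequest(endpoint: URL, timeout: TimeInterval = 2.5) -> URLRequest {
    var request = URLRequest(url: endpoint, timeoutInterval: timeout)
    request.httpMethod = "HEAD"
    return request
}

// Fires the request and ignores the outcome; the goal is only to complete
// DNS, TCP, and TLS setup so the session can reuse the connection.
func prewarmSketch(session: URLSession, endpoint: URL) {
    let task = session.dataTask(with: makePrewarmRequest(endpoint: endpoint)) { _, _, _ in
        // Response, status, and errors are intentionally discarded.
    }
    task.resume()
}
```

The session is passed in rather than created inside the function because connection reuse only helps when the later `POST` goes through the same `URLSession` instance.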

## Diagnostic workflow

<Steps>
<Step title="Confirm polish is enabled and keyed">
Open Settings → Polish. Verify `correctionEnabled` is on, a provider is selected, and the key field is populated. If `polishUIState` is `.keyBroken`, re-enter and verify the key.
</Step>

<Step title="Dictate and check what pasted">
On any polish failure, raw ASR text still pastes. Compare the pasted text against the history row — failed entries show raw text with no sparkles diff.
</Step>

<Step title="Inspect the history failure chip">
Hover the transcript row, click the amber warning triangle, and read the provider-specific reason. For rate limits, note the live retry countdown.
</Step>

<Step title="Check persistent vs transient">
If Home shows **Polish is paused**, fix the key (`polishKeyInvalid`) or billing (`polishOutOfCredits`). For transient errors (timeout, offline, 5xx, rate limit), re-dictate or switch provider — no flag is set.
</Step>

<Step title="Enable debug logging for HTTP detail">
Turn on `debugLoggingEnabled` in Settings. Polish HTTP status, classified `RewriteFailure`, and prewarm logs appear in `~/Library/Logs/InkIt-debug.log` with redacted API keys.
</Step>
</Steps>

### Expected signals by failure type

| Symptom | Likely cause | Fix |
|---|---|---|
| Raw text, no Home card, history shows rate-limit tooltip | Provider 429 | Wait for `Retry-After` hint or switch provider |
| Raw text, "Polish timed out" in history | Hung or slow provider | Re-dictate; consider Groq (1.0 s ceiling) or check network |
| Raw text, Home "invalid key" card | 401/403 from provider | Update key in Settings → Polish |
| Raw text, Home "out of credits" card | HTTP 402 | Open billing URL from Home CTA |
| Polished text with sparkles diff | Success | No action needed — flags cleared automatically |

## Comparison with STT failures

Polish and transcription failures use parallel classification enums but different surfacing rules:

| Aspect | STT (`STTFailure`) | Polish (`RewriteFailure`) |
|---|---|---|
| Blocks paste | Yes — session aborts | No — raw text always pastes |
| Notch HUD error | Yes (`setError`) | No |
| Persistent flags | `cartesiaKeyInvalid`, `cartesiaOutOfCredits` | `polishKeyInvalid`, `polishOutOfCredits` |
| Transient errors | Notch only | History tooltip only |
| Primary surfaces | Notch + optional Home card | History row + optional Home card |

## Related pages

<CardGroup>
<Card title="Configure Polish" href="/configure-polish">
Enable polish, pick a provider, store keys, and interpret `PolishUIState`.
</Card>
<Card title="LLM providers reference" href="/llm-providers-reference">
Endpoints, default models, `rewriteTimeout` ceilings, and validation probes per `LLMProvider`.
</Card>
<Card title="Dictation pipeline" href="/dictation-pipeline">
End-to-end flow including polish placement, latency stages, and prewarm timing.
</Card>
<Card title="Settings reference" href="/settings-reference">
All `UserDefaults` keys including `polishKeyInvalid` and `polishOutOfCredits`.
</Card>
<Card title="STT troubleshooting" href="/stt-troubleshooting">
Cartesia transcription failures — parallel failure model with different surfacing rules.
</Card>
</CardGroup>

---

## 18. Runtime troubleshooting

> Non-API runtime issues: duplicate InkIt bundle instances and Accessibility grants, Bluetooth mic profile delay, debugLoggingEnabled trace file at ~/Library/Logs/InkIt-debug.log, SwiftData persistence fallback, and notch HUD positioning on non-notch displays.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/18-runtime-troubleshooting.md
- Generated: 2026-06-15T22:14:29.116Z

### Source Files

- `InkIt/AppCoordinator.swift`
- `InkIt/DebugLog.swift`
- `InkIt/AudioCaptureService.swift`
- `InkIt/TranscriptHistoryStore.swift`
- `InkIt/NotchHUD.swift`
- `InkItTests/DebugLogFormatterTests.swift`

`AppCoordinator` coordinates the main runtime failure surfaces outside Cartesia STT and LLM polish: it detects duplicate bundle instances at launch, gates dictation on `PermissionsService.hasAccessibility`, drives `audioReady` from `AudioCaptureService`, routes trace output through `DebugLog`, and hosts `NotchHUDController` after onboarding completes. `TranscriptHistoryStore` owns SwiftData persistence with an in-memory fallback when the on-disk store cannot open.

## Symptom map

| Symptom | Likely runtime cause | Primary signal |
| --- | --- | --- |
| Hotkey fires but paste never works | Accessibility granted to a different `.app` bundle than the running copy | `setError("Accessibility needed")` in notch HUD; `AXIsProcessTrusted()` false |
| Two menu-bar icons or conflicting hotkeys | Multiple processes share `Bundle.main.bundleIdentifier` but different `bundlePath` values | `detectDuplicateRunningCopies()` sets `lastError` with tilde-shortened paths |
| First words missing on AirPods / Bluetooth headset | A2DP→HFP profile switch emits digital silence for ~200–500 ms | `audioReady` stays false; `HUDPreparingDot` instead of waveform |
| History rows vanish after quit | SwiftData fell back to in-memory store | Transcripts visible this session only; `isPersistent == false` internally |
| HUD looks like a black blob on menu bar | Machine has no physical notch | `NotchGeometry.hasPhysicalNotch == false` → `floatingIsland` capsule |

## Duplicate InkIt bundle instances

At `AppCoordinator` init, `detectDuplicateRunningCopies()` enumerates `NSWorkspace.shared.runningApplications` for other processes with the same `bundleIdentifier` and a different PID. When matches exist, it stores this message in `lastError`:

```
Multiple InkIt copies are running. Current: <path>. Also running: <paths>. Quit the duplicate copy and grant Accessibility to only one app bundle.
```

Startup is not blocked, and dictation may still start, but the copies compete for the global hotkey and paste injection.

<Warning>
macOS TCC grants Accessibility per app **bundle path**, not per bundle ID alone. A release build at `/Applications/InkIt.app` and a Debug build under `~/Library/Developer/Xcode/DerivedData/.../InkIt.app` are separate entries in System Settings → Privacy & Security → Accessibility.
</Warning>

<Steps>
<Step title="Identify running copies">

Open Activity Monitor and filter for **InkIt**, or run:

```bash
pgrep -lf InkIt
```

Note each distinct `.app` path.

</Step>
<Step title="Quit duplicates">

Leave only the bundle you intend to use (typically `/Applications/InkIt.app` after a release install, or a single local build during development). Quit extras from the menu bar or Activity Monitor.

</Step>
<Step title="Re-grant Accessibility to the surviving bundle">

In System Settings → Privacy & Security → Accessibility, enable **only** the InkIt entry that matches the path you kept. Remove stale entries for old DerivedData builds if present.

</Step>
<Step title="Verify">

Relaunch the single copy. Press the dictation hotkey in a focused text field. The notch HUD should show the live island without an immediate `Accessibility needed` error.

</Step>
</Steps>

`PermissionsService.appIdentityDescription` returns `CFBundleName`, `bundleIdentifier`, and `bundlePath` for confirming which binary macOS is evaluating.
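
As a rough aid for the "Identify running copies" step, the sketch below reduces a process listing to the distinct `InkIt.app` bundle paths. The function name `distinct_inkit_bundles` is hypothetical and not part of InkIt. It assumes each command line starts with the executable path (as `ps -axo command=` prints it) and that the main executable is named `InkIt`.

```bash
# Hypothetical helper (not part of InkIt): reduce process command lines on
# stdin to the distinct InkIt.app bundle paths they were launched from.
distinct_inkit_bundles() {
  # Keep the path up to and including "InkIt.app" for main executables only,
  # then drop repeats so each bundle is listed once.
  sed -n 's|^\(.*/InkIt\.app\)/Contents/MacOS/InkIt.*$|\1|p' | sort -u
}

# On macOS, feed it the live process table:
#   ps -axo command= | distinct_inkit_bundles
```

More than one line of output means processes from more than one bundle path are running, which is the situation the `lastError` message describes.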

## Accessibility grants and duplicate bundles

Dictation checks `permissions.hasAccessibility` on every `startDictation()` call. When false:

- `setError("Accessibility needed")` surfaces a brief notch error notice.
- `permissions.requestAccessibility()` runs at most once every `accessibilityPromptThrottle` (10 seconds), so repeated hotkey presses do not keep pulling System Settings to the front.
- The HUD error still appears on every blocked press.

`PermissionsService.requestAccessibility()` fires `AXIsProcessTrustedWithOptions(prompt: true)` once, then opens the Accessibility pane. After a Deny or repeat tap, later calls only re-open Settings (`needsManual` state). `confirmAccessibilityGrant()` may silently relaunch via `NSWorkspace.openApplication` with `createsNewApplicationInstance: true` when the in-process trust bit stays stale after the user toggled the switch.

When Accessibility flips on, `AppCoordinator.refreshHUD()` re-registers the hotkey so Fn/Globe bindings upgrade from a passive monitor to a suppressing `CGEventTap`.

## Bluetooth mic profile delay

`AudioCaptureService` converts input to 16 kHz PCM and signals readiness separately from hotkey press:

| Constant | Value | Role |
| --- | --- | --- |
| `readyLevelThreshold` | `0.03` | Normalized peak above noise floor → device is live |
| `readyFallbackDelay` | `0.6` s | Timer backstop when room is silent or threshold never crosses |

On `start()`, Bluetooth devices often emit digital silence while switching from output (A2DP) to input (HFP). The first buffer with `level > readyLevelThreshold` calls `signalReadyIfNeeded()`, which fires `onReady` once per take. If no signal arrives within 600 ms, the fallback timer signals ready anyway so the HUD never sticks on preparing.

`AppCoordinator` resets `audioReady = false` at each `startDictation()` and sets it true when `audio.onReady` fires. `NotchHUDView` shows `HUDPreparingDot` until `coordinator.audioReady`, then `HUDWaveform` — the intentional “start speaking” cue.

<Note>
Words spoken during the dead gap before HFP activation are lost at the hardware level; wait for the waveform before speaking on Bluetooth mics.
</Note>

See [Input device selection](/input-device-selection) for pinning a preferred CoreAudio UID and unplug fallback behavior.

## Debug trace file (`debugLoggingEnabled`)

`DebugLog` mirrors developer traces to unified logging (`os.Logger`, subsystem = bundle identifier, category `trace`) **and** a flat append-only file. Logging is **off by default** because traces can include raw transcripts and on-screen context.

<ParamField body="debugLoggingEnabled" type="boolean">
UserDefaults key shared by `SettingsStore.debugLoggingEnabled` and `DebugLog.isEnabledKey`. When false, `DebugLog.info` / `DebugLog.error` return immediately without writing.
</ParamField>

Enable via Settings → **Debug logging** (caption: writes to `~/Library/Logs/InkIt-debug.log`).

**Log path:** `~/Library/Logs/InkIt-debug.log`  
**Line format:** `[ISO8601-with-fractional-seconds] <message>\n`  
**Write model:** serial `DispatchQueue`; append via `FileHandle.seekToEndOfFile()`, atomic create on first write.

Utility helpers used across the dictation pipeline:

| API | Purpose |
| --- | --- |
| `boundedBlock(title:text:limit:)` | Metadata (`bytes`, FNV-style `hash`, `truncated` flag) plus body capped at 12 000 chars |
| `redacted(_:secrets:)` | Replaces key substrings with `<redacted>` |
| `prettyJSONString(_:)` | Sorted, pretty-printed JSON for request bodies |

Representative trace sites when enabled: `startDictation` target resolution, `correctedTranscript` polish runs, `setError`, `PasteService`, `CartesiaStreamingClient`, and `TranscriptHistoryStore` persistence errors.

<RequestExample>

```bash
# Confirm the toggle is on (expect 1), then tail the trace while reproducing the issue
defaults read ai.cartesia.InkIt debugLoggingEnabled
tail -f ~/Library/Logs/InkIt-debug.log
```

</RequestExample>

<Warning>
Disable debug logging after investigation. The file accumulates transcript text and context on disk.
</Warning>
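
One way to clean up after an investigation is sketched below. `purge_debug_log` is a hypothetical helper, not something InkIt ships. It defaults to the log path documented in this section and accepts another path as its first argument.

```bash
# Hypothetical cleanup helper (not part of InkIt): report the size of a trace
# file and delete it. Defaults to the documented log path.
purge_debug_log() {
  local log="${1:-$HOME/Library/Logs/InkIt-debug.log}"
  if [ ! -f "$log" ]; then
    echo "no trace file at $log"
    return 0
  fi
  # Redirecting into wc -c yields just the byte count, without the file name.
  local bytes
  bytes="$(wc -c < "$log" | tr -d ' ')"
  rm -f -- "$log"
  echo "removed $log ($bytes bytes)"
}
```

Turn off the **Debug logging** toggle before running it. While logging is enabled, InkIt is expected to recreate the file on its next trace write.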

## SwiftData persistence fallback

`TranscriptHistoryStore.makeContainer()` first opens a disk-backed `ModelContainer` for schema `[TranscriptRecord.self]`. On failure it logs (via `DebugLog.error` when tracing is on) and constructs an in-memory container with `isStoredInMemoryOnly: true`.

| `isPersistent` | Behavior |
| --- | --- |
| `true` | Rows survive relaunch; `lifetimeWords` advances in UserDefaults (`transcriptHistory.lifetimeWords.v1`) only after durable `saveContext()` |
| `false` | Session-only history; `entries` mirror updates in RAM but quit drops them; legacy UserDefaults migration (`transcriptHistory.v1`) is skipped to avoid deleting unmigrated data |

`add()` always inserts into the published `entries` array for the current session. `clear()` updates the UI only after a successful `context.delete` + `saveContext()`. A failed save returns `false` from `saveContext()` and leaves the legacy blob in place for retry on next launch.

Enable `debugLoggingEnabled` to capture `TranscriptHistoryStore: on-disk SwiftData store unavailable` or `save failed` lines when diagnosing missing history.

```text
TranscriptHistoryStore.init
        │
        ├─► ModelContainer (disk) ──success──► isPersistent = true
        │
        └─► catch ──► ModelContainer (in-memory) ──► isPersistent = false
                              │
                              └─► migrateLegacyHistoryIfNeeded() skipped
```
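
To check whether a session hit the fallback, the trace file can be searched for the two message fragments quoted above. This is a minimal sketch: `history_store_errors` is a hypothetical name, and the `save failed` fragment is broad enough that it may also match unrelated lines.

```bash
# Hypothetical helper (not part of InkIt): print trace lines that point at the
# SwiftData fallback or a failed save.
history_store_errors() {
  local log="${1:-$HOME/Library/Logs/InkIt-debug.log}"
  # -F matches the fragments literally; each -e adds one alternative.
  grep -F \
    -e 'TranscriptHistoryStore: on-disk SwiftData store unavailable' \
    -e 'save failed' \
    -- "$log"
}
```

The function prints nothing and returns a non-zero status when neither fragment appears, which suggests the on-disk store opened normally for the traced sessions.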

## Notch HUD on non-notch displays

`NotchHUDController` creates a borderless `NSPanel` at `CGWindowLevelForKey(.statusWindow) + 1`, click-through (`ignoresMouseEvents = true`), and repositions on `NSApplication.didChangeScreenParametersNotification`.

`NotchGeometry.detect(on:)` branches on `NSScreen.safeAreaInsets.top`:

| Display | `hasPhysicalNotch` | Horizontal anchor | HUD shape |
| --- | --- | --- | --- |
| MacBook with camera notch | `true` | Center of auxiliary top-left/right areas | `pill` merges with physical notch |
| External / non-notch Mac | `false` | `frame.midX` | `floatingIsland` — content-sized capsule below menu bar |

Non-notch layout uses `HUDMetrics.floatingTopGap` (3 pt), `floatingHeight` (22 pt), horizontal padding, and a soft capsule shadow. The island scales/fades in (`transition(.scale.combined(with: .opacity))`) because there is no notch to retract into. Window width stays `520` pt; `reposition()` sets `x = centerX - width/2`, `y = screen.frame.maxY - height`.

`SettingsStore` persists `notchHorizontalPosition` (UserDefaults, default `0.38`, clamped `0.04`…`0.96`), but `NotchHUDController.reposition()` currently derives `centerX` only from `NotchGeometry.detect` — not from that setting. On non-notch hardware the HUD is always centered on the main screen's top edge.

If the HUD appears off-screen after a display topology change, trigger a reposition by switching displays or toggling resolution; the screen-parameters observer calls `reposition()` automatically.

## Related pages

<CardGroup>
<Card title="Permissions model" href="/permissions-model">
Accessibility and microphone TCC flow, polling, `needsManual` state, and silent relaunch after grant.
</Card>
<Card title="Input device selection" href="/input-device-selection">
Preferred microphone UID, Bluetooth A2DP→HFP timing, and `audioReady` HUD coupling.
</Card>
<Card title="Dictation state machine" href="/dictation-state-machine">
`DictationState` transitions, `audioReady`, and error dismiss timing.
</Card>
<Card title="Settings reference" href="/settings-reference">
`debugLoggingEnabled`, `notchHorizontalPosition`, and other persisted keys.
</Card>
<Card title="STT troubleshooting" href="/stt-troubleshooting">
Cartesia API failures — separate from the runtime issues above.
</Card>
<Card title="Build from source" href="/build-from-source">
Why DerivedData builds create a second Accessibility TCC entry alongside `/Applications/InkIt.app`.
</Card>
</CardGroup>

---

## 19. Build from source

> Generate InkIt.xcodeproj with XcodeGen, open in Xcode 15+, Debug vs Release signing via Config/*.xcconfig, ad-hoc vs Developer ID notarized builds, Sparkle deep-sign post-build script, and install to /Applications.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/19-build-from-source.md
- Generated: 2026-06-15T22:11:42.516Z

### Source Files

- `README.md`
- `project.yml`
- `AGENTS.md`
- `Config/Debug.xcconfig`
- `Config/Signing.xcconfig`
- `Config/Signing.local.xcconfig.example`

---
title: "Build from source"
description: "Generate InkIt.xcodeproj with XcodeGen, open in Xcode 15+, Debug vs Release signing via Config/*.xcconfig, ad-hoc vs Developer ID notarized builds, Sparkle deep-sign post-build script, and install to /Applications."
---

InkIt builds from a generated `InkIt.xcodeproj` (not committed) that XcodeGen produces from `project.yml`. Signing, entitlements, and Sparkle helper re-signing are controlled per configuration through `Config/*.xcconfig`. The Release `.app` lands at `build/Build/Products/Release/InkIt.app` when you pass `-derivedDataPath build`.

## Prerequisites

| Requirement | Value |
|-------------|-------|
| macOS | 14.0+ (`MACOSX_DEPLOYMENT_TARGET` in `project.yml`) |
| Hardware | Apple silicon (product requirement in README) |
| Xcode | 15+ (CI runs on `macos-15`) |
| XcodeGen | `brew install xcodegen` |
| Bundle identifier | `ai.cartesia.InkIt` |
| App Sandbox | Disabled (`ENABLE_APP_SANDBOX = NO` in `project.yml`) |

<Note>
`InkIt.xcodeproj` and `build/` are gitignored. Regenerate the project after cloning or editing `project.yml`; edit `project.yml`, not the generated `.pbxproj`.
</Note>

## Generate the Xcode project

<Steps>
<Step title="Install XcodeGen">

```bash
brew install xcodegen
```

</Step>

<Step title="Generate InkIt.xcodeproj">

From the repository root:

```bash
xcodegen generate
```

XcodeGen reads `project.yml` and writes `InkIt.xcodeproj`, wiring the InkIt app target, `InkItTests`, the Sparkle Swift package (≥ 2.9.2), and per-configuration xcconfig files.

</Step>

<Step title="Open in Xcode">

```bash
open InkIt.xcodeproj
```

Select the **InkIt** scheme. Use **Debug** for development and unit tests; use **Release** for a distributable local build.

</Step>
</Steps>

## Configuration and signing

Each build configuration maps to a dedicated xcconfig. Both include `Config/Sparkle.xcconfig`, which sets `SPARKLE_PUBLIC_ED_KEY` for the `SUPublicEDKey` Info.plist entry.

<Tabs>
<Tab title="Debug">

`Config/Debug.xcconfig` — local development and `xcodebuild test`.

| Setting | Value |
|---------|-------|
| `CODE_SIGN_IDENTITY` | `-` (ad-hoc) |
| `CODE_SIGN_STYLE` | `Manual` |
| `ENABLE_HARDENED_RUNTIME` | `NO` |
| `CODE_SIGN_ENTITLEMENTS` | `InkIt/InkIt-Debug.entitlements` |

Debug entitlements match Release except `com.apple.security.get-task-allow` is `true`, allowing the debugger and XCTest host to attach.

</Tab>

<Tab title="Release">

`Config/Signing.xcconfig` — local Release builds and distribution.

| Setting | Default | With `Signing.local.xcconfig` |
|---------|---------|-------------------------------|
| `CODE_SIGN_IDENTITY` | `-` (ad-hoc) | `Developer ID Application` |
| `ENABLE_HARDENED_RUNTIME` | `YES` | `YES` |
| `CODE_SIGN_ENTITLEMENTS` | `InkIt/InkIt.entitlements` | `InkIt/InkIt.entitlements` |
| `OTHER_CODE_SIGN_FLAGS` | — | `--timestamp --options=runtime` |

Release entitlements set `com.apple.security.get-task-allow` to `false` (required for notarization). Both configurations grant microphone and Apple Events automation.

</Tab>
</Tabs>

### Ad-hoc vs Developer ID

By default, Release builds are **ad-hoc signed** (`CODE_SIGN_IDENTITY = -`). Anyone can compile and run InkIt locally without an Apple Developer account.

For **Developer ID** signing (notarized DMG distribution), create a gitignored local override:

```bash
cp Config/Signing.local.xcconfig.example Config/Signing.local.xcconfig
```

Fill in your Team ID:

```xcconfig
DEVELOPMENT_TEAM = YOUR_TEAM_ID
CODE_SIGN_STYLE = Manual
CODE_SIGN_IDENTITY = Developer ID Application
OTHER_CODE_SIGN_FLAGS = --timestamp --options=runtime
```

`Config/Signing.xcconfig` includes this file via `#include? "Signing.local.xcconfig"` — the include is silently skipped when the file is absent.

<Warning>
Ad-hoc Release builds run locally but are not distributable. `tools/make-dmg.sh` requires `Config/Signing.local.xcconfig` and a Developer ID certificate in your Keychain.
</Warning>
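
Because the local override is optional and silently skipped, it can help to confirm which identity a Release build would actually pick up. The sketch below classifies the `CODE_SIGN_IDENTITY` line from `xcodebuild -showBuildSettings` output. `signing_mode` is a hypothetical helper, and the `name = value` line shape it parses is an assumption about that output.

```bash
# Hypothetical helper (not part of InkIt): read `xcodebuild -showBuildSettings`
# output on stdin and report which signing identity the build would use.
signing_mode() {
  local identity
  # Lines look like "    CODE_SIGN_IDENTITY = -"; keep the first value only.
  identity="$(sed -n 's/^[[:space:]]*CODE_SIGN_IDENTITY = //p' | head -n 1)"
  case "$identity" in
    -)               echo "ad-hoc" ;;
    "Developer ID"*) echo "developer-id" ;;
    "")              echo "unknown" ;;
    *)               echo "other: $identity" ;;
  esac
}

# From the repository root, after `xcodegen generate`:
#   xcodebuild -showBuildSettings -project InkIt.xcodeproj -scheme InkIt \
#     -configuration Release | signing_mode
```

An `ad-hoc` result with `Config/Signing.local.xcconfig` present suggests the override is not being included or does not set the identity.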

## Build commands

<CodeGroup>

```bash title="Debug — run unit tests"
xcodegen generate
xcodebuild test \
  -project InkIt.xcodeproj \
  -scheme InkIt \
  -configuration Debug \
  -destination 'platform=macOS'
```

```bash title="Release — local installable build"
xcodegen generate
xcodebuild build \
  -project InkIt.xcodeproj \
  -scheme InkIt \
  -configuration Release \
  -derivedDataPath build
```

</CodeGroup>

The Release command writes the app bundle to `build/Build/Products/Release/InkIt.app`. CI uses separate derived-data paths (`build/ci-debug`, `build/ci-release`) but follows the same generate → test (Debug) → build (Release) sequence.

<Info>
If you build from the Xcode GUI without `-derivedDataPath build`, locate the product via **Product → Show Build Folder in Finder** instead of the `build/` path above.
</Info>

## Deep-sign Sparkle helpers

`project.yml` defines a **Deep-sign Sparkle helpers** post-build script on the InkIt target. After Xcode embeds Sparkle, nested helpers (`Downloader.xpc`, `Installer.xpc`, `Autoupdate`, `Updater.app`, and the framework itself) are re-signed inside-out so notarization accepts the bundle.

Behavior:

- **Skips** when `CODE_SIGNING_ALLOWED` is `NO` or Sparkle is not embedded.
- Uses `EXPANDED_CODE_SIGN_IDENTITY` from the active build — `-` for ad-hoc, Developer ID hash for maintainer builds.
- **Ad-hoc**: `codesign --force --sign -`
- **Developer ID**: `codesign --force --timestamp --options runtime --sign <identity>`

For ad-hoc local builds the script is effectively a no-op beyond sealing helpers. For notarized builds it is required — `tools/make-dmg.sh` fails fast if any Sparkle nested binary remains ad-hoc.
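
The ad-hoc check can be approximated by hand before running the DMG tooling. The sketch below is not the logic in `tools/make-dmg.sh`. It assumes Sparkle is embedded at `Contents/Frameworks/Sparkle.framework` and that `codesign -dvv` reports ad-hoc signatures with a `Signature=adhoc` line. Both function names are hypothetical.

```bash
# Hypothetical check (not the logic in tools/make-dmg.sh): flag nested Sparkle
# code that still carries an ad-hoc signature.

# Succeeds when `codesign -dvv` output on stdin describes an ad-hoc signature.
is_adhoc() {
  grep -q '^Signature=adhoc$'
}

# Print every Sparkle helper inside the app bundle that is still ad-hoc.
adhoc_sparkle_helpers() {
  local app="$1"
  # Skip symlinks so each helper is inspected once.
  find "$app/Contents/Frameworks/Sparkle.framework" ! -type l \
    \( -name '*.xpc' -o -name 'Autoupdate' -o -name 'Updater.app' \) -print |
  while IFS= read -r item; do
    # codesign writes its description to stderr, so merge it into stdout.
    if codesign -dvv "$item" 2>&1 | is_adhoc; then
      echo "$item"
    fi
  done
}

# Usage: adhoc_sparkle_helpers build/Build/Products/Release/InkIt.app
```

Empty output on a Developer ID build suggests the deep-sign script ran. Any path printed is a candidate for the notarization rejection described above.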

## Install to /Applications

After a Release build with `-derivedDataPath build`:

```bash
cp -R build/Build/Products/Release/InkIt.app /Applications/
```

<Check>
Replacing `/Applications/InkIt.app` is required for your build to take effect. Rebuilding only updates the build product; the copy in `/Applications` stays at the old build until you replace it.
</Check>

Quit any running InkIt instance before copying. If a release DMG version is already installed, the copy overwrites it — see [Runtime troubleshooting](/runtime-troubleshooting) if you encounter duplicate-instance or permission issues.
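
The quit-then-replace sequence can be wrapped in a small function. This is a sketch under stated assumptions, not a script from the repository: `install_build` is a hypothetical name, and it deletes the destination before copying because a plain `cp -R` into an existing bundle merges directories and can leave files from the old build behind.

```bash
# Hypothetical install helper (not part of the InkIt repository).
# Replaces the destination bundle wholesale so no files from the old build
# linger inside it.
install_build() {
  local src="${1:-build/Build/Products/Release/InkIt.app}"
  local dest="${2:-/Applications/InkIt.app}"

  if [ ! -d "$src" ]; then
    echo "no build product at $src" >&2
    return 1
  fi
  # Refuse to swap the bundle underneath a running copy.
  if pgrep -x InkIt >/dev/null 2>&1; then
    echo "InkIt is running; quit it first" >&2
    return 1
  fi
  rm -rf -- "$dest"
  cp -R "$src" "$dest"
  echo "installed $src -> $dest"
}
```

Run `install_build` from the repository root with no arguments to use the documented paths, then relaunch InkIt from `/Applications`.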

## Notarized distribution (maintainers)

Shipping a stapled `InkIt.dmg` is a separate workflow from local ad-hoc builds. `tools/make-dmg.sh`:

1. Requires `Config/Signing.local.xcconfig` and a `notarytool` Keychain profile (`inkit-notary`).
2. Runs `xcodegen generate` and a Release build to `build/`.
3. Verifies codesign (including Sparkle nested binaries).
4. Packages, signs, submits, and staples the DMG.

See [Release and distribution](/release-distribution) for the full publish pipeline (`make-appcast.sh`, `publish-release.sh`, Sparkle appcast).

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `InkIt.xcodeproj` missing after clone | Project is generated, not committed | Run `xcodegen generate` |
| `xcodebuild test` fails to launch host | Debug entitlements or wrong configuration | Build **Debug**; confirm `InkIt-Debug.entitlements` has `get-task-allow=true` |
| Notarization rejects Sparkle helpers | Nested binaries still ad-hoc | Confirm **Deep-sign Sparkle helpers** ran; check `project.yml` post-build script |
| `make-dmg.sh` fails on signing | No Developer ID override | Create `Config/Signing.local.xcconfig` from the example |
| Installed app unchanged after build | Old `/Applications/InkIt.app` still in place | `cp -R` the new `.app` over the installed copy |
| Scheme or target edits lost | Edited generated `.pbxproj` | Change `project.yml`, then `xcodegen generate` |

## Related pages

<CardGroup>
<Card title="Installation" href="/installation">
Prerequisites, entitlements overview, and the requirement to replace `/Applications/InkIt.app` after local builds.
</Card>
<Card title="Testing and CI" href="/testing-and-ci">
Run `xcodebuild test` locally and understand the CI workflow (design-token check, xcodegen, Debug test, Release build).
</Card>
<Card title="Release and distribution" href="/release-distribution">
Notarized DMG creation, Sparkle appcast, and GitHub release publishing for maintainers.
</Card>
<Card title="Quickstart" href="/quickstart">
First successful dictation after installing your build.
</Card>
</CardGroup>

---

## 20. Testing and CI

> Run xcodebuild test locally, CI workflow steps (design-token check, xcodegen, Debug test, Release build), test targets (STTFailureRouting, AXBudget, TranscriptRecord, DebugLogFormatter), and design-token enforcement via tools/check-design-tokens.sh.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/20-testing-and-ci.md
- Generated: 2026-06-15T22:12:09.052Z

### Source Files

- `.github/workflows/ci.yml`
- `AGENTS.md`
- `tools/check-design-tokens.sh`
- `InkItTests/STTFailureRoutingTests.swift`
- `InkItTests/AXBudgetTests.swift`
- `InkItTests/TranscriptRecordTests.swift`
- `InkItTests/DebugLogFormatterTests.swift`

---
title: Testing and CI
description: Run xcodebuild test locally, CI workflow steps (design-token check, xcodegen, Debug test, Release build), test targets (STTFailureRouting, AXBudget, TranscriptRecord, DebugLogFormatter), and design-token enforcement via tools/check-design-tokens.sh.
---

InkIt keeps a small, fast unit-test suite and a single GitHub Actions workflow. Together they verify design-system compliance, compile the app in Debug and Release, and lock regression contracts for STT failure routing, Accessibility traversal budgets, transcript persistence, and debug logging.

## Prerequisites

- macOS 14+ (matches `MACOSX_DEPLOYMENT_TARGET` in `project.yml`)
- Xcode 15+ with command-line tools
- [XcodeGen](https://github.com/yonaskolb/XcodeGen) (`brew install xcodegen`)

`InkIt.xcodeproj` is generated from `project.yml` — edit `project.yml`, not the `.pbxproj`.

## Run tests locally

<Steps>
<Step title="Regenerate the Xcode project">

After changing `project.yml` or adding test files:

```sh
xcodegen generate
```

</Step>

<Step title="Run the unit test suite">

```sh
xcodebuild test \
  -project InkIt.xcodeproj \
  -scheme InkIt \
  -configuration Debug \
  -destination 'platform=macOS'
```

Debug configuration uses ad-hoc signing with `get-task-allow=true` (`Config/Debug.xcconfig`) so XCTest can attach to the host app.

</Step>

<Step title="Check design tokens (optional but recommended)">

```sh
./tools/check-design-tokens.sh
```

Run this before pushing — CI fails the job on any unjustified literal.

</Step>

<Step title="Verify success">

Expect all tests in `InkItTests` to pass and the script to print:

```
✓ design tokens: no unjustified literals
```

</Step>
</Steps>

### Run a single test class

Filter by test target and class name:

```sh
xcodebuild test \
  -project InkIt.xcodeproj \
  -scheme InkIt \
  -configuration Debug \
  -destination 'platform=macOS' \
  -only-testing:InkItTests/STTFailureRoutingTests
```

## CI workflow

The **CI** workflow (`.github/workflows/ci.yml`) runs on every push and pull request to `main`, and can be triggered manually via `workflow_dispatch`.

| Setting | Value |
|---|---|
| Runner | `macos-15` |
| Timeout | 30 minutes |
| Concurrency | One job per ref; newer runs cancel in-progress jobs |
| Permissions | `contents: read` |

### Pipeline steps

```mermaid
flowchart LR
  A[Checkout] --> B[Tool versions]
  B --> C[Design tokens]
  C --> D[Install XcodeGen]
  D --> E[xcodegen generate]
  E --> F[Debug test]
  F --> G[Release build]
```

<AccordionGroup>
<Accordion title="1. Check out repository">

`actions/checkout@v4` clones the ref under test.

</Accordion>

<Accordion title="2. Show tool versions">

Prints `sw_vers` and `xcodebuild -version` for reproducibility in CI logs.

</Accordion>

<Accordion title="3. Check design tokens">

```sh
./tools/check-design-tokens.sh
```

Fails fast before any compile step if a view introduces a hardcoded design value.

</Accordion>

<Accordion title="4. Install build tools">

```sh
brew install xcodegen
```

</Accordion>

<Accordion title="5. Generate Xcode project">

```sh
xcodegen generate
```

Produces `InkIt.xcodeproj` from `project.yml`.

</Accordion>

<Accordion title="6. Run unit tests (Debug)">

```sh
xcodebuild test \
  -project InkIt.xcodeproj \
  -scheme InkIt \
  -configuration Debug \
  -destination 'platform=macOS' \
  -derivedDataPath build/ci-debug \
  -quiet
```

Runs the full `InkItTests` bundle against a Debug-built host app.

</Accordion>

<Accordion title="7. Build Release app">

```sh
xcodebuild build \
  -project InkIt.xcodeproj \
  -scheme InkIt \
  -configuration Release \
  -destination 'platform=macOS' \
  -derivedDataPath build/ci-release \
  -quiet
```

Verifies the Release configuration compiles. CI does not run tests in Release.

</Accordion>
</AccordionGroup>
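
To replay the same sequence before pushing, the steps above can be chained in one function. This is a hypothetical convenience wrapper, not a script in the repository. `run_step`, `ci_local`, and the `DRY_RUN` variable are illustrative names, and the XcodeGen install step is left out on the assumption that it is already installed locally.

```sh
# Hypothetical local runner (not part of the InkIt repository): replays the CI
# steps in order and stops at the first failure.
# Set DRY_RUN=1 to print the commands without executing them.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

ci_local() {
  run_step ./tools/check-design-tokens.sh &&
  run_step xcodegen generate &&
  run_step xcodebuild test -project InkIt.xcodeproj -scheme InkIt \
    -configuration Debug -destination 'platform=macOS' \
    -derivedDataPath build/ci-debug -quiet &&
  run_step xcodebuild build -project InkIt.xcodeproj -scheme InkIt \
    -configuration Release -destination 'platform=macOS' \
    -derivedDataPath build/ci-release -quiet
}

# Usage from the repository root:
#   DRY_RUN=1 ci_local   # preview the four commands
#   ci_local             # run them for real
```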

## Test target: InkItTests

`InkItTests` is a macOS unit-test bundle defined in `project.yml`. It links against the main `InkIt` app target and uses the app as its test host:

| Setting | Value |
|---|---|
| Type | `bundle.unit-test` |
| Bundle ID | `ai.cartesia.InkItTests` |
| `TEST_HOST` | `$(BUILT_PRODUCTS_DIR)/InkIt.app/Contents/MacOS/InkIt` |
| Signing | Ad-hoc (`CODE_SIGN_IDENTITY = -`), signing disabled on the bundle |

All four test files live under `InkItTests/` and use `@testable import InkIt`.

### STTFailureRoutingTests

Locks the end-of-session contract in `CartesiaStreamingClient.reportFailureOrCollapse`. The "Couldn't transcribe" notch notice fires only for **named, actionable** failures when there is nothing useful to deliver.

| Test area | What it guards |
|---|---|
| Graceful goodbye | `.unknown` with transcript content delivers words via `onClosed`, never `onError` |
| Silent tap | `.unknown` with no content collapses silently (empty transcript, no error) |
| Named failures | `.offline`, `.serverError`, `.rateLimited`, `.outOfCredits`, `.invalidKey` always reach `onError` and do not also fire `onClosed` |
| Post-close 500 carve-out | `.serverError` after close with no content collapses silently; mid-hold 5xx still surfaces |
| Classification | Benign POSIX disconnects (`ENOTCONN`, `EPIPE`, `ECONNRESET`) classify as `.unknown`; real transport errors stay named |

These tests are the regression anchor for STT error UX documented on the STT troubleshooting page.

### AXBudgetTests

Locks `AX.run(budget:_:)` in `PasteService.swift` — the shared Accessibility traversal primitive. Every AX walk routes through this helper.

| Test | Property verified |
|---|---|
| `testReturnsClosureResult` | Closure return value propagates |
| `testDeadlineIsBudgetInTheFuture` | Deadline is approximately `budget` seconds ahead |
| `testClosureHonorsDeadlineAndReturnsPartialWork` | Long work stops at the deadline instead of running unbounded |
| `testRunsOffTheMainThread` | Work executes off the main thread |

This prevents regressions where a slow AX tree stalls the main run loop and freezes modifier keys or drops pastes.

### TranscriptRecordTests

Locks the SwiftData persistence contract between `TranscriptHistoryStore.Entry` and `TranscriptRecord`. Tests use an in-memory `ModelContainer` — no disk, no singleton.

| Test | What it guards |
|---|---|
| `testPolishedRecordRoundTripsAllFields` | Full entry with latency, original text, and polish outcome survives insert/fetch |
| `testFailedRecordRoundTripsFailureDetails` | `PolishFailure` composite attributes (reason, provider, retryAt) persist |
| `testMinimalRecordRoundTrips` | Legacy nil-valued fields round-trip |
| `testIDIsPreservedThroughMapping` | Entry ID is stable through `Entry → TranscriptRecord → Entry` |
| `testFetchSortsNewestFirst` | Timestamp sort descriptor returns newest-first order |

### DebugLogFormatterTests

Locks `DebugLog` helpers used when `debugLoggingEnabled` writes trace output to `~/Library/Logs/InkIt-debug.log`.

| Test | Behavior verified |
|---|---|
| `testBoundedBlockIncludesMetadataAndTruncates` | `boundedBlock` reports byte count, hash, `truncated=true`, and prefixes body |
| `testBoundedBlockMarksUntruncated` | Short text gets `truncated=false` |
| `testRedactsSecrets` | `redacted(_:secrets:)` replaces known secrets with `<redacted>` |
| `testPrettyJSONStringDoesNotInjectAPIKey` | `prettyJSONString` serializes the payload without injecting credentials |

## Design-token enforcement

InkIt enforces a single source of truth for typography, color, motion, and elevation: the `Font` and `Color` extensions (plus `Motion`, `Elevation`, `Radius`) at the top of `InkIt/InkItApp.swift`. Views reference named tokens (`Font.inkBody`, `Color.canvas`, `Motion.quick`, …) instead of inline literals.

`tools/check-design-tokens.sh` scans `InkIt/*.swift` and **exits 1** on any match that lacks justification.

### Patterns checked

| Pattern | Label | Fix |
|---|---|---|
| `system(size: N` | Hardcoded font size | Use a `Font.ink*` token |
| `Color(red:` / `Color(white:` | Raw color literal | Use a `Color` token or system semantic |
| `easeOut(duration:` | Hardcoded animation timing | Use a `Motion.*` token |
| `.black.opacity(` | Raw shadow/scrim ink | Use `Elevation.*` or `Color.scrim` |

### Exemptions

The script skips:

- Lines inside `static let` token definitions
- Lines containing `// ds-allow: <reason>` (the sanctioned one-off escape hatch)
- Comment-only lines
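
The patterns and exemptions can be approximated with a few `grep` passes. The sketch below is a simplified stand-in for illustration only. The real logic lives in `tools/check-design-tokens.sh` and may differ, for example in how it treats multi-line `static let` definitions. `check_design_literals` is a hypothetical name.

```sh
# Simplified approximation of the documented checks; the real logic lives in
# tools/check-design-tokens.sh and may differ in detail.
check_design_literals() {
  local hits
  hits="$(
    # Fixed-string patterns from the "Patterns checked" table. The extra
    # /dev/null argument forces "file:line:" prefixes even for a single file.
    grep -nF \
      -e 'system(size:' \
      -e 'Color(red:' \
      -e 'Color(white:' \
      -e 'easeOut(duration:' \
      -e '.black.opacity(' \
      -- "$@" /dev/null |
    # Exemptions: justified one-offs and single-line token definitions.
    grep -vF -e '// ds-allow:' -e 'static let' |
    # Exemption: comment-only lines ("file:line:" then optional space, "//").
    grep -vE '^[^:]+:[0-9]+:[[:space:]]*//'
  )"
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
    return 1
  fi
  echo "no unjustified literals"
}

# Usage: check_design_literals InkIt/*.swift
```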

### Adding a justified one-off

For genuine exceptions — SF Symbol icon glyphs, the always-dark notch HUD, dual-appearance `AppearanceThumbnail` previews, bespoke reveal animations — append a trailing comment:

```swift
Image(systemName: "gearshape")
    .font(.system(size: 17, weight: .medium))  // ds-allow: icon
```

Recurring values belong in the token block, not behind `ds-allow`. Full rationale lives in `DESIGN_SYSTEM.md` and the contributor summary in `AGENTS.md`.

### Example failure output

```
✗ hardcoded font size — use a Font.ink* token:
    InkIt/SomeView.swift:42:        .font(.system(size: 15))

Found 1 hardcoded design value(s) outside the design system.
Use a Font.ink* / Color token, or justify a true one-off with  // ds-allow: <reason>
```

## What CI does not cover

CI validates compile-time correctness and the four unit-test contracts. It does **not**:

- Run UI or integration tests against live Cartesia or LLM APIs
- Exercise Accessibility grants, microphone capture, or global hotkey registration
- Produce a notarized Release DMG (maintainers do this locally with `tools/make-dmg.sh`)

For runtime issues outside the test suite, use debug logging and the runtime troubleshooting guides.

## Related pages

<CardGroup cols={2}>
<Card title="Build from source" href="/build-from-source">
Generate the Xcode project, understand Debug vs Release signing, and install locally.
</Card>

<Card title="Contributing" href="/contributing">
Design-system token rules, commit-message conventions, and what CI enforces on every PR.
</Card>

<Card title="STT troubleshooting" href="/stt-troubleshooting">
Diagnose Cartesia transcription failures guarded by STTFailureRoutingTests.
</Card>

<Card title="Paste and focus reference" href="/paste-and-focus-reference">
AX budget walks, paste timing, and the production code behind AXBudgetTests.
</Card>
</CardGroup>

---

## 21. Release and distribution

> Ship InkIt: tools/make-dmg.sh, tools/make-appcast.sh, tools/publish-release.sh, Sparkle SUFeedURL and UpdateManager custom pill flow, version bump via tools/bump-version.sh, and GitHub release assets (InkIt.dmg, appcast.xml).

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/21-release-and-distribution.md
- Generated: 2026-06-15T22:12:34.279Z

### Source Files

- `tools/publish-release.sh`
- `tools/make-dmg.sh`
- `tools/make-appcast.sh`
- `tools/bump-version.sh`
- `InkIt/UpdateManager.swift`
- `project.yml`
- `Config/Sparkle.xcconfig`

---
title: "Release and distribution"
description: "Ship InkIt: tools/make-dmg.sh, tools/make-appcast.sh, tools/publish-release.sh, Sparkle SUFeedURL and UpdateManager custom pill flow, version bump via tools/bump-version.sh, and GitHub release assets (InkIt.dmg, appcast.xml)."
---

InkIt ships as a notarized `InkIt.dmg` on GitHub Releases at `cartesia-ai/InkIt`, with Sparkle auto-updates driven by `appcast.xml` hosted on the same release. Maintainers bump version in `project.yml`, build and staple the DMG with `tools/make-dmg.sh`, sign the appcast with `tools/make-appcast.sh`, and publish assets with `tools/publish-release.sh`. Installed apps poll `SUFeedURL` and surface update state through `UpdateManager` and the Home `UpdatePill` instead of Sparkle's default windows.

## Release pipeline

```text
bump-version.sh ──► project.yml (version)
        │
        ▼
make-dmg.sh ──► build/dist/InkIt.dmg (notarized, stapled)
        │
        ├──► make-appcast.sh ──► build/dist/appcast.xml
        │
        ▼
publish-release.sh ──► GitHub Release vX.Y.Z
        │                 ├── InkIt.dmg          (stable README download)
        │                 ├── InkIt_X.Y.Z_arm64.dmg
        │                 └── appcast.xml      (Sparkle feed)
        ▼
Installed InkIt ──► SUFeedURL ──► UpdateManager ──► UpdatePill
```

CI on `main` runs tests and a Release build but does not produce distributable artifacts. Release work is maintainer-local and requires a Developer ID certificate, notary credentials, and the `gh` CLI.

## Version numbers

`project.yml` is the source of truth for both marketing and build numbers. XcodeGen regenerates `InkIt/Info.plist` from the `properties` block on every `xcodegen generate`, so an edit made only to `Info.plist` is overwritten the next time the project is generated.

| Key | Role | Sparkle constraint |
|-----|------|-------------------|
| `CFBundleShortVersionString` | User-visible version (e.g. `0.1.2`) | Appears in update UI |
| `CFBundleVersion` | Monotonic build number (e.g. `4`) | Must strictly increase every release |

<ParamField body="level" type="patch | minor | major" default="patch">
Semantic bump level passed to `tools/bump-version.sh`.
</ParamField>

```bash
./tools/bump-version.sh          # patch: 0.1.2 → 0.1.3, build 4 → 5
./tools/bump-version.sh minor    # 0.1.2 → 0.2.0
./tools/bump-version.sh major    # 0.1.2 → 1.0.0
```

The script updates `project.yml` and mirrors values into the committed `InkIt/Info.plist`, prints the new marketing version on stdout, and refuses to run if git tag `v<new>` already exists. It does not commit — commit the version bump before building.
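
The bump levels follow ordinary semantic-version arithmetic. As a minimal sketch of that mapping (the helper names `bump_semver` and `next_build` are hypothetical, and this is not the contents of `tools/bump-version.sh`):

```bash
# Hypothetical helpers for illustration; not taken from tools/bump-version.sh.

# bump_semver LEVEL X.Y.Z -> prints the next marketing version.
bump_semver() (
  level=$1
  version=$2
  major=${version%%.*}      # text before the first dot
  rest=${version#*.}        # text after the first dot
  minor=${rest%%.*}
  patch=${rest#*.}
  case $level in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *) echo "unknown level: $level" >&2; exit 1 ;;
  esac
  printf '%s.%s.%s\n' "$major" "$minor" "$patch"
)

# next_build N -> prints N+1, so CFBundleVersion strictly increases.
next_build() { printf '%s\n' "$(( $1 + 1 ))"; }

bump_semver patch 0.1.2   # prints 0.1.3
bump_semver minor 0.1.2   # prints 0.2.0
bump_semver major 0.1.2   # prints 1.0.0
next_build 4              # prints 5
```

Lower components reset to zero on a minor or major bump, which is why `minor` turns `0.1.2` into `0.2.0` rather than `0.2.2`.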

<Warning>
If `project.yml` lags behind published tags, `bump-version.sh` exits with an error. Set the version past the latest release tag manually, then bump again.
</Warning>

## One-time maintainer setup

<Steps>
<Step title="Configure Developer ID signing">

Copy `Config/Signing.local.xcconfig.example` to `Config/Signing.local.xcconfig` (gitignored) and set your Apple Team ID:

```bash
cp Config/Signing.local.xcconfig.example Config/Signing.local.xcconfig
```

```xcconfig
DEVELOPMENT_TEAM = YOUR_TEAM_ID
CODE_SIGN_STYLE = Manual
CODE_SIGN_IDENTITY = Developer ID Application
OTHER_CODE_SIGN_FLAGS = --timestamp --options=runtime
```

Without this file, Release builds fall back to the ad-hoc signing settings in `Config/Signing.xcconfig`, which is fine for local development but not for distribution.

</Step>

<Step title="Store notarization credentials">

```bash
xcrun notarytool store-credentials inkit-notary \
  --apple-id <your-apple-id-email> \
  --team-id <your-team-id> \
  --password <app-specific-password>
```

`tools/make-dmg.sh` expects the Keychain profile name `inkit-notary`.

</Step>

<Step title="Install build tools">

```bash
brew install xcodegen create-dmg gh
```

</Step>

<Step title="Configure Sparkle EdDSA keys">

Generate Sparkle's EdDSA keypair on the release maintainer's Mac (see [Sparkle documentation](https://sparkle-project.org/documentation/)). Paste the **public** key into `Config/Sparkle.xcconfig`:

```xcconfig
SPARKLE_PUBLIC_ED_KEY = <your-public-key>
```

The private key stays in the maintainer's login Keychain and is used only when `generate_appcast` signs appcast entries. `Config/Signing.xcconfig` includes `Sparkle.xcconfig`, so `SUPublicEDKey` resolves at build time via `$(SPARKLE_PUBLIC_ED_KEY)` in `project.yml`.

</Step>
</Steps>

## Build the DMG

`tools/make-dmg.sh` produces `build/dist/InkIt.dmg` — a notarized, stapled disk image ready to upload.

<Steps>
<Step title="Generate project and build Release">

The script runs `xcodegen generate`, then `xcodebuild` with `-configuration Release` and `-derivedDataPath build`. Output lands at `build/Build/Products/Release/InkIt.app`.

</Step>

<Step title="Verify code signature">

`codesign --verify --deep --strict` runs on the `.app`. The script also scans Sparkle nested binaries (`Updater.app`, `Autoupdate`, `*.xpc`) for ad-hoc signatures — a fast-fail before the ~5 minute notarization round-trip. Ad-hoc Sparkle helpers are re-signed by the **Deep-sign Sparkle helpers** post-build script in `project.yml` when a Developer ID identity is present.
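
A scan like this can be approximated from `codesign -dv` output, which reports `Signature=adhoc` for ad-hoc signed code. The sketch below is an assumption about how such a check might look, not the logic in `tools/make-dmg.sh`; `report_is_adhoc` and `scan_for_adhoc` are hypothetical names.

```bash
# Hypothetical helpers; tools/make-dmg.sh may implement the scan differently.

# Succeeds when codesign's display output (read from stdin) marks the
# code as ad-hoc signed.
report_is_adhoc() { grep -q 'Signature=adhoc'; }

# Prints every path among the arguments whose signature is ad-hoc.
# codesign writes its display output to stderr, hence 2>&1.
scan_for_adhoc() {
  for path in "$@"; do
    if codesign -dv "$path" 2>&1 | report_is_adhoc; then
      printf 'ad-hoc: %s\n' "$path"
    fi
  done
}
```

Passing the nested Sparkle binaries found inside the built `.app` to `scan_for_adhoc` lists any that still carry an ad-hoc signature; an empty result means none were found.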

</Step>

<Step title="Package and sign the DMG">

The app is staged, background art is regenerated via `tools/generate_dmg_background.swift`, and `create-dmg` lays out a 600×400 window with the app icon and Applications drop link. The resulting DMG is signed with the `Developer ID Application` identity.

</Step>

<Step title="Notarize and staple">

```bash
xcrun notarytool submit build/dist/InkIt.dmg --keychain-profile inkit-notary --wait
xcrun stapler staple build/dist/InkIt.dmg
xcrun stapler validate build/dist/InkIt.dmg
```

On failure, the script fetches the notarization log by submission ID.

</Step>
</Steps>

<RequestExample>

```bash
./tools/make-dmg.sh
```

</RequestExample>

<ResponseExample>

```text
Notarized DMG ready: build/dist/InkIt.dmg
Verify on a clean machine: spctl -a -t open --context context:primary-signature build/dist/InkIt.dmg
```

</ResponseExample>

## Generate the Sparkle appcast

Run `tools/make-appcast.sh` after `make-dmg.sh`. It copies the DMG into `build/dist/appcast/`, invokes Sparkle's `generate_appcast` (resolved from `build/**/SourcePackages/artifacts/sparkle/Sparkle/bin/generate_appcast`), and writes `build/dist/appcast.xml`.

<ParamField body="dmg_path" type="string" default="build/dist/InkIt.dmg">
Optional first argument — path to the DMG to include in the appcast.
</ParamField>

Critical configuration:

| Setting | Value | Why |
|---------|-------|-----|
| `--download-url-prefix` | `https://github.com/cartesia-ai/InkIt/releases/latest/download/` | **Trailing slash required.** Without it, RFC 3986 path joining collapses `.../latest/download` + `InkIt.dmg` into `.../latest/InkIt.dmg`, which 404s and silently breaks auto-update. |
| `--maximum-versions` | `1` | Feed carries only the latest release entry. |
| `SPARKLE_PUBLIC_ED_KEY` | from `Config/Sparkle.xcconfig` | Embedded in the built app's `SUPublicEDKey`; appcast signatures must match. |

The generated appcast enclosure URL must resolve to:

```text
https://github.com/cartesia-ai/InkIt/releases/latest/download/InkIt.dmg
```

`publish-release.sh` validates this before upload.
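
The trailing-slash rule follows from the merge step of RFC 3986 reference resolution: a relative reference replaces everything after the last `/` in the base path. The sketch below shows that one step and a way to read the enclosure URL back out of a generated feed. Both helpers (`merge_url`, `enclosure_urls`) are hypothetical, ignore `..` segments and absolute references, and are not taken from Sparkle or `tools/publish-release.sh`.

```bash
# Hypothetical helpers; not taken from Sparkle or tools/publish-release.sh.

# RFC 3986 merge, reduced to the one rule that matters here: drop whatever
# follows the last "/" of the base, then append the relative reference.
merge_url() (
  base=$1
  relative=$2
  printf '%s/%s\n' "${base%/*}" "$relative"
)

# Prints the url="..." attribute of each <enclosure> element in an appcast,
# assuming the tag and its url attribute share one line.
enclosure_urls() {
  sed -n 's/.*<enclosure[^>]*url="\([^"]*\)".*/\1/p' "$1"
}

prefix=https://github.com/cartesia-ai/InkIt/releases/latest/download
merge_url "$prefix/" InkIt.dmg   # ends in /latest/download/InkIt.dmg
merge_url "$prefix"  InkIt.dmg   # ends in /latest/InkIt.dmg, which 404s
```

Running `enclosure_urls build/dist/appcast.xml` and comparing the result with the expected URL is a quick manual check before publishing.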

## Publish to GitHub Releases

`tools/publish-release.sh` creates the GitHub release from pre-built artifacts. It handles tags, asset upload, and preflight checks — not building.

<ParamField body="version" type="string">
Marketing version (e.g. `0.2.0`; leading `v` tolerated). Omit to read `CFBundleShortVersionString` from `InkIt/Info.plist` — the version the DMG was built with.
</ParamField>

<ParamField body="--draft" type="flag">
Stage as a draft release for review before publishing. Default: publish immediately.
</ParamField>

<Steps>
<Step title="Preflight">

- `gh` CLI installed and authenticated
- `build/dist/InkIt.dmg` exists
- Tag `v<version>` does not already exist
- DMG passes `xcrun stapler validate` and `spctl -a -t open`
- Git state is clean (or you confirm continuing with uncommitted changes)
- Local `main` matches `origin/main` (or you confirm continuing)

</Step>

<Step title="Assemble assets">

| Asset | Purpose |
|-------|---------|
| `InkIt_<version>_arm64.dmg` | Versioned artifact on the releases page |
| `InkIt.dmg` | Stable name — README download button and appcast enclosure |
| `appcast.xml` | Sparkle auto-update feed (the script can publish without it, but existing installs cannot auto-update unless it is attached) |

If `appcast.xml` is missing, the script prompts to confirm publishing without an auto-update feed.
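
The versioned asset name is derived from the marketing version. A minimal sketch of that naming rule, assuming a hypothetical `versioned_asset_name` helper (the script's internal staging steps are not shown in this page):

```bash
# Hypothetical helper; only the resulting file name comes from the table above.
versioned_asset_name() (
  version=${1#v}                      # tolerate a leading "v"
  printf 'InkIt_%s_arm64.dmg\n' "$version"
)

versioned_asset_name v0.2.0   # prints InkIt_0.2.0_arm64.dmg
versioned_asset_name 0.1.3    # prints InkIt_0.1.3_arm64.dmg
```

Both DMG assets have the same contents; only the uploaded file name differs.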

</Step>

<Step title="Create release">

```bash
gh release create v<version> <assets...> \
  --repo cartesia-ai/InkIt \
  --target main \
  --title "InkIt <version>" \
  --generate-notes
```

</Step>

<Step title="Verify">

```bash
gh release view v<version> --repo cartesia-ai/InkIt --json assets -q '.assets[].name'
```

Published download URL for new installs:

```text
https://github.com/cartesia-ai/InkIt/releases/latest/download/InkIt.dmg
```

</Step>
</Steps>

<RequestExample>

```bash
./tools/bump-version.sh patch
git add project.yml InkIt/Info.plist && git commit -m "Bump version to 0.1.3"
git push origin main

./tools/make-dmg.sh
./tools/make-appcast.sh
./tools/publish-release.sh          # reads version from Info.plist
# or
./tools/publish-release.sh 0.1.3 --draft
```

</RequestExample>

<Note>
`publish-release.sh` warns if the requested version differs from `Info.plist` — the DMG contents reflect whatever version was baked in at build time, not the tag you pass at publish time.
</Note>

## Sparkle configuration in the app

Sparkle is wired through `Info.plist` properties generated from `project.yml`:

| Plist key | Value |
|-----------|-------|
| `SUFeedURL` | `https://github.com/cartesia-ai/InkIt/releases/latest/download/appcast.xml` |
| `SUPublicEDKey` | `$(SPARKLE_PUBLIC_ED_KEY)` from `Config/Sparkle.xcconfig` |
| `SUEnableAutomaticChecks` | `true` |

`UpdateManager` starts Sparkle only when both `SUFeedURL` and `SUPublicEDKey` resolve to non-empty strings without unresolved `$(...)` build-setting placeholders. `AppDelegate.applicationDidFinishLaunching` calls `UpdateManager.shared.start()`.
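
The same rule can be checked from the command line against a built app. The function below is a hypothetical shell mirror of the condition described above; the real check is Swift code inside `UpdateManager`.

```bash
# Hypothetical check mirroring the rule described above; the real logic
# lives in Swift inside UpdateManager.
is_resolved() {
  case $1 in
    '' | *'$('*) return 1 ;;   # empty, or still contains a $(...) placeholder
    *) return 0 ;;
  esac
}

is_resolved 'https://github.com/cartesia-ai/InkIt/releases/latest/download/appcast.xml' && echo "feed ok"
is_resolved '$(SPARKLE_PUBLIC_ED_KEY)' || echo "key unresolved"
```

On macOS, the values to test can be read with `/usr/libexec/PlistBuddy -c 'Print :SUFeedURL' build/Build/Products/Release/InkIt.app/Contents/Info.plist`, and likewise for `SUPublicEDKey`.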

## Custom update pill flow

`UpdateManager` implements `SPUUserDriver` to route all update UI through InkIt's Home pill (`UpdatePill`) instead of Sparkle's standard windows. The menu item **Check for Updates…** uses the same driver; it adds `NSAlert` dialogs only for user-initiated "up to date" and error cases the pill cannot show.

```mermaid
stateDiagram-v2
    [*] --> idle
    idle --> available: background check finds update
    available --> updating: user taps "Update now"
    updating --> ready: download and extract complete
    ready --> idle: user taps "Restart now" → relaunch
    available --> idle: dismiss / error
    updating --> idle: error
    idle --> idle: check finds no update
```

| Phase | Pill label | Action |
|-------|-----------|--------|
| `.idle` | Hidden | No update pending |
| `.available` | "New app version available" | **Update now** → `installNow()` invokes Sparkle's `updateFoundReply(.install)` |
| `.updating` | "Updating…" + spinner | Silent download and extraction in progress |
| `.ready` | "Update ready" | **Restart now** → `restartNow()` invokes `installReply(.install)` |

`UpdatePill` sits bottom-center on Home as a floating, undismissable overlay. Sparkle's permission prompt is auto-accepted with automatic checks enabled and no system profile sent. Release notes windows are suppressed (`showUpdateReleaseNotes` is a no-op).
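
For readers who prefer a table of transitions to a diagram, the phases can be modeled as a small transition function. This is an illustrative model only: the app implements the flow in Swift, the event names are hypothetical labels for the diagram's edges, and ignoring events that do not apply is an assumption.

```bash
# Illustrative model of the pill phases; the app implements this in Swift.
# Event names are hypothetical labels for the transitions in the diagram.
next_phase() {
  case "$1:$2" in
    idle:update_found)                 echo available ;;
    idle:no_update)                    echo idle ;;
    available:update_now)              echo updating ;;
    available:dismiss|available:error) echo idle ;;
    updating:extracted)                echo ready ;;
    updating:error)                    echo idle ;;
    ready:restart_now)                 echo idle ;;
    *)                                 echo "$1" ;;   # assumption: ignore other events
  esac
}

# Walk the happy path: idle -> available -> updating -> ready -> idle.
phase=idle
for event in update_found update_now extracted restart_now; do
  phase=$(next_phase "$phase" "$event")
  echo "$event -> $phase"
done
```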

<Tip>
After publishing, verify auto-update end-to-end: install the previous release, launch InkIt, confirm the pill appears when a newer `appcast.xml` is live, and complete the download → restart cycle.
</Tip>

## Failure modes

| Symptom | Likely cause | Fix |
|---------|-------------|-----|
| `Config/Signing.local.xcconfig missing` | No Developer ID config | Copy and fill the example xcconfig |
| Ad-hoc nested binaries in Sparkle | Deep-sign post-build script skipped or failed | Rebuild Release with Developer ID; check `project.yml` post-build script |
| Notarization rejected | Unsigned nested code, entitlements, or hardened runtime issue | Read `notarytool log`; fix signing, rebuild DMG |
| `appcast enclosure URL is wrong` | Missing trailing slash on `--download-url-prefix` | Regenerate with `tools/make-appcast.sh` |
| Auto-update 404s silently | Appcast points to `.../latest/InkIt.dmg` instead of `.../latest/download/InkIt.dmg` | Same — regenerate appcast with correct prefix |
| `tag vX.Y.Z already exists` | Version not bumped | Run `./tools/bump-version.sh` |
| Pill never appears | Sparkle keys or feed URL unresolved in built app | Confirm `SPARKLE_PUBLIC_ED_KEY` in `Config/Sparkle.xcconfig` and rebuild |
| Gatekeeper rejects DMG | Not stapled or unsigned | Re-run `make-dmg.sh`; validate with `spctl` |

## GitHub release assets summary

:::files
build/dist/
├── InkIt.dmg              # notarized DMG (uploaded as stable + versioned copy)
├── appcast.xml            # signed Sparkle feed
└── appcast/
    ├── InkIt.dmg          # input copy for generate_appcast
    └── appcast.xml        # intermediate output
:::

On GitHub, three assets are attached to each release:

- **`InkIt.dmg`** — stable download URL used by the README button and Sparkle enclosure
- **`InkIt_<version>_arm64.dmg`** — versioned archive on the releases page
- **`appcast.xml`** — polled by installed apps via `SUFeedURL`

## Related pages

<CardGroup>
<Card title="Build from source" href="/build-from-source">
Local Release builds, ad-hoc vs Developer ID signing, Sparkle deep-sign post-build script, and installing to `/Applications`.
</Card>
<Card title="Installation" href="/installation">
End-user install from the release DMG and runtime prerequisites.
</Card>
<Card title="Testing and CI" href="/testing-and-ci">
CI test and Release build steps — what automation covers vs maintainer-only release scripts.
</Card>
<Card title="Overview" href="/overview">
Product surface including Sparkle updates in the feature set.
</Card>
</CardGroup>

---

## 22. Contributing

> Fork-only contribution policy, feedback form, design-system token rules (Font.ink*, Color.canvas, ds-allow escape hatch), DESIGN_SYSTEM.md reference, commit-message conventions in AGENTS.md, and CI-enforced checks.

- Page Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/pages/22-contributing.md
- Generated: 2026-06-15T22:12:22.630Z

### Source Files

- `.github/CONTRIBUTING.md`
- `AGENTS.md`
- `DESIGN_SYSTEM.md`
- `InkIt/InkItApp.swift`
- `tools/check-design-tokens.sh`
- `LICENSE`

---
title: "Contributing"
description: "Fork-only contribution policy, feedback form, design-system token rules (Font.ink*, Color.canvas, ds-allow escape hatch), DESIGN_SYSTEM.md reference, commit-message conventions in AGENTS.md, and CI-enforced checks."
---

InkIt is Apache-2.0 open source, but upstream does not accept outside pull requests. You may read, build, learn from, and fork the repository freely; active development stays with the Cartesia team. Fork maintainers and internal contributors follow the design-token rules in `AGENTS.md` and `DESIGN_SYSTEM.md`, with mechanical enforcement via `tools/check-design-tokens.sh` on every CI run.

## Contribution policy

| Path | Status |
|---|---|
| Read, build, and fork under Apache-2.0 | Allowed |
| Submit pull requests to `cartesia-ai/InkIt` | Not accepted |
| Report bugs or share feedback | Use the feedback form (below) |

The repository is public for transparency and learning, not for community-driven merges. Cartesia keeps development in one place to move quickly and keep the codebase coherent. If you want a change upstream does not ship, fork the repo and maintain your own variant — that is what the license is for.

<Warning>
Pull requests to the upstream repository are not reviewed or merged. Do not open PRs expecting integration; fork instead.
</Warning>

## License and forking

InkIt is distributed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). You may reproduce, modify, and distribute the work or derivative works in source or object form, provided you include the license, retain attribution notices, and mark modified files. Copyright is held by Cartesia AI (2026).

Forking is the intended path for custom features, experiments, or long-lived private variants. No upstream approval is required beyond complying with the license terms.

## Feedback and bug reports

Upstream does not use GitHub Issues for product feedback. Submit bugs, ideas, and usage notes through the feedback form:

https://forms.gle/jXNtDsTaLt2rKQ8N9

The team reads all submissions. Not every item leads to a change, but feedback shows the team how InkIt is used and informs what it prioritizes internally.

## Design system — use tokens, never hardcode

InkIt has one source of truth for typography and color: the `Font` and `Color` extensions at the top of `InkIt/InkItApp.swift`, backed by the asset catalog. Full rationale, token tables, and interaction patterns live in `DESIGN_SYSTEM.md`. `AGENTS.md` summarizes the rules for humans and coding agents.

### Typography tokens

Display text must use a named `Font.ink*` token — never a bare `.font(.system(size: N))` at a call site.

| Token family | Examples | Use |
|---|---|---|
| Titles | `inkLargeTitle`, `inkTitle`, `inkBanner`, `inkSheetTitle` | Onboarding hero, pane headers, Home banner |
| Body | `inkBody`, `inkBodyEmphasized`, `inkReading`, `inkReadingEmphasized` | Transcripts, row labels, Try-It practice text |
| UI chrome | `inkHeadline`, `inkCallout`, `inkNav`, `inkSectionHeader`, `inkEyebrow`, `inkCaption` | Cards, nav, grouped headers, metadata |
| Credentials | `inkMono` | API-key fields |
| Notch HUD | `inkNotchBrand`, `inkNotchLabel` | Fixed-size menu-bar strip only |

If a size or weight is missing from the scale, add a new token to the extension block with a doc comment — do not inline the literal.

### Color tokens

Views reference semantic tokens, not raw RGB or hex values.

| Category | Tokens | Notes |
|---|---|---|
| Warm-paper surfaces | `Color.canvas`, `Color.surface`, `Color.lift`, `Color.card`, `Color.paper` | Asset-catalog neutrals for every surface |
| Brand | `Color.accentColor`, `Color.accentSoft`, `Color.recordingAmber`, `Color.diffAdd` | Amber accent; recording signal on the HUD |
| System semantics | `.primary`, `.secondary`, `.tertiary`, `Color(nsColor: .separatorColor)` | Text and hairlines; adapt to appearance |
| Fixed surfaces | `Color.hudPill`, `Color.scrim` | Always-dark HUD pill; modal dimming |

The notch HUD stays dark in all appearances and is exempt from the appearance picker. Only `recordingAmber` adds color to live feedback on that surface.

### Shape, motion, and elevation

Corner radii come from `enum Radius` in `InkItApp.swift`. Drop-shadow inks use `enum Elevation`; modal backdrops use `Color.scrim`. Animation curves use `enum Motion` — not raw `.easeOut(duration:)` at call sites. Hover affordances use `enum Hover` and the `.hoverBackdrop()` modifier.

### Do and don't

**Do**
- Use `.font(.inkBody)` (or another `Font.ink*` token) for every piece of display text.
- Use `Color.canvas`, `.secondary`, and other semantic tokens for UI color.
- Add a new token when the scale lacks the size or weight you need.

**Don't**
- Hardcode `.font(.system(size: N))` for running text or recurring UI.
- Write `Color(red:…)`, hex literals, or other raw color constructors for app chrome.
- Inline corner radii, shadow opacities, or animation durations that belong on the shared scales.

## The `ds-allow` escape hatch

Genuine one-offs may opt out of the lint with a trailing same-line comment that names the reason:

```swift
Image(systemName: "gearshape").font(.system(size: 17, weight: .medium))  // ds-allow: icon
```

Sanctioned one-offs include SF Symbol icon glyphs sized to their container, the always-dark notch HUD micro-type, the dual-appearance `AppearanceThumbnail` preview, and bespoke animations (reveals, repeating pulses). Use sparingly. If a value recurs or is plain running text, it belongs on the token scale — not behind `ds-allow`.

`tools/check-design-tokens.sh` skips lines containing `ds-allow`, token definitions (`static let`), and comment-only lines.

## Local design-token check

Run the guard script before pushing any Swift UI changes:

```sh
./tools/check-design-tokens.sh
```

The script scans `InkIt/*.swift` and fails on unjustified literals:

| Pattern | Label | Fix |
|---|---|---|
| `system(size: N)` | Hardcoded font size | Use a `Font.ink*` token |
| `Color(red:` / `Color(white:` | Raw color literal | Use a `Color` token |
| `easeOut(duration:` | Hardcoded animation | Use a `Motion.*` token |
| `.black.opacity(` | Raw shadow/scrim ink | Use `Elevation.*` or `Color.scrim` |

Success prints `✓ design tokens: no unjustified literals`. Failures list file paths and line numbers; exit code is `1`.
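
The rules in the table, together with the skip rules for `ds-allow`, `static let`, and comment-only lines, can be approximated with a few `grep` passes. This is a rough sketch under those assumptions; `tools/check-design-tokens.sh` remains the authority and may use different patterns or output. `lint_tokens` and `check_tokens` are hypothetical names.

```sh
# Hypothetical approximation of the lint; the real script may differ.

# Prints file:line:content for each unjustified literal.
# /dev/null forces grep to prefix file names even for a single input file.
lint_tokens() {
  grep -nE 'system\(size:|Color\((red|white):|easeOut\(duration:|\.black\.opacity\(' "$@" /dev/null |
    grep -v 'ds-allow' |
    grep -v 'static let' |
    grep -vE '^[^:]+:[0-9]+:[[:space:]]*//'
}

# Exits non-zero and lists violations, or prints the success line.
check_tokens() {
  violations=$(lint_tokens "$@" || true)
  if [ -n "$violations" ]; then
    printf '%s\n' "$violations"
    return 1
  fi
  echo "✓ design tokens: no unjustified literals"
}
```

Calling `check_tokens InkIt/*.swift` from the repository root gives a quick local approximation; run the real script before pushing.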

## CI-enforced checks

The `.github/workflows/ci.yml` workflow runs on pushes and pull requests to `main`, on a macOS 15 runner with Xcode. A PR that breaks any of the mechanical rules below does not pass CI.

<Steps>
<Step title="Check design tokens">

```sh
./tools/check-design-tokens.sh
```

First step in CI; fails the job on any unjustified literal.

</Step>
<Step title="Generate Xcode project">

```sh
brew install xcodegen   # CI installs via Homebrew
xcodegen generate
```

`InkIt.xcodeproj` is generated from `project.yml` — edit `project.yml`, not the `.pbxproj`.

</Step>
<Step title="Run unit tests">

```sh
xcodebuild test \
  -project InkIt.xcodeproj \
  -scheme InkIt \
  -configuration Debug \
  -destination 'platform=macOS'
```

</Step>
<Step title="Build Release">

```sh
xcodebuild build \
  -project InkIt.xcodeproj \
  -scheme InkIt \
  -configuration Release \
  -destination 'platform=macOS'
```

Verifies the Release configuration compiles after tests pass.

</Step>
</Steps>

Fork maintainers can mirror this pipeline locally. See [Testing and CI](/testing-and-ci) for test targets and workflow detail.
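
One way to mirror the four steps is a small wrapper that stops at the first failure. The sketch below is hypothetical (the `run` and `ci_local` helpers and the `DRY_RUN` variable are not part of the repository); the commands themselves are the ones listed in the steps above.

```sh
# Hypothetical wrapper; the commands are the four CI steps listed above.

# Echo a command, then run it unless DRY_RUN is set.
run() {
  echo "+ $*"
  [ -n "${DRY_RUN:-}" ] || "$@"
}

# Run the steps in CI order; && stops the chain at the first failure.
ci_local() {
  run ./tools/check-design-tokens.sh &&
  run xcodegen generate &&
  run xcodebuild test -project InkIt.xcodeproj -scheme InkIt \
    -configuration Debug -destination 'platform=macOS' &&
  run xcodebuild build -project InkIt.xcodeproj -scheme InkIt \
    -configuration Release -destination 'platform=macOS'
}

DRY_RUN=1 ci_local   # prints the four commands without running them
```

Calling `ci_local` without `DRY_RUN` from the repository root runs the steps for real.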

## Build and test (fork workflow)

For a local fork:

```sh
xcodegen generate
xcodebuild -project InkIt.xcodeproj -scheme InkIt -configuration Debug \
  -destination 'platform=macOS' test
```

Replace `/Applications/InkIt.app` after local builds when testing the installed app. See [Build from source](/build-from-source) for signing, XcodeGen, and install steps.

## Commit message conventions

`AGENTS.md` defines message style for the public repository. Messages stay factual and low-profile.

**Subject line**
- Imperative mood; states what changed mechanically.
- No product framing, editorializing, or intent cues.
- Avoid subjects like "prep for open source", "visual-first landing page", or "leaner repo".

**Body (optional)**
- Technical mechanism only — e.g. why a fix works.
- No narrative about product or business goals.

Coding agents should propose commit messages for approval before committing.

<AccordionGroup>
<Accordion title="Subject examples">

| Avoid | Prefer |
|---|---|
| Add privacy gating for onboarding | Gate Cartesia key field behind onboarding step |
| Visual-first landing page refresh | Replace Home empty state with stats rail |
| Leaner/cleaner repo layout | Move design-token script to tools/ |

</Accordion>
</AccordionGroup>

## Key reference files

| File | Role |
|---|---|
| `.github/CONTRIBUTING.md` | Public contribution policy and feedback link |
| `AGENTS.md` | Agent and maintainer rules: tokens, build, commits |
| `DESIGN_SYSTEM.md` | Full design direction, token tables, interaction patterns |
| `InkIt/InkItApp.swift` | Token definitions: `Color`, `Font`, `Radius`, `Elevation`, `Motion`, `Hover` |
| `tools/check-design-tokens.sh` | Lint script run locally and in CI |
| `.github/workflows/ci.yml` | CI job: tokens → xcodegen → test → Release build |
| `LICENSE` | Apache-2.0 terms |

## Related pages

<CardGroup>
<Card title="Build from source" href="/build-from-source">
Generate the Xcode project, sign Debug/Release builds, and install to `/Applications`.
</Card>
<Card title="Testing and CI" href="/testing-and-ci">
Local `xcodebuild test` commands, CI workflow steps, and test target overview.
</Card>
<Card title="Overview" href="/overview">
What InkIt exposes: dictation pipeline, HUD, history, and runtime requirements.
</Card>
</CardGroup>

---
