# Quickstart

> First successful dictation: launch InkIt, complete onboarding (permissions, Cartesia API key, Try It trial), hold or toggle the default hotkey, speak, release, and verify text pastes at the cursor. Expected HUD states, success signals, and recovery when no editable field is focused.

- Repository: cartesia-ai/InkIt
- GitHub: https://github.com/cartesia-ai/InkIt
- Human docs: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b
- Complete Markdown: https://www.grok-wiki.com/public/docs/cartesia-ai-inkit-18975554254b/llms-full.txt

## Source Files

- `README.md`
- `InkIt/OnboardingView.swift`
- `InkIt/AppCoordinator.swift`
- `InkIt/PermissionsService.swift`
- `InkIt/TryItPracticeCard.swift`

---

---
title: "Quickstart"
description: "First successful dictation: launch InkIt, complete onboarding (permissions, Cartesia API key, Try It trial), hold or toggle the default hotkey, speak, release, and verify text pastes at the cursor. Expected HUD states, success signals, and recovery when no editable field is focused."
---

InkIt’s first-run path is `OnboardingRootView` → `hasCompletedOnboarding = true` → `MainWindowView`, with the global hotkey registered only after onboarding (except during the Try It trial). Default shortcut is **Fn** (`HotkeyBinding.fn`) in **hold-to-talk** mode (`DictationMode.hold`); release triggers Cartesia Ink-2 STT, optional polish, and `PasteService` Cmd+V insertion at the focused editable field.

## Prerequisites

| Requirement | Why |
| --- | --- |
| macOS 14+ on Apple silicon | Deployment target enforced by the app bundle |
| InkIt installed to `/Applications` | Duplicate bundle instances break Accessibility grants |
| Microphone permission | `AVCaptureDevice` audio capture for 16 kHz PCM |
| Accessibility permission | Global hotkey tap and synthetic paste require `AXIsProcessTrusted` |
| Cartesia API key | Stored in Keychain (`cartesiaAPIKey`); drives Ink-2 WebSocket STT |

<Note>
Polish is optional and off by default (`correctionEnabled` is `false` until you enable it). First dictation needs only the Cartesia key.
</Note>

## Onboarding sequence

`OnboardingStep` advances in order: `welcome` → `permissions` → `apiKey` → `tryIt` → `done`. Progress dots gate forward jumps until each step’s requirement is satisfied live (for example, clearing the API key re-locks later steps).

<Steps>
<Step title="Launch InkIt">

Open InkIt from Applications. `RootView` shows `OnboardingRootView` while `hasCompletedOnboarding` is `false`.

</Step>

<Step title="Grant permissions">

On **Permissions**, enable both cards:

- **Microphone** — `PermissionsService.requestMicrophone` fires the TCC prompt; denied/restricted states switch to `needsManual` with a link to **Privacy & Security ▸ Microphone**.
- **Accessibility** — `requestAccessibility` calls `AXIsProcessTrustedWithOptions(prompt: true)` once, pre-adds InkIt to the Accessibility list, and opens **Privacy & Security ▸ Accessibility**. After a deny or stale in-process trust bit, the card shows manual steps; granting may trigger a silent relaunch that resumes at the permissions step (`resumeOnboardingAtPermissions`).

`PermissionsService` polls every 0.5 s so toggles in System Settings are detected without restarting. **Continue** appears only when `hasMicrophone && hasAccessibility`.

</Step>

<Step title="Enter Cartesia API key">

On **API key**, paste a key from [play.cartesia.ai/keys](https://play.cartesia.ai/keys) into the masked field. `CartesiaKeyValidator` shows inline status (checking, verified, invalid, couldn’t verify). **Continue** requires a non-empty trimmed key (~15,000 words/month on the free tier).

</Step>

<Step title="Complete Try It">

On **Try it**, read the sample line aloud while holding the default shortcut. `TryItPracticeCard` calls `beginOnboardingTrial()`, registers the hotkey, and shows the real notch HUD. Hold **Fn**, speak, release; the transcript lands in **What InkIt heard** (no live word-by-word preview during the trial). Edit if needed, then send or choose **Skip for now**. Finish on **You're ready!** with **Start using InkIt**, which sets `hasCompletedOnboarding = true`.

</Step>
</Steps>

## Dictate with the default hotkey

After onboarding, `AppCoordinator.refreshHUD()` creates `NotchHUDController` and registers the stored binding.

| Setting | Default | User-facing gesture |
| --- | --- | --- |
| `hotkey` | `.fn` (displayed as `🌐 fn`) | Hold Fn while speaking |
| `dictationMode` | `.hold` | Release to finalize and paste |
| `dictationMode` | `.toggle` (if changed) | Press once to start, again to stop and paste |

<Tabs>
<Tab title="Hold to talk (default)">

1. Focus an editable field in any app (TextEdit, browser, chat, terminal).
2. Press and hold **Fn**.
3. Wait for the notch HUD waveform (see below), then speak.
4. Release **Fn**. InkIt finalizes STT, optionally polishes, and pastes via Cmd+V.

</Tab>
<Tab title="Hands-free toggle">

1. Focus an editable field.
2. Press **Fn** once to start recording.
3. Speak; press **Fn** again to stop, finalize, and paste.

</Tab>
</Tabs>

<Warning>
If Accessibility is missing when you press the hotkey, the notch shows **Accessibility needed** and System Settings opens at most once every 10 seconds. Grant Accessibility to the single InkIt bundle in `/Applications`.
</Warning>

## Notch HUD states

During onboarding Try It and post-onboarding dictation, the notch island is the live status surface. Finalizing, polishing, and pasting run silently in the background; the menu bar label still reflects them.

| Phase | `DictationState` | Notch island | Menu bar label |
| --- | --- | --- | --- |
| Mic warming up | `.recording` + `audioReady == false` | `InkIt` + pulsing preparing dot | `● Ink` |
| Capturing speech | `.recording` + `audioReady == true` | `InkIt` + live waveform (`inputLevel`) | `● Ink` |
| Released / processing | `.finalizing`, `.rewriting`, `.pasting` | Hidden (collapsed into notch) | `… Ink` / `✎ Ink` / `↩ Ink` |
| No paste target | `.heldInHistory` | `InkIt • Saved to History` (~2.5 s) | `⬇ Ink` |
| Failure | `.error(message)` | `InkIt ⚠ <message>` | `⚠ Ink` |
| Idle | `.idle` | Hidden | `Ink` |

<Tip>
`audioReady` flips when input level exceeds `readyLevelThreshold` (0.03) or after a 0.6 s fallback. On Bluetooth mics, wait for the waveform before speaking — the device may need 200–500 ms to switch from A2DP to HFP.
</Tip>

On Macs without a physical notch, the same content appears in a floating capsule below the menu bar.

## Success signals

### Try It (onboarding)

- Amber recording ring on the key cap while `state == .recording`.
- After release, the full transcript appears at once in **What InkIt heard** (trial suppresses interim `liveTranscript` updates).
- Green checkmark and enabled send button when `editedText` is non-empty and not mid-take.
- Take logged to `TranscriptHistoryStore` as polish `.off` with transcribe-only latency.

### First paste in any app

1. Open a text field in another app and place the cursor.
2. Hold **Fn**, speak a short phrase, release.
3. Polished or raw text appears at the cursor without manual copy-paste.
4. A new row appears in InkIt **History** with latency breakdown when paste succeeds.

<Check>
Paste success ends in `DictationState.idle` with no notch error. The menu bar returns to **Ink**.
</Check>

## When no editable field is focused

At release, `FocusedEditable.current()` re-checks whether an editable AX element has focus. If `focus.isEditable` is false — no text field, non-editable surface, or focus moved during dictation — InkIt does not fire Cmd+V.

Instead:

1. Transcript is saved to **History** (with polish outcome and latency).
2. `showHeldInHistoryNotice()` sets `DictationState.heldInHistory`.
3. Notch shows **Saved to History** for ~2.5 s, then returns to idle.
4. Copy the text from History or focus a field and dictate again.

<Info>
Pressing the hotkey during `heldInHistory` or `.error` starts a new take immediately — those states do not block the next dictation.
</Info>

## Quick recovery

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Notch: **Add your API key** | Empty `cartesiaAPIKey` | Settings or re-run onboarding API key step |
| Notch: **Mic access needed** | Microphone denied | System Settings ▸ Microphone → InkIt |
| Notch: **Accessibility needed** | AX not trusted | System Settings ▸ Accessibility → InkIt (one bundle only) |
| Notch: **Invalid API key** / **Out of credits** | Cartesia billing | Update key at [play.cartesia.ai/keys](https://play.cartesia.ai/keys) |
| Text in History, not at cursor | No editable focus at release | Click a text field, copy from History, or dictate again |
| Notch: **Paste failed** | Target app rejected Cmd+V | Re-focus the field and retry |
| Empty take after release | Silence or STT returned empty string | State collapses to idle with no history row |

STT errors surface briefly in the notch (1.5 s after release if shown mid-hold) and may persist service-issue flags on Home until the next successful dictation.

## End-to-end flow (post-onboarding)

```mermaid
stateDiagram-v2
    [*] --> idle
    idle --> recording: hotkey press
    recording --> finalizing: hotkey release
    finalizing --> rewriting: STT complete, polish enabled
    finalizing --> pasting: STT complete, no polish
    rewriting --> pasting: rewrite done
    pasting --> idle: Cmd+V success
    finalizing --> heldInHistory: no editable focus
    rewriting --> heldInHistory: no editable focus
    heldInHistory --> idle: 2.5s timer
    recording --> error: missing perm / STT fail
    error --> idle: 1.5s after release
    idle --> recording: hotkey during heldInHistory or error
```

## Related pages

<CardGroup>
<Card title="Installation" href="/installation">
Install from the release DMG or build with XcodeGen before running onboarding.
</Card>
<Card title="Permissions model" href="/permissions-model">
Tri-state permission cards, polling, and the Accessibility relaunch path.
</Card>
<Card title="Configure Cartesia API key" href="/configure-cartesia-api-key">
Keychain storage, validation, and service-issue flags.
</Card>
<Card title="Dictation pipeline" href="/dictation-pipeline">
Hotkey → audio → STT → polish → paste → history in full detail.
</Card>
<Card title="Paste and focus reference" href="/paste-and-focus-reference">
`FocusedEditable` checks, `heldInHistory`, and Cmd+V timing.
</Card>
<Card title="Configure hotkeys and mode" href="/configure-hotkeys-and-mode">
Change the shortcut or switch to hands-free toggle.
</Card>
</CardGroup>
