# Sync audio and SFX

> Wire voiceover, explainer VO, click/toggle/type SFX tracks using data-start, data-duration, data-track-index, and data-volume on audio elements in index.html.

- Repository: heygen-com/hyperframes-launches
- GitHub: https://github.com/heygen-com/hyperframes-launches
- Human docs: https://www.grok-wiki.com/public/docs/heygen-com-hyperframes-launches-996f3eaa626b
- Complete Markdown: https://www.grok-wiki.com/public/docs/heygen-com-hyperframes-launches-996f3eaa626b/llms-full.txt

## Source Files

- `index.html`
- `voiceover.mp3`
- `voiceover_explainer.mp3`
- `transcript.json`
- `compositions/compose-tasklist.html`

---


All audio for the 53.3s `claude-paper` master cut is declared as sibling `<audio>` elements inside `#claude-paper` in `index.html`. HyperFrames reads `data-start`, `data-duration`, `data-track-index`, and `data-volume` on each element and schedules playback against the master clock. Scene GSAP timelines in `compositions/*.html` drive visuals only; audio timing is authored separately and must be converted from scene-local seconds to master seconds.

## Architecture

```text
index.html (#claude-paper, data-duration="53.3")
├── <div data-composition-src="…">  ×10   ← visual sections (data-track-index 1–11)
└── <audio src="…">                   ×87   ← voiceover + SFX (data-track-index 8, 100–185)
         │
         ▼
HyperFrames runtime  →  muxes tracks at master time t ∈ [0, 53.3]
Scene GSAP timelines →  visuals only (window.__timelines, no audio API)
```

The root GSAP script in `index.html` handles cross-section opacity and seam transitions. It does not start, stop, or volume-mix audio. When a cursor click or keystroke needs sound, add or adjust an `<audio>` row in `index.html` whose `data-start` equals the section's `data-start` plus the scene timeline's local trigger time.

<Note>
Master time formula: `data-start (audio) = data-start (section) + local_scene_time`. Example: Scotsman VO plays when `compose-ui` (section `data-start="25.8"`) reaches local `R = 3.8` → `25.8 + 3.8 = 29.6`.
</Note>
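
The formula can be sketched as a small helper. This is an illustration only: `toMasterTime` is a hypothetical name, not part of HyperFrames, and the millisecond rounding is an assumption added to keep attribute values tidy.

```javascript
// Convert a scene-local GSAP time to a master-timeline data-start.
// Rounding to milliseconds absorbs floating-point noise from the addition.
function toMasterTime(sectionStart, localTime) {
  return Math.round((sectionStart + localTime) * 1000) / 1000;
}

console.log(toMasterTime(25.8, 3.8)); // 29.6 (Scotsman VO in compose-ui)
console.log(toMasterTime(40.3, 2.3)); // 42.6 (explainer VO in compose-tasklist)
```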

## Audio element contract

Each track is a plain HTML `<audio>` element with four scheduling attributes and a `src` pointing at a file in the `claude-paper-launch/` root.

<ParamField body="data-start" type="number (seconds)" required>
Absolute offset on the master timeline where playback begins. Uses the same clock as section `data-start` on `#claude-paper` (0 at cut start).
</ParamField>

<ParamField body="data-duration" type="number (seconds)" required>
How long HyperFrames plays or holds this clip on the master timeline. For voiceover, match the audible segment (Scotsman VO is trimmed to 2.17s). For SFX, match the source file length (`click.mp3` → 0.07, `toggle.mp3` → 2.67, `typenew.mp3` → 0.57).
</ParamField>

<ParamField body="data-track-index" type="integer" required>
Mixer lane identifier. Must be unique across all `<audio>` elements and distinct from visual section indices unless intentionally shared. The current cut uses 8 for Scotsman VO, 184 for explainer VO, and the remaining integers in 100–185 for generated SFX.
</ParamField>

<ParamField body="data-volume" type="number (0–1)" required>
Per-track gain. Voiceover tracks use `1`. UI SFX use `0.85` for `click.mp3` and `toggle.mp3`, `0.2` for `typenew.mp3`.
</ParamField>

Minimal voiceover row:

```html
<audio
  id="vo"
  src="voiceover.mp3"
  data-start="29.6"
  data-duration="2.17"
  data-track-index="8"
  data-volume="1"
></audio>
```

SFX rows follow the same shape; the comment block in `index.html` documents the generation rule: one `click.mp3` per cursor tap, `toggle.mp3` on the Hyperframes toggle, and one `typenew.mp3` per keystroke with humanized swell timing.
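
If you generate SFX rows with a script instead of writing them by hand, one possible shape is sketched below. `audioRow`, `sfxRow`, and the `SFX_PRESETS` table are hypothetical names for illustration. The durations and volumes are the ones listed on this page, and the example row uses the `sfx-click-9` values (start 38.42, track index 183).

```javascript
// Hypothetical helper: render one <audio> row from a plain spec object.
function audioRow({ id, src, start, duration, trackIndex, volume }) {
  return [
    "<audio",
    `  id="${id}"`,
    `  src="${src}"`,
    `  data-start="${start}"`,
    `  data-duration="${duration}"`,
    `  data-track-index="${trackIndex}"`,
    `  data-volume="${volume}"`,
    "></audio>",
  ].join("\n");
}

// Per-file duration and volume, copied from the values documented on this page.
const SFX_PRESETS = {
  "click.mp3": { duration: 0.07, volume: 0.85 },
  "toggle.mp3": { duration: 2.67, volume: 0.85 },
  "typenew.mp3": { duration: 0.57, volume: 0.2 },
};

function sfxRow(id, src, start, trackIndex) {
  const preset = SFX_PRESETS[src];
  if (!preset) throw new Error(`unknown SFX file: ${src}`);
  return audioRow({ id, src, start, trackIndex, ...preset });
}

console.log(sfxRow("sfx-click-9", "click.mp3", 38.42, 183));
```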

## Track index layout

| Range | Role | Count | Notes |
|-------|------|-------|-------|
| 1–11 | Visual sections | 10 | On `<div>` section wrappers, not `<audio>` |
| 8 | Scotsman VO | 1 | `#vo` — angry Scotsman inside compose player |
| 100–101, 103–108, 183, 185 | Click SFX | 10 | `click.mp3`, 0.07s, volume 0.85 |
| 102 | Toggle SFX | 1 | `toggle.mp3`, 2.67s, volume 0.85 |
| 109–182 | Type SFX | 74 | `typenew.mp3`, 0.57s, volume 0.2 |
| 184 | Explainer VO | 1 | `#vo-explainer` — neutral presenter over TSLA explainer |

<Warning>
`data-track-index` values are not required to be sequential. The cut skips index 9 on sections (jumps from `thinking-big-2` at 9 to `compose-tasklist` at 10) and uses non-contiguous SFX indices (e.g. `sfx-click-9` at 183, `sfx-click-10` at 185). Assign a new unique integer for every added track.
</Warning>
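
A quick way to check uniqueness and pick a fresh index is sketched below. Both function names are hypothetical, and the input list is rebuilt from the ranges described on this page rather than read from `index.html`.

```javascript
// Return the track indices that appear on more than one audio row.
function duplicateTrackIndices(indices) {
  const seen = new Set();
  const dupes = new Set();
  for (const i of indices) {
    if (seen.has(i)) dupes.add(i);
    seen.add(i);
  }
  return [...dupes].sort((a, b) => a - b);
}

// Smallest unused integer at or above `floor`.
function nextFreeIndex(indices, floor = 100) {
  const used = new Set(indices);
  let candidate = floor;
  while (used.has(candidate)) candidate += 1;
  return candidate;
}

// Audio indices as described on this page: 8 plus every integer in 100–185.
const audioIndices = [8];
for (let i = 100; i <= 185; i += 1) audioIndices.push(i);

console.log(audioIndices.length);                 // 87
console.log(duplicateTrackIndices(audioIndices)); // []
console.log(nextFreeIndex(audioIndices));         // 186
```

In a browser console the input could instead be collected from `document.querySelectorAll("#claude-paper audio")` by reading each element's `dataset.trackIndex`.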

## Voiceover tracks

### Scotsman VO (`#vo`)

| Field | Value |
|-------|-------|
| `src` | `voiceover.mp3` |
| `data-start` | `29.6` |
| `data-duration` | `2.17` |
| `data-track-index` | `8` |
| `data-volume` | `1` |

Plays while the generated video runs inside the compose player in `compositions/compose-ui.html`. The scene defines `R = VS + 0.3` (local 3.8s) as video + VO start and `PEND = R + 2.17` as the pause tap that cuts off "numpties". Those map to master 29.6s and 31.77s. On-screen captions in `compose-ui` use offsets from `R` (e.g. `R + 0.11` for "Right.", `R + 1.03` for "Listen up, you wee numpties.").

`transcript.json` holds word-level `{ text, start, end }` entries for the Scotsman clip (first word at 0.11s, last at 7.28s in file time). The master `<audio>` element plays only the first 2.17s; captions and scrubber UI in the scene are authored against the truncated segment, not the full file.
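
To see which transcript words fall inside the truncated segment, a filter along these lines may help. It assumes `transcript.json` parses to a flat array of `{ text, start, end }` objects, which this page does not confirm, and `audibleWords` is a hypothetical name.

```javascript
// Keep only the words that end inside the audible window, and add master-time fields.
function audibleWords(words, voStart, clipDuration) {
  return words
    .filter((w) => w.end <= clipDuration)
    .map((w) => ({
      text: w.text,
      masterStart: Math.round((voStart + w.start) * 1000) / 1000,
      masterEnd: Math.round((voStart + w.end) * 1000) / 1000,
    }));
}

// Placeholder entries: only the { text, start, end } shape, the 0.11s first-word
// start, and the 7.28s last-word end come from this page. Other values are invented.
const sampleWords = [
  { text: "Right.", start: 0.11, end: 0.5 },
  { text: "(placeholder later word)", start: 6.9, end: 7.28 },
];

// #vo starts at master 29.6 and plays 2.17s of the file.
console.log(audibleWords(sampleWords, 29.6, 2.17));
```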

### Explainer VO (`#vo-explainer`)

| Field | Value |
|-------|-------|
| `src` | `voiceover_explainer.mp3` |
| `data-start` | `42.6` |
| `data-duration` | `6.41` |
| `data-track-index` | `184` |
| `data-volume` | `1` |

Plays over the 6s TSLA explainer inside `compositions/compose-tasklist.html`. Scene local `R = VS + 0.3` with `VS = 2.0` yields master `40.3 + 2.3 = 42.6`. Caption swaps at `R + 3.4` ("our base case puts fair value near $1,450") align with ElevenLabs phrase timing noted in the scene script.

## Generated SFX inventory

HyperFrames does not derive SFX from GSAP. Each audible UI moment has a pre-declared `<audio>` row.

### Click tracks (`click.mp3`)

| ID | Master `data-start` | Scene anchor |
|----|---------------------|--------------|
| `sfx-click-0` | 1.60 | `connector-morph` "+" tap at 1.6 |
| `sfx-click-1` | 3.10 | `connector-morph` Connectors row tap at 3.1 |
| `sfx-click-3` | 7.48 | `chat-response` composer tap at 0.78 (section 6.7) |
| `sfx-click-4` | 9.93 | `chat-response` send tap |
| `sfx-click-5` | 20.20 | `followup-type` composer tap at 0.8 (section 19.4) |
| `sfx-click-6` | 24.89 | `followup-type` send tap |
| `sfx-click-7` | 31.77 | `compose-ui` pause tap at `PEND` |
| `sfx-click-8` | 33.20 | `compose-ui` prompt-box tap during correction |
| `sfx-click-9` | 38.42 | `compose-ui` send tap at `SEND + 0.62` (section 25.8) |
| `sfx-click-10` | 49.30 | `compose-tasklist` download pill tap at `END` (section 40.3 + 9.0) |

### Toggle track (`toggle.mp3`)

| ID | Master `data-start` | `data-duration` | Scene anchor |
|----|---------------------|-----------------|--------------|
| `sfx-toggle` | 4.30 | 2.67 | `connector-morph` Hyperframes toggle tap at 4.3 |

### Type tracks (`typenew.mp3`)

74 elements (`sfx-type-0` … `sfx-type-73`) cover humanized keystroke rhythms in three typing bursts:

| Master window | Section | Local origin |
|---------------|---------|--------------|
| 7.80 – 9.51 | `chat-response` (6.7) | Prompt typing from ~1.1s |
| 20.80 – 23.70 | `followup-type` (19.4) | Follow-up message from ~1.4s |
| 33.57 – 37.30 | `compose-ui` (25.8) | Style-correction typing from ~7.77s |

Each row uses `data-duration="0.57"` and `data-volume="0.2"`. Stagger intervals mirror the scene's `charT` / `steps()` reveal timing — when adding characters, insert a new `<audio>` at `section_start + char_land_time` and bump `data-track-index`.
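
As a sketch of that rule, the helper below turns a list of scene-local character landing times into row specs. `typeRows` is a hypothetical name, the landing times are invented for illustration, and the starting index 186 and ID number 74 are assumptions based on the ranges this page lists as already used.

```javascript
// Build one typenew.mp3 row spec per character landing time.
function typeRows(sectionStart, charLandTimes, firstTrackIndex, firstIdNumber) {
  return charLandTimes.map((localT, n) => ({
    id: `sfx-type-${firstIdNumber + n}`,
    src: "typenew.mp3",
    // section_start + char_land_time, rounded to milliseconds
    start: Math.round((sectionStart + localT) * 1000) / 1000,
    duration: 0.57,
    volume: 0.2,
    trackIndex: firstTrackIndex + n,
  }));
}

// Three extra characters in chat-response (section data-start 6.7).
// The local landing times below are made up for the example.
const extraRows = typeRows(6.7, [2.9, 3.02, 3.15], 186, 74);
for (const row of extraRows) {
  console.log(row.id, row.start, row.trackIndex);
}
```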

## Workflow: sync a new sound

<Steps>
<Step title="Find the scene trigger time">
Open the scene composition (e.g. `compositions/chat-response.html`) and read the GSAP position of the cursor tap or typing onset. Use `?t=` seek in preview to verify the local second.
</Step>

<Step title="Convert to master time">
Add the parent section's `data-start` from `index.html`. For `chat-response` at 6.7 with a tap at local 0.78, audio `data-start` is `7.48`.
</Step>

<Step title="Add the audio row">
Insert an `<audio>` sibling after the section divs, before the root `</div>`. Set `src`, `data-duration` from the asset length, `data-volume` from the SFX class table, and a fresh `data-track-index`.
</Step>

<Step title="Preview and render">
Run `hyperframes preview` on the `claude-paper-launch` folder, scrub to the trigger, and confirm the sound aligns with the cursor. Re-render if muxed output drifts.
</Step>
</Steps>

```text
Scene GSAP (local t)  +  section data-start  =  audio data-start
        0.78          +         6.7            =       7.48
```

<Info>
When you change a section's `data-start` or trim its `data-duration` while editing the master timeline, recompute every audio `data-start` that references that section. Visual seams and audio share the master clock but are edited in separate places.
</Info>
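
One way to do that recomputation in bulk is sketched below. The `section` field is a hypothetical bookkeeping label: the `<audio>` elements described on this page do not record which section they belong to, so you would need to maintain that mapping yourself. The new start of 40.8 is an invented example value.

```javascript
// Shift every audio row anchored to one section by the amount that section moved.
function shiftAudio(rows, sectionId, oldStart, newStart) {
  const delta = newStart - oldStart;
  return rows.map((row) =>
    row.section === sectionId
      ? { ...row, start: Math.round((row.start + delta) * 1000) / 1000 }
      : row
  );
}

const audioRows = [
  { id: "vo", section: "compose-ui", start: 29.6 },
  { id: "vo-explainer", section: "compose-tasklist", start: 42.6 },
  { id: "sfx-click-10", section: "compose-tasklist", start: 49.3 },
];

// Suppose compose-tasklist moves from 40.3 to 40.8 on the master timeline.
console.log(shiftAudio(audioRows, "compose-tasklist", 40.3, 40.8));
```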

## Asset files

| File | Role | Git LFS |
|------|------|---------|
| `voiceover.mp3` | Scotsman VO source | Yes (`*.mp3` in `.gitattributes`) |
| `voiceover_explainer.mp3` | TSLA explainer VO | Yes |
| `click.mp3` | UI click SFX | Yes (referenced; required at render) |
| `toggle.mp3` | Toggle switch SFX | Yes (referenced; required at render) |
| `typenew.mp3` | Keystroke SFX | Yes (referenced; required at render) |
| `transcript.json` | Word timings for Scotsman captions | No — metadata only, not wired to `<audio>` |

Audio binaries must be pulled with Git LFS before preview or render. A 131-byte pointer file means LFS has not been fetched.
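
File size is only a hint. A Git LFS pointer is a short text file whose first line is `version https://git-lfs.github.com/spec/v1`, so checking the content is more direct. The sketch below is a hypothetical helper, and the `oid` and `size` values in the sample are placeholders.

```javascript
// A Git LFS pointer is a small text file whose first line names the spec version.
const LFS_HEADER = "version https://git-lfs.github.com/spec/v1";

// Accepts a string or anything with toString(), such as a Node Buffer.
function isLfsPointer(contents) {
  return contents.toString().startsWith(LFS_HEADER);
}

// Placeholder pointer: the oid and size are not from a real file.
const samplePointer = [
  "version https://git-lfs.github.com/spec/v1",
  "oid sha256:" + "0".repeat(64),
  "size 123456",
  "",
].join("\n");

console.log(isLfsPointer(samplePointer));    // true
console.log(isLfsPointer("ID3 audio bytes")); // false
```

In Node you could pass the result of `fs.readFileSync("voiceover.mp3")` straight into `isLfsPointer`.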

## Verification signals

| Check | Expected result |
|-------|-----------------|
| Scotsman VO at 29.6s | Audio starts as compose player video begins; stops by pause at ~31.77s |
| Explainer VO at 42.6s | 6.41s VO plays over the task-list explainer beats |
| Click at 49.30s | `sfx-click-10` coincides with download pill press before outro cut |
| No duplicate `data-track-index` | 87 unique audio lanes; no index collision with another `<audio>` |
| Rendered MP4 | Full 53.3s mux includes VO and SFX; silent UI means missing LFS assets or wrong `data-start` |
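
A further sanity check you might script is that no clip runs past the end of the master cut. This is an assumption about what counts as a mistake, not a documented HyperFrames rule, and `clipsPastEnd` is a hypothetical name. The first four rows use values from this page; the last row is invented to show a failure.

```javascript
// List the ids of rows that start before 0 or end after the master duration.
function clipsPastEnd(rows, masterDuration) {
  return rows
    .filter((r) => r.start < 0 || r.start + r.duration > masterDuration)
    .map((r) => r.id);
}

const checkedRows = [
  { id: "vo", start: 29.6, duration: 2.17 },
  { id: "vo-explainer", start: 42.6, duration: 6.41 },
  { id: "sfx-toggle", start: 4.3, duration: 2.67 },
  { id: "sfx-click-10", start: 49.3, duration: 0.07 },
  // Invented row that overruns the 53.3s cut.
  { id: "sfx-type-late", start: 53.0, duration: 0.57 },
];

console.log(clipsPastEnd(checkedRows, 53.3)); // [ 'sfx-type-late' ]
```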

<Tip>
To debug a single scene in isolation, open `compositions/<scene>.html?dev=1` for visuals, but remember audio only fires from `index.html` in the master composition. Always verify SFX in the root cut.
</Tip>

## Common failure modes

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| Silent clicks or typing | `click.mp3` / `toggle.mp3` / `typenew.mp3` missing (LFS) | `git lfs pull` in repo root |
| Silent voiceover | `voiceover.mp3` or `voiceover_explainer.mp3` is an LFS pointer | Pull LFS; confirm files are >1 KB |
| SFX early/late vs cursor | `data-start` not converted from scene-local time | Recompute `section.data-start + local_t` |
| VO out of sync with captions | Scene `R` / `PEND` changed but `#vo` attributes stale | Update both scene GSAP and root `<audio>` |
| Missing mixer lane | Duplicate `data-track-index` | Assign an unused integer; every value in 100–185 is already taken in the current cut, so use 186 or higher |

## Related pages

<CardGroup>
<Card title="Audio track reference" href="/audio-track-reference">
Full inventory of voiceover tracks, SFX files, volumes, and `transcript.json` word timings.
</Card>
<Card title="Master composition reference" href="/master-composition-reference">
Section `data-start` / `data-duration` table and visual `data-track-index` assignments for all ten scenes.
</Card>
<Card title="Edit the master timeline" href="/edit-master-timeline">
How section timing changes cascade to seam cuts and audio offsets.
</Card>
<Card title="Preview and render" href="/preview-and-render">
Validate muxed audio in HyperFrames preview and 1920×1080 render output.
</Card>
<Card title="Fonts and assets" href="/fonts-and-assets">
Git LFS setup for binary audio, fonts, and images.
</Card>
<Card title="Troubleshooting" href="/troubleshooting">
Recovery steps for missing LFS audio, seam misalignment, and silent renders.
</Card>
</CardGroup>
