# Open Knowledge Format

> OKF v0.1 bundle structure, concept documents, frontmatter fields, index.md progressive disclosure, and cross-link semantics for vendor-neutral knowledge exchange.

- Repository: GoogleCloudPlatform/knowledge-catalog
- GitHub: https://github.com/GoogleCloudPlatform/knowledge-catalog
- Human docs: https://www.grok-wiki.com/public/docs/googlecloudplatform-knowledge-catalog-9cee6ee3cba5
- Complete Markdown: https://www.grok-wiki.com/public/docs/googlecloudplatform-knowledge-catalog-9cee6ee3cba5/llms-full.txt

## Source Files

- `okf/SPEC.md`
- `okf/src/enrichment_agent/bundle/document.py`
- `okf/src/enrichment_agent/bundle/index.py`
- `okf/src/enrichment_agent/bundle/paths.py`
- `okf/bundles/stackoverflow/index.md`
- `okf/README.md`

---


Open Knowledge Format (OKF) v0.1 is a vendor-neutral way to ship catalog knowledge as a directory of UTF-8 Markdown files with YAML frontmatter. A bundle is self-describing: humans can read it with ordinary file tools, agents can parse it without a proprietary SDK, and version control can diff it like source code. OKF standardizes only the structural conventions needed for interoperability; producers remain free to organize domains, extend frontmatter, and choose tooling.

## What OKF is for

OKF targets four goals:

1. Give enrichment agents a universal write target.
2. Give consumption agents predictable traversal rules.
3. Enable knowledge exchange across organizations and systems.
4. Require only a small set of fields so partial or agent-generated bundles stay useful.

OKF is **not** a fixed taxonomy, storage layer, or replacement for domain schemas such as Avro, Protobuf, or OpenAPI. It **references** those assets through concept documents and external citations.

## Bundle structure

A **knowledge bundle** is a directory tree of `.md` files. Directory layout is producer-defined; folders group related concepts but do not encode relationship types.

:::files
path/to/bundle/
├── index.md                      # Optional directory listing (progressive disclosure)
├── log.md                        # Optional update history
├── <concept>.md                  # Concept at bundle root
└── <subdirectory>/
    ├── index.md
    ├── <concept>.md
    └── <nested>/
        └── …
:::

Bundles may be distributed as:

- A git repository (recommended — history, attribution, diffs).
- A tarball or zip archive.
- A subdirectory inside a larger repository.

### Reserved filenames

These filenames have defined meaning at any hierarchy level and **must not** be used for concept documents:

| Filename | Purpose |
|----------|---------|
| `index.md` | Directory listing for progressive disclosure |
| `log.md` | Chronological update history for that scope |

Every other `.md` file is a concept document.

## Concept documents

Each **concept** is one Markdown file with two parts:

1. A YAML **frontmatter** block delimited by `---` on its own lines at the top of the file.
2. A Markdown **body** with free-form content.
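The two-part layout can be separated with a few lines of Python. The sketch below is a simplification and not the repository's parser: `split_frontmatter` is a hypothetical helper that reads only top-level `key: value` scalars and keeps list values as raw strings. A real consumer would hand the frontmatter block to a full YAML parser.

```python
def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a concept document into (frontmatter, body).

    Assumption: only top-level `key: value` lines are read; nested YAML
    and lists are left as raw strings rather than parsed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at the top of the file
    closing = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            closing = i
            break
    if closing is None:
        return {}, text  # unterminated block: treat everything as body
    meta = {}
    for line in lines[1:closing]:
        if ":" in line and not line.startswith((" ", "#", "-")):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip("'\"")
    body = "\n".join(lines[closing + 1:]).lstrip("\n")
    return meta, body
```

Splitting on the first `:` only keeps values such as URLs and ISO 8601 timestamps intact.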

The **concept ID** is the file path within the bundle with the `.md` suffix removed. For example, `tables/users.md` has concept ID `tables/users`.

Each segment of a concept ID must match `[A-Za-z0-9_][A-Za-z0-9_.\-]*`. Tools map IDs to paths with `concept_id_to_path` and back with `path_to_concept_id`.
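As a rough illustration of this mapping, the sketch below validates each segment against the pattern quoted above and converts in both directions. The names `id_to_path` and `path_to_id` are hypothetical stand-ins; the bodies are assumptions and do not reproduce the repository's `concept_id_to_path` or `path_to_concept_id`.

```python
import re
from pathlib import PurePosixPath

# Per-segment pattern quoted from the text above.
SEGMENT_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.\-]*")
RESERVED = {"index.md", "log.md"}


def id_to_path(concept_id: str) -> PurePosixPath:
    """Map a concept ID such as 'tables/users' to 'tables/users.md'."""
    segments = concept_id.split("/")
    for segment in segments:
        if not SEGMENT_RE.fullmatch(segment):
            raise ValueError(f"invalid concept ID segment: {segment!r}")
    return PurePosixPath(*segments).with_name(segments[-1] + ".md")


def path_to_id(relative_path: str) -> str:
    """Map a bundle-relative path such as 'tables/users.md' to its concept ID."""
    if not relative_path.endswith(".md"):
        raise ValueError(f"not a Markdown file: {relative_path!r}")
    if PurePosixPath(relative_path).name in RESERVED:
        raise ValueError(f"reserved filename, not a concept: {relative_path!r}")
    return relative_path[: -len(".md")]
```

Because a segment cannot start with `.`, IDs such as `../secret` fail validation before any path is built.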

### Frontmatter fields

OKF v0.1 conformance requires only a non-empty `type` field. The Knowledge Catalog enrichment agent enforces a stricter write contract for generated bundles.

<ParamField body="type" type="string" required>
Short string identifying the concept kind. Examples: `BigQuery Table`, `BigQuery Dataset`, `Reference`, `Playbook`. Types are not centrally registered; consumers must tolerate unknown values.
</ParamField>

<ParamField body="title" type="string">
Human-readable display name. If omitted, consumers may derive a title from the filename. Required by the enrichment agent's `write_concept_doc` tool.
</ParamField>

<ParamField body="description" type="string">
One-sentence summary used in `index.md` entries, search snippets, and previews. Required by the enrichment agent.
</ParamField>

<ParamField body="resource" type="string">
Canonical URI for the underlying asset (for example, a BigQuery table API URL or console link). Omit for abstract concepts.
</ParamField>

<ParamField body="tags" type="string[]">
YAML list of short categorization strings. Tools may synthesize tag-browsing views at consumption time by scanning frontmatter; OKF does not define a separate tag file format.
</ParamField>

<ParamField body="timestamp" type="string">
ISO 8601 datetime of the last meaningful change. The enrichment agent auto-fills UTC time when omitted.
</ParamField>

Producers may add arbitrary extension keys. Consumers should preserve unknown keys on round-trip and must not reject documents because of unrecognized fields.

**Enrichment agent key order.** When writing through `write_concept_doc`, frontmatter is reordered to: `type`, `resource`, `title`, `description`, `tags`, `timestamp`, then any extensions.
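The reordering can be pictured with a small dictionary helper. This is a sketch under assumptions: `order_frontmatter` is a hypothetical name, and keeping extension keys in their original relative order is a choice made here, since the text only says they come last.

```python
# Key order quoted from the paragraph above.
CANONICAL_ORDER = ("type", "resource", "title", "description", "tags", "timestamp")


def order_frontmatter(frontmatter: dict) -> dict:
    """Return a copy with known keys first, then extension keys."""
    ordered = {key: frontmatter[key] for key in CANONICAL_ORDER if key in frontmatter}
    # Assumption: extension keys keep the order in which they were supplied.
    for key, value in frontmatter.items():
        if key not in ordered:
            ordered[key] = value
    return ordered
```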

### Body conventions

The body has no required sections. These headings carry conventional meaning:

| Heading | Purpose |
|---------|---------|
| `# Schema` | Structured description of columns, fields, or enumerations |
| `# Examples` | Concrete usage examples, often fenced code blocks |
| `# Common query patterns` | SQL or API usage patterns (enrichment agent convention) |
| `# Citations` | External sources backing claims in the body |

The enrichment agent expects, in order: short prose, `# Schema`, `# Common query patterns` (for tables), and `# Citations`. During the web enrichment pass, writes that shrink an existing BigQuery Table's `# Schema` field set or `# Citations` entry count are rejected to preserve metadata-grounded content.

<RequestExample>
````markdown
---
type: BigQuery Table
title: Users
description: One row per registered Stack Overflow user.
resource: https://bigquery.googleapis.com/v2/projects/bigquery-public-data/datasets/stackoverflow/tables/users
tags: [Stack Overflow, users, profiles]
timestamp: 2026-05-28T23:32:24+00:00
---

This table stores user profiles for the [stackoverflow](../datasets/stackoverflow.md) dataset.

# Schema

* `id` (INTEGER) - Unique identifier for the user.
* `display_name` (STRING) - Publicly visible name.

# Common query patterns

```sql
SELECT id, display_name, reputation
FROM `bigquery-public-data.stackoverflow.users`
ORDER BY reputation DESC
LIMIT 10
```

# Citations

[1] [Stack Overflow Users Table](https://bigquery.googleapis.com/v2/projects/bigquery-public-data/datasets/stackoverflow/tables/users)
````
</RequestExample>
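The shrink check mentioned before the example can be approximated as follows. This is a guess at the shape of such a guard, not the agent's actual logic: all function names are hypothetical, fields are identified by the backticked name at the start of each `# Schema` bullet, and citations are counted as lines starting with `[n]`.

```python
import re


def section_lines(body: str, heading: str) -> list[str]:
    """Return non-blank lines under a top-level heading, ignoring fenced code."""
    lines, in_section, in_fence = [], False, False
    for line in body.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        if line.startswith("# "):
            in_section = line[2:].strip() == heading
            continue
        if in_section and line.strip():
            lines.append(line.strip())
    return lines


def schema_fields(body: str) -> set[str]:
    """Collect backticked field names from `# Schema` bullets."""
    fields = set()
    for line in section_lines(body, "Schema"):
        match = re.match(r"[*-]\s+`([^`]+)`", line)
        if match:
            fields.add(match.group(1))
    return fields


def citation_count(body: str) -> int:
    """Count `[n] ...` entries under `# Citations`."""
    return sum(1 for line in section_lines(body, "Citations") if re.match(r"\[\d+\]", line))


def shrinks_grounded_content(old_body: str, new_body: str) -> bool:
    """True when the new body drops schema fields or has fewer citations."""
    lost_fields = schema_fields(old_body) - schema_fields(new_body)
    return bool(lost_fields) or citation_count(new_body) < citation_count(old_body)
```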

## `index.md` and progressive disclosure

An `index.md` may appear in any directory, including the bundle root. Index files contain **no frontmatter** (except optionally at bundle root for version declaration — see Versioning). The body lists directory contents under section headings so humans and agents can browse one level at a time instead of loading the entire corpus.

```markdown
# BigQuery Table

* [Users](users.md) - One row per registered Stack Overflow user.
* [Votes](votes.md) - Records of upvotes and downvotes on posts.

# Subdirectories

* [references](references/index.md) - Enumerated types and internal references.
```

Entries should include each linked concept's `description` from frontmatter. The enrichment agent's `regenerate_indexes` groups concepts by `type`, sorts entries alphabetically by title, and synthesizes subdirectory blurbs when a folder has multiple children. Single-child directories reuse the child's description.

`index.md` files are navigation aids, not concepts. Graph viewers and concept walkers skip them.
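The grouping and sorting behavior can be sketched as below. This is not the repository's `regenerate_indexes`: `render_index` is a hypothetical helper, the input is assumed to be a list of dictionaries with `type`, `title`, `description`, and `filename` keys, and ordering the type sections alphabetically is an assumption. Subdirectory blurbs are left out.

```python
from collections import defaultdict


def render_index(concepts: list[dict]) -> str:
    """Render index.md text: one section per type, entries sorted by title."""
    by_type = defaultdict(list)
    for concept in concepts:
        by_type[concept["type"]].append(concept)

    sections = []
    for type_name in sorted(by_type):  # assumption: sections sorted by type name
        entries = sorted(by_type[type_name], key=lambda c: c["title"].lower())
        lines = [f"# {type_name}", ""]
        lines += [
            f"* [{c['title']}]({c['filename']}) - {c['description']}"
            for c in entries
        ]
        sections.append("\n".join(lines))
    return "\n\n".join(sections) + "\n"
```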

<Steps>
<Step title="Open the bundle root index">
Read `index.md` at the bundle root to see top-level subdirectories and any root-level concepts.
</Step>
<Step title="Drill into a section">
Follow a subdirectory link such as `tables/index.md` to see concepts grouped by type.
</Step>
<Step title="Open a concept document">
Follow a concept link to load frontmatter metadata and the full body.
</Step>
</Steps>
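An agent following these steps needs to read the entries of each `index.md` it opens. The sketch below parses the entry format shown earlier in this section; `parse_index` is a hypothetical helper, and it assumes entries are single-line `* [Title](target) - description` bullets under `#` headings.

```python
import re

# Matches "* [Title](target) - description"; the description is optional.
ENTRY_RE = re.compile(r"^\*\s+\[([^\]]+)\]\(([^)]+)\)(?:\s+-\s+(.*))?$")


def parse_index(text: str) -> dict[str, list[tuple[str, str, str]]]:
    """Return {section heading: [(title, target, description), ...]}."""
    sections: dict[str, list[tuple[str, str, str]]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("# "):
            current = line[2:].strip()
            sections[current] = []
            continue
        match = ENTRY_RE.match(line.strip())
        if current is not None and match:
            sections[current].append(
                (match.group(1), match.group(2), match.group(3) or "")
            )
    return sections
```

Targets ending in `index.md` lead one level deeper; other targets are concept documents.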

## Cross-linking

Concepts express relationships beyond parent/child directory structure with standard Markdown links. The relationship kind (joins-with, depends-on, parent-of, and so on) is conveyed by surrounding prose, not by link syntax. Graph consumers typically treat links as directed, untyped edges.

### Link forms

| Form | Example | Notes |
|------|---------|-------|
| Bundle-relative absolute | `[customers](/tables/customers.md)` | SPEC-recommended; stable when moving documents within a subdirectory |
| File-relative | `[users](users.md)` from `tables/events.md` | Resolves correctly when browsing plain files (GitHub, local filesystem) |
| Parent traversal | `[dataset](../datasets/stackoverflow.md)` | Typical pattern from a table to its dataset |

OKF consumers **must tolerate broken links**. A missing target is not malformed; it may represent knowledge not yet authored.
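The three link forms resolve to concept IDs in much the same way. The following is a minimal sketch with a hypothetical `resolve_link` helper; it assumes POSIX-style paths, ignores `#fragment` suffixes, and returns `None` for anything that is not a `.md` link inside the bundle.

```python
import posixpath
from typing import Optional


def resolve_link(source_id: str, href: str) -> Optional[str]:
    """Resolve a Markdown link found in `source_id` to a target concept ID."""
    if "://" in href or href.startswith(("mailto:", "#")):
        return None  # external URL or in-page anchor
    path = href.split("#", 1)[0]
    if not path.endswith(".md"):
        return None
    if path.startswith("/"):
        # Bundle-relative absolute: resolve from the bundle root.
        resolved = posixpath.normpath(path.lstrip("/"))
    else:
        # File-relative or parent traversal: resolve from the source's directory.
        source_dir = posixpath.dirname(source_id)
        resolved = posixpath.normpath(posixpath.join(source_dir, path))
    if resolved == ".." or resolved.startswith("../"):
        return None  # escapes the bundle root
    return resolved[: -len(".md")]
```

A resolved ID that is absent from the bundle is simply a broken link; a tolerant consumer records or skips it rather than raising an error.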

### Producer and consumer guidance

The OKF specification recommends bundle-relative absolute paths starting with `/`. The enrichment agent instructs producers to use **file-relative paths only** and avoid leading `/` so links render correctly on GitHub. The bundled graph viewer extracts edges only from relative `.md` links resolved within the bundle; absolute `/…` links and external URLs are skipped for edge construction but still work as navigation in rendered Markdown.
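The viewer's edge rule can be approximated by filtering link targets. This sketch is an assumption about behavior, not the viewer's code: `relative_md_links` is a hypothetical name, and skipping links inside fenced code blocks is a choice made here that the text does not confirm.

```python
import re

# Inline Markdown links: captures the target of [label](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def relative_md_links(body: str) -> list[str]:
    """Return relative `.md` link targets that would qualify as graph edges."""
    targets, in_fence = [], False
    for line in body.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # assumption: links inside code fences are not edges
        for href in LINK_RE.findall(line):
            target = href.split("#", 1)[0]
            if "://" in target or target.startswith("/"):
                continue  # external URLs and absolute links are skipped
            if target.endswith(".md"):
                targets.append(target)
    return targets
```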

Rules enforced by the enrichment agent when writing:

- Link only to concept IDs returned by `list_concepts()`.
- Do not link from headers, fenced code blocks, or schema field listings.
- Do not self-link.
- One link per concept mention per section is sufficient.

## Citations

Sources that back external claims should be listed under `# Citations` at the bottom of the document as numbered entries:

```markdown
# Citations

[1] [BigQuery table schema](https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders)
[2] [Internal data quality runbook](https://wiki.acme.internal/data/quality)
```

Citation targets may be absolute URLs, bundle-relative paths, or paths into a `references/` subtree that mirrors external material as first-class OKF concepts.
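Entries in this shape are straightforward to read back. The sketch below uses a hypothetical `parse_citations` helper and assumes one citation per line in the `[n] [label](target)` form shown above.

```python
import re

# Matches "[1] [Label](target)" on a single line.
CITATION_RE = re.compile(r"^\[(\d+)\]\s+\[([^\]]+)\]\(([^)]+)\)\s*$")


def parse_citations(section: str) -> list[tuple[int, str, str]]:
    """Return (number, label, target) for each citation line."""
    citations = []
    for line in section.splitlines():
        match = CITATION_RE.match(line.strip())
        if match:
            citations.append((int(match.group(1)), match.group(2), match.group(3)))
    return citations
```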

## `log.md` (optional)

A `log.md` at any hierarchy level records changes for that scope. Entries are bullet items grouped under date headings, newest first:

```markdown
# Directory Update Log

## 2026-05-22
* **Update**: Added [Customer Metrics](/tables/customer-metrics.md).
* **Creation**: Established [Dataplex Playbook](/playbooks/dataplex.md).

## 2026-05-15
* **Initialization**: Created foundational directory structure.
```

Date headings use ISO 8601 `YYYY-MM-DD`. Leading bold verbs (`**Update**`, `**Creation**`, `**Deprecation**`) are conventions, not requirements.
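A reader of `log.md` can group entries by date with a short parser. This is a sketch: `parse_log` and `is_newest_first` are hypothetical helpers, and entries are assumed to be single-line `* ` bullets under `## YYYY-MM-DD` headings.

```python
import re
from datetime import date

DATE_HEADING = re.compile(r"^##\s+(\d{4}-\d{2}-\d{2})\s*$")


def parse_log(text: str) -> dict[date, list[str]]:
    """Return {date: [entry text, ...]} in the order dates appear in the file."""
    entries: dict[date, list[str]] = {}
    current = None
    for line in text.splitlines():
        match = DATE_HEADING.match(line)
        if match:
            current = date.fromisoformat(match.group(1))
            entries.setdefault(current, [])
        elif current is not None and line.startswith("* "):
            entries[current].append(line[2:].strip())
    return entries


def is_newest_first(entries: dict[date, list[str]]) -> bool:
    """Check that date headings appear in descending order."""
    dates = list(entries)
    return dates == sorted(dates, reverse=True)
```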

## Conformance

A bundle is **conformant with OKF v0.1** when:

1. Every non-reserved `.md` file has parseable YAML frontmatter.
2. Every frontmatter block contains a non-empty `type` field.
3. Every present `index.md` or `log.md` follows the structures described above.

Consumers should treat all other constraints as soft guidance. Consumers **must not** reject a bundle because of:

- Missing optional frontmatter fields
- Unknown `type` values or extension keys
- Broken cross-links
- Missing `index.md` files

This permissive model keeps bundles useful as they grow, refactor, and are partially generated by agents.
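The first two conformance conditions lend themselves to a simple walker. The sketch below is an approximation with hypothetical helper names: it checks for a delimited frontmatter block and a non-empty `type` line without fully parsing YAML, and it does not check the structure of `index.md` or `log.md`.

```python
from pathlib import Path
from typing import Optional

RESERVED = {"index.md", "log.md"}


def frontmatter_block(text: str) -> Optional[str]:
    """Return the text between the leading `---` delimiters, or None."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i])
    return None


def declared_type(block: str) -> str:
    """Return the value of a top-level `type:` line, or '' if absent or empty."""
    for line in block.splitlines():
        if line.startswith("type:"):
            return line[len("type:"):].strip().strip("'\"")
    return ""


def conformance_errors(bundle_root: Path) -> list[str]:
    """List concept files that lack frontmatter or a non-empty `type`."""
    errors = []
    for path in sorted(bundle_root.rglob("*.md")):
        if path.name in RESERVED:
            continue  # index.md and log.md are not concept documents
        rel = path.relative_to(bundle_root).as_posix()
        block = frontmatter_block(path.read_text(encoding="utf-8"))
        if block is None:
            errors.append(f"{rel}: missing frontmatter block")
        elif not declared_type(block):
            errors.append(f"{rel}: missing or empty 'type'")
    return errors
```

Broken links, unknown types, and missing indexes are deliberately not reported, in line with the list above.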

## Versioning

This repository ships OKF **version 0.1**. Future revisions use `<major>.<minor>` semantics: minor bumps add backward-compatible optional fields; major bumps may change required fields or reserved filenames in incompatible ways.

Bundles may declare their target version with `okf_version: "0.1"` in **bundle-root `index.md` frontmatter** — the only place frontmatter is permitted on an `index.md`. Consumers that do not understand the declared version should attempt best-effort consumption.

## Example bundle layout

The repository includes three reference bundles under `okf/bundles/`:

| Bundle | Domain |
|--------|--------|
| `ga4/` | GA4 e-commerce sample dataset |
| `stackoverflow/` | Stack Overflow public dataset |
| `crypto_bitcoin/` | Bitcoin blocks and transactions |

A typical Stack Overflow bundle organizes `datasets/`, `tables/`, and `references/` subtrees, each with its own `index.md`, and cross-links such as a table pointing to its parent dataset with `../datasets/stackoverflow.md`.

## Produce, visualize, and publish

OKF bundles in this project are commonly produced by the OKF enrichment agent (BigQuery metadata plus optional web crawl) or the catalog enrichment agent, then optionally visualized or published into a Knowledge Catalog workspace.

<AccordionGroup>
<Accordion title="Enrichment agent write contract vs OKF minimum">
OKF conformance requires only `type`. The enrichment agent's `write_concept_doc` requires `type`, `title`, `description`, and `timestamp` (auto-filled when absent). This stricter contract keeps auto-generated `index.md` entries informative and bundles consistent for downstream catalog sync.
</Accordion>
<Accordion title="Graph consumption behavior">
The `visualize` subcommand walks all concept `.md` files, builds nodes from frontmatter, and draws directed edges from relative cross-links. Missing link targets are skipped without error. Backlinks ("Cited by") are computed from the reverse link graph in the generated `viz.html` viewer.
</Accordion>
</AccordionGroup>
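The backlink computation described under graph consumption amounts to inverting the edge map. The sketch below is illustrative only: `backlinks` is a hypothetical function, and the graph is assumed to be a dictionary from source concept ID to a set of target concept IDs.

```python
from collections import defaultdict


def backlinks(edges: dict[str, set[str]], known_ids: set[str]) -> dict[str, set[str]]:
    """Invert directed edges to get, for each concept, the concepts linking to it."""
    cited_by: dict[str, set[str]] = defaultdict(set)
    for source, targets in edges.items():
        for target in targets:
            if target in known_ids:  # missing targets are skipped without error
                cited_by[target].add(source)
    return dict(cited_by)
```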

## Related pages

<CardGroup cols={2}>
<Card title="Overview" icon="book-open" href="/overview">
Knowledge Catalog tooling surface and shortest paths to produce and publish metadata context.
</Card>
<Card title="Produce OKF bundles" icon="package" href="/produce-okf-bundles">
Run the OKF enrichment agent against BigQuery with optional web crawl seeds.
</Card>
<Card title="Visualize OKF bundles" icon="network" href="/visualize-okf-bundles">
Generate self-contained `viz.html` graph viewers from bundle cross-links.
</Card>
<Card title="OKF bundle recipes" icon="flask" href="/okf-bundle-recipes">
Copy-paste recipes for GA4, Stack Overflow, and Bitcoin sample bundles.
</Card>
<Card title="Enrichment workflows" icon="workflow" href="/enrichment-workflows">
How agents read source metadata, emit OKF bundles, and hand off to catalog publication.
</Card>
<Card title="Metadata as Code" icon="code" href="/metadata-as-code">
kcmd workspace model for syncing enriched metadata into Knowledge Catalog.
</Card>
</CardGroup>
