# Metadata as Code

> kcmd workspace model: catalog.yaml manifest, YAML and Markdown layouts, pull/push sync, reference layers, entry links, and glossary scope for Knowledge Catalog metadata.

- Repository: GoogleCloudPlatform/knowledge-catalog
- GitHub: https://github.com/GoogleCloudPlatform/knowledge-catalog
- Human docs: https://www.grok-wiki.com/public/docs/googlecloudplatform-knowledge-catalog-9cee6ee3cba5
- Complete Markdown: https://www.grok-wiki.com/public/docs/googlecloudplatform-knowledge-catalog-9cee6ee3cba5/llms-full.txt

## Source Files

- `agents/mdcode/README.md`
- `agents/mdcode/docs/concept.md`
- `agents/mdcode/src/libts/manifest.ts`
- `agents/mdcode/src/libts/snapshot.ts`
- `agents/mdcode/src/libts/sync.ts`
- `toolbox/mdcode/docs/concept.md`

---


Metadata as Code is the `kcmd` workspace model in `agents/mdcode`: a directory rooted at `catalog.yaml` that mirrors Knowledge Catalog entries, aspects, and entry links as versionable YAML and Markdown files, with `kcmd pull`, `kcmd reference`, and `kcmd push` synchronizing editable snapshots against Dataplex.

## Workspace model

A kcmd workspace is a filesystem directory that acts as the unit of synchronization with Knowledge Catalog. The manifest (`catalog.yaml`) declares which GCP resources to manage, which metadata types to snapshot locally, which subsets to publish, and optional read-only reference scopes. Editable artifacts live under `catalog/`; the layout engine (`CatalogSnapshot` + `CatalogLayout`) maps service resources to local file paths based on scope type.

```mermaid
flowchart TB
  subgraph workspace["kcmd workspace"]
    manifest["catalog.yaml"]
    catalog["catalog/"]
    ref["*.ref.yaml siblings"]
  end

  subgraph kcmd["kcmd CLI / MCP"]
    pull["pull"]
    reference["reference"]
    push["push"]
  end

  subgraph service["Knowledge Catalog / Dataplex"]
    entries["Entries + Aspects"]
    links["EntryLinks"]
    glossary["Glossary hierarchy"]
  end

  manifest --> pull
  manifest --> reference
  manifest --> push
  pull --> catalog
  reference --> ref
  catalog --> push
  entries --> pull
  links --> pull
  glossary --> pull
  entries --> reference
  links --> reference
  push --> entries
  push --> links
  push --> glossary
```

<Info>
Authentication uses gcloud Application Default Credentials (`gcloud auth application-default login`). The CLI and MCP server share the same workspace binding.
</Info>

## Scope types and layouts

`kcmd init` requires exactly one primary source type. The init flag writes `scope` into `catalog.yaml` and selects the on-disk layout automatically.

| Source type | Init flag | Scope prefix | Layout | Target resource |
| --- | --- | --- | --- | --- |
| BigQuery | `--bigquery-dataset` | `bq-dataset` | Standard (YAML) | Tables, views, schemas in `@bigquery` |
| Knowledge base | `--kb` | `kb` | Documents (Markdown) | Wiki/doc entries in an Entry Group |
| Entry group | `--entry-group` | `entryGroup` | Standard (YAML) | Custom user-managed entries |
| BigLake (Iceberg) | `--biglake-namespace --iceberg` | `biglake-iceberg-namespace` | Standard (YAML) | Iceberg table metadata |
| Glossary | `--glossary` | `glossary` | Standard (YAML) | Business glossary terms and categories |

BigQuery mode accepts multiple datasets by repeating `--bigquery-dataset` or by declaring an array in `scope`. Glossary mode supports comma-separated IDs, display-name lookup, or location mode (`--glossary my-project.us-central1`) to manage all glossaries in a location.
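
The prefix-to-layout rule can be shown with a minimal TypeScript sketch. It splits a `scope` value into its type prefix and resource ID, then picks a layout from the table above. The names `parseScope`, `parseScopes`, and `Layout` are hypothetical and are not the `CatalogManifest` API. The sketch also assumes that type prefixes never contain dots.

```typescript
// Hypothetical sketch: derive prefix, resource ID, and layout from a scope value.
type Layout = "standard" | "documents";

interface ParsedScope {
  prefix: string;     // e.g. "bq-dataset"
  resourceId: string; // e.g. "my-project.my-dataset"
  layout: Layout;
}

const STANDARD_PREFIXES = ["bq-dataset", "entryGroup", "glossary"];

function parseScope(scope: string): ParsedScope {
  const dot = scope.indexOf(".");
  if (dot <= 0 || dot === scope.length - 1) {
    throw new Error(`scope must look like <type>.<resource-id>: ${scope}`);
  }
  const prefix = scope.slice(0, dot);
  const resourceId = scope.slice(dot + 1);
  // Knowledge-base scopes are the only ones that use the Markdown-first layout.
  if (prefix === "kb") return { prefix, resourceId, layout: "documents" };
  if (STANDARD_PREFIXES.indexOf(prefix) >= 0 || prefix.startsWith("biglake-")) {
    return { prefix, resourceId, layout: "standard" };
  }
  throw new Error(`unknown scope type: ${prefix}`);
}

// Arrays are only meaningful for multi-dataset BigQuery workspaces.
function parseScopes(scope: string | string[]): ParsedScope[] {
  const list = Array.isArray(scope) ? scope : [scope];
  const parsed = list.map(parseScope);
  if (parsed.length > 1 && parsed.some((p) => p.prefix !== "bq-dataset")) {
    throw new Error("only bq-dataset scopes can be combined in an array");
  }
  return parsed;
}
```

For example, `parseScope("kb.my-project.us-central1.my-kb")` selects the documents layout and returns the resource ID `my-project.us-central1.my-kb`.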

### Standard layout (YAML + sidecars)

Used for `bq-dataset`, `entryGroup`, `biglake-*`, and `glossary` scopes. Each entry is a `<entry-id>.yaml` file. Unstructured aspects (for example `overview`) are split out into sidecar Markdown files named `<entry-id>.<aspect-alias>.md`, and reference baselines are stored as sibling `*.ref.yaml` files.

:::files
/
├── catalog.yaml
└── catalog/
    └── bigquery/
        └── my-project/
            ├── my-dataset.yaml
            └── my-dataset/
                ├── orders.yaml
                ├── orders.ref.yaml
                └── orders.overview.md
:::
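
These naming rules fit in a small path helper. The TypeScript sketch below assumes the BigQuery directory scheme from the tree (`catalog/bigquery/<project>/<dataset>/<table>`). The helper names are hypothetical, and other scope types use their own hierarchies.

```typescript
// Hypothetical sketch of standard-layout file names for a BigQuery table entry.
interface EntryFiles {
  entry: string;                            // editable structured metadata
  reference: string;                        // read-only baseline from `kcmd reference`
  sidecar: (aspectAlias: string) => string; // unstructured aspect as Markdown
}

function bigQueryTableFiles(project: string, dataset: string, table: string): EntryFiles {
  const base = ["catalog", "bigquery", project, dataset, table].join("/");
  return {
    entry: `${base}.yaml`,
    reference: `${base}.ref.yaml`,
    sidecar: (aspectAlias) => `${base}.${aspectAlias}.md`,
  };
}

// The dataset entry sits beside the directory that holds its tables.
function bigQueryDatasetFile(project: string, dataset: string): string {
  return ["catalog", "bigquery", project, `${dataset}.yaml`].join("/");
}
```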

### Documents layout (Markdown-first)

Used for `kb` scopes. Each entry is a single `.md` file: structured metadata in YAML frontmatter, with `overview.content` promoted to the Markdown body.

:::files
/
├── catalog.yaml
└── catalog/
    └── my-namespace/
        └── my-project/
            └── my-location/
                ├── page1.md
                └── playbooks/mbr.md
:::
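
A documents-layout file is frontmatter plus body, so reading one mostly means separating those two parts. The sketch below splits on the `---` delimiters and leaves YAML parsing to a real YAML library. The helper names are hypothetical, and the sketch assumes the frontmatter starts on the first line.

```typescript
// Hypothetical sketch: separate YAML frontmatter from the Markdown body.
interface DocumentFile {
  frontmatter: string; // raw YAML between the opening and closing --- lines
  body: string;        // Markdown promoted from overview.content
}

function splitDocument(text: string): DocumentFile {
  const lines = text.split(/\r?\n/);
  if (lines[0] !== "---") return { frontmatter: "", body: text }; // no frontmatter
  const close = lines.indexOf("---", 1);
  if (close < 0) throw new Error("unterminated frontmatter");
  return {
    frontmatter: lines.slice(1, close).join("\n"),
    // Drop the blank line(s) that conventionally follow the closing delimiter.
    body: lines.slice(close + 1).join("\n").replace(/^\n+/, ""),
  };
}

function joinDocument(doc: DocumentFile): string {
  return `---\n${doc.frontmatter}\n---\n\n${doc.body}`;
}
```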

## catalog.yaml manifest

The manifest drives all sync behavior. `CatalogManifest.load` validates scope, snapshot, publishing, and optional reference blocks.

<ParamField body="scope" type="string | string[]" required>
Primary resource(s) to manage. Format: `<type>.<resource-id>`. Examples: `bq-dataset.my-project.my-dataset`, `kb.my-project.us-central1.my-kb`, `glossary.my-project.global.my-glossary`. Multi-dataset scopes use a YAML array of `bq-dataset.*` entries.
</ParamField>

<ParamField body="resourceAlias" type="object">
Optional alias map for aspect types, glossaries, and entry link types. Built-in Dataplex types already have predefined aliases (`bigquery-table`, `schema`, `overview`, `definition`, `synonym`, `related`, `schema-join`).
</ParamField>

<ParamField body="snapshot" type="object">
Entry, aspect, and entry link types to download locally. Required aspects of listed entry types are implicitly included. `entryLinks` triggers `lookupEntryLinks` on pull.
</ParamField>

<ParamField body="publishing" type="object">
Subset of snapshot types that `kcmd push` writes back. Must be a subset of `snapshot`; publishing types not in snapshot cause validation errors.
</ParamField>

<ParamField body="reference" type="object">
Read-only scope for grounding. `reference.scope` can differ from the primary scope (for example, pull schemas from a dataset while publishing enrichments to another). `reference.snapshot` mirrors `snapshot` structure.
</ParamField>
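
The subset rule between `publishing` and `snapshot` can be checked mechanically. The TypeScript sketch below is illustrative only. `validatePublishing` and the `requiredAspects` lookup, which maps each entry type to its required aspect types, are assumptions standing in for whatever `CatalogManifest.load` does internally.

```typescript
// Hypothetical sketch of the publishing-subset-of-snapshot check.
interface TypeLists {
  entries?: string[];
  aspects?: string[];
  entryLinks?: string[];
}

// Snapshot aspects plus the required aspects of every listed entry type.
function effectiveSnapshotAspects(
  snapshot: TypeLists,
  requiredAspects: Record<string, string[]>,
): Set<string> {
  const result = new Set<string>(snapshot.aspects ?? []);
  for (const entryType of snapshot.entries ?? []) {
    for (const aspect of requiredAspects[entryType] ?? []) result.add(aspect);
  }
  return result;
}

function validatePublishing(
  snapshot: TypeLists,
  publishing: TypeLists,
  requiredAspects: Record<string, string[]> = {},
): string[] {
  const allowed: Record<keyof TypeLists, Set<string>> = {
    entries: new Set<string>(snapshot.entries ?? []),
    aspects: effectiveSnapshotAspects(snapshot, requiredAspects),
    entryLinks: new Set<string>(snapshot.entryLinks ?? []),
  };
  const errors: string[] = [];
  (Object.keys(allowed) as (keyof TypeLists)[]).forEach((field) => {
    for (const type of publishing[field] ?? []) {
      if (!allowed[field].has(type)) {
        errors.push(`publishing.${field}: ${type} is not in snapshot`);
      }
    }
  });
  return errors;
}
```

The function returns every violation at once instead of stopping at the first, so a single validation run reports the full list of offending types.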

<RequestExample>

```yaml title="catalog.yaml — BigQuery enrichment workspace"
scope: bq-dataset.my-project.my-dataset

snapshot:
  entries:
    - dataplex-types.global.bigquery-table
  aspects:
    - dataplex-types.global.schema
    - dataplex-types.global.overview
  entryLinks:
    - definition
    - synonym

publishing:
  aspects:
    - dataplex-types.global.overview
  entryLinks:
    - definition

reference:
  scope: bq-dataset.my-project.my-dataset
  snapshot:
    entries:
      - dataplex-types.global.bigquery-table
    aspects:
      - dataplex-types.global.schema
    entryLinks:
      - definition
```

</RequestExample>

## Pull, reference, and push

### Pull editable metadata

`kcmd pull` lists entries from the scoped source, calls `lookupEntry` for each matching entry type, and writes files under `catalog/`. When `snapshot.entryLinks` is declared, pull also calls `lookupEntryLinks` and inlines results into entry YAML.
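
In outline, the pull flow is a list, filter, and lookup loop. The sketch below is written against a hypothetical `CatalogClient` interface whose method names echo the Dataplex calls mentioned above. It is not the Dataplex client library. The `write` callback stands in for the layout engine that serializes files under `catalog/`.

```typescript
// Hypothetical sketch of the pull loop; CatalogClient is an assumed interface.
interface EntrySummary { name: string; entryType: string }
interface EntryDetail { name: string; aspects: Record<string, unknown> }
interface EntryLink { linkType: string; target: string; sourcePath?: string }

interface CatalogClient {
  listEntries(scope: string): Promise<EntrySummary[]>;
  lookupEntry(name: string, aspectTypes: string[]): Promise<EntryDetail>;
  lookupEntryLinks(name: string, linkTypes: string[]): Promise<EntryLink[]>;
}

interface SnapshotConfig {
  entries: string[];
  aspects: string[];
  entryLinks?: string[];
}

type PulledEntry = EntryDetail & { links: EntryLink[] };

async function pull(
  client: CatalogClient,
  scope: string,
  snapshot: SnapshotConfig,
  write: (entry: PulledEntry) => void, // e.g. serialize under catalog/
): Promise<number> {
  const wantedTypes = new Set<string>(snapshot.entries);
  let written = 0;
  for (const summary of await client.listEntries(scope)) {
    if (!wantedTypes.has(summary.entryType)) continue; // entry type not in snapshot
    const detail = await client.lookupEntry(summary.name, snapshot.aspects);
    // Links are fetched only when snapshot.entryLinks is declared.
    const links = snapshot.entryLinks
      ? await client.lookupEntryLinks(summary.name, snapshot.entryLinks)
      : [];
    write({ ...detail, links });
    written++;
  }
  return written;
}
```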

<Steps>
<Step title="Initialize the workspace">

```bash
kcmd init --bigquery-dataset my-project.my-dataset
```

This writes `catalog.yaml` with the correct `scope` prefix and layout selection.

</Step>
<Step title="Configure snapshot and publishing">

Edit `catalog.yaml` to declare which entry types, aspects, and entry links to manage locally and which to publish.

</Step>
<Step title="Pull metadata">

```bash
kcmd pull
```

Verify that `.yaml` or `.md` files appear under `catalog/`, organized to match your scope hierarchy.

</Step>
</Steps>

### Pull reference layers

`kcmd reference` downloads the read-only metadata defined in the `reference:` block and saves it as `*.ref.yaml` siblings of the editable entries. Reference files are indexed separately and marked non-modifiable, so `push` skips them through its `isModifiable` check.
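
One way to model this separation is to index files with a modifiable flag and let push consider only flagged entries. The sketch below keys on the `.ref.yaml` suffix, which is an assumption. The `isModifiable` shown here is a stand-in, and kcmd's own check may rely on its internal index instead.

```typescript
// Hypothetical sketch: index workspace files and flag reference baselines.
interface IndexedFile {
  path: string;
  modifiable: boolean;
}

function indexWorkspace(paths: string[]): IndexedFile[] {
  return paths
    .filter((p) => p.startsWith("catalog/")) // the manifest itself is not an entry
    .map((p) => ({ path: p, modifiable: !p.endsWith(".ref.yaml") }));
}

function isModifiable(file: IndexedFile): boolean {
  return file.modifiable;
}

// Push only ever sees editable entries and their sidecars.
function pushCandidates(paths: string[]): string[] {
  return indexWorkspace(paths).filter(isModifiable).map((f) => f.path);
}
```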

<Warning>
Reference layers are never pushed. Use them as authoritative baselines for enrichment agents; diff live `.yaml` against `.ref.yaml` to surface only your changes.
</Warning>

When `reference.snapshot.entryLinks` is set, reference pull includes pre-edit link state so diffs do not treat existing links as enrichment additions.

### Push local edits

`kcmd push` iterates modifiable entries, converts local metadata to Dataplex API representations, and creates or updates entries and entry links.

| Behavior | Detail |
| --- | --- |
| Auto-create entries | Missing entries and parent Entry Groups are created during push (non-ingested scopes) |
| Aspect filtering | Only aspects listed in `publishing.aspects` are sent; required ingested aspects are skipped |
| Entry link reconciliation | When `publishing.entryLinks` is set, push compares local vs remote links by normalized target + path; matches are kept, new links created, unmatched remote links deleted |
| Glossary tree | `kcmd push` never creates Glossary, GlossaryCategory, or GlossaryTerm resources; it fails fast if they are missing |
| Glossary metadata updates | Descriptions and labels on existing glossary resources can be updated |
| Flags | `--force` overwrites conflicts; `--validate-only` validates without pushing; `--dry-run` logs planned mutations |
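
Entry link reconciliation amounts to a keyed set comparison. The TypeScript sketch below matches links on link type, normalized target, and source path. Including the link type in the key is an assumption. The function keeps matched links, plans creates for local-only links, and plans deletes for remote-only links. `reconcileLinks` and the `Link` shape are hypothetical, and the caller supplies `normalizeTarget`.

```typescript
// Hypothetical sketch of keep/create/delete planning for entry links.
interface Link {
  linkType: string;    // e.g. "definition"
  target: string;      // target entry or glossary term
  sourcePath?: string; // e.g. "Schema.customer_id" for column-level links
  name?: string;       // remote resource name, set only on links read from the service
}

interface LinkPlan {
  keep: Link[];   // present locally and remotely
  create: Link[]; // local only
  remove: Link[]; // remote only
}

function reconcileLinks(
  local: Link[],
  remote: Link[],
  normalizeTarget: (target: string) => string,
): LinkPlan {
  const key = (l: Link) =>
    [l.linkType, normalizeTarget(l.target), l.sourcePath ?? ""].join("|");
  const remoteByKey = new Map<string, Link>();
  remote.forEach((r) => remoteByKey.set(key(r), r));

  const plan: LinkPlan = { keep: [], create: [], remove: [] };
  const seen = new Set<string>();
  for (const l of local) {
    const k = key(l);
    if (seen.has(k)) continue; // ignore duplicate local links
    seen.add(k);
    const match = remoteByKey.get(k);
    if (match) plan.keep.push(match);
    else plan.create.push(l);
  }
  remoteByKey.forEach((r, k) => {
    if (!seen.has(k)) plan.remove.push(r);
  });
  return plan;
}
```

Because the function returns a plan instead of calling the service, the same plan can be logged for a dry run or executed for a real push.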

<Check>
EntryLinks that reference glossary terms (for example `definition` links from a BQ column to a term) are catalog metadata and are created/deleted normally by push. The no-create rule applies only to the glossary hierarchy itself.
</Check>

## Entry links

Entry links are first-class artifacts in pull and push. Declare link types in `snapshot.entryLinks` to fetch them; declare a subset in `publishing.entryLinks` to reconcile them on push. Omit `publishing.entryLinks` to read links without mutating them.

**Column-level links** carry a `Schema.<field>` source path. On pull, these are inlined under `aspects.schema.fields[].links`. On push, the path is reconstructed as `Schema.${field.name}`.

**Entry-level links** without a schema path appear under the top-level `links` block.

**Target resolution** uses a human-readable form for glossary terms (`<project>.<location>.<glossary-display-name>.<term-display-name>`) while preserving the full UID resource path in `id` for round-trip push. Matching during reconciliation unwraps `@dataplex` proxy entries and normalizes project IDs to avoid spurious delete-and-recreate cycles.
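
The display form can be derived from a term's resource path plus display-name lookups. In this TypeScript sketch, `toTermRef`, the `displayNames` map, and the project-number-to-ID map are hypothetical stand-ins for service lookups. The sketch does not model the `@dataplex` proxy unwrapping described above.

```typescript
// Hypothetical sketch: human-readable target plus round-trippable id for a term.
interface TermRef {
  target: string; // <project>.<location>.<glossary-display-name>.<term-display-name>
  id: string;     // full resource path, kept for push
}

const TERM_PATH =
  /^projects\/([^/]+)\/locations\/([^/]+)\/glossaries\/([^/]+)\/terms\/([^/]+)$/;

function toTermRef(
  resourcePath: string,
  displayNames: Record<string, string>,   // resource path -> display name
  projectIds: Record<string, string> = {}, // project number -> project ID
): TermRef {
  const m = TERM_PATH.exec(resourcePath);
  if (!m) throw new Error(`not a glossary term path: ${resourcePath}`);
  const [, project, location, glossary, term] = m;
  const projectId = projectIds[project] ?? project; // normalize numbers to IDs
  const glossaryPath = `projects/${project}/locations/${location}/glossaries/${glossary}`;
  const glossaryName = displayNames[glossaryPath] ?? glossary;
  const termName = displayNames[resourcePath] ?? term;
  return { target: [projectId, location, glossaryName, termName].join("."), id: resourcePath };
}
```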

<RequestExample>

```yaml title="Column-level definition link (excerpt)"
aspects:
  schema:
    fields:
      - name: customer_id
        dataType: STRING
        mode: NULLABLE
        links:
          definition:
            - target: my-project.global.business-glossary.customer-id
              id: projects/my-project/locations/global/glossaries/biz/terms/customer-id

links:
  related:
    - target: my-other-project.us.docs-eg.runbook-page
```

</RequestExample>
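
The split between column-level and entry-level links follows from the source path alone. This TypeScript sketch groups raw links to match the excerpt's layout and rebuilds `Schema.${field.name}` paths on the way back. The types and function names are hypothetical, and the `id` field is omitted for brevity.

```typescript
// Hypothetical sketch: place links under schema fields or the top-level block.
interface RawLink {
  linkType: string;
  target: string;
  sourcePath?: string; // "Schema.<field>" for column-level links
}

type LinksByType = Record<string, { target: string }[]>;

interface GroupedLinks {
  fields: Record<string, LinksByType>; // aspects.schema.fields[].links, keyed by field name
  entry: LinksByType;                  // top-level links block
}

const SCHEMA_PREFIX = "Schema.";

function addLink(bucket: LinksByType, linkType: string, target: string): void {
  if (!bucket[linkType]) bucket[linkType] = [];
  bucket[linkType].push({ target });
}

// Pull direction: inline column-level links, keep the rest at entry level.
function groupLinks(links: RawLink[]): GroupedLinks {
  const grouped: GroupedLinks = { fields: {}, entry: {} };
  for (const link of links) {
    const path = link.sourcePath ?? "";
    if (path.startsWith(SCHEMA_PREFIX)) {
      const field = path.slice(SCHEMA_PREFIX.length);
      if (!grouped.fields[field]) grouped.fields[field] = {};
      addLink(grouped.fields[field], link.linkType, link.target);
    } else {
      addLink(grouped.entry, link.linkType, link.target);
    }
  }
  return grouped;
}

// Push direction: rebuild the source path as `Schema.${field.name}`.
function ungroupLinks(grouped: GroupedLinks): RawLink[] {
  const out: RawLink[] = [];
  const emit = (bucket: LinksByType, sourcePath?: string) =>
    Object.keys(bucket).forEach((linkType) =>
      bucket[linkType].forEach(({ target }) => out.push({ linkType, target, sourcePath })),
    );
  emit(grouped.entry);
  Object.keys(grouped.fields).forEach((name) =>
    emit(grouped.fields[name], `${SCHEMA_PREFIX}${name}`),
  );
  return out;
}
```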

Built-in entry link aliases include `definition`, `synonym`, `related`, and `schema-join`, each mapping to `dataplex-types.global.*` link types.
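
Alias resolution can be sketched as a lookup with a fallback. The TypeScript below assumes that each built-in alias expands to `dataplex-types.global.<alias>`, which is consistent with the fully qualified names used on this page. It also models `resourceAlias` as a flat name-to-type map. Both the flat shape and `resolveType` are assumptions.

```typescript
// Hypothetical sketch of alias resolution for manifest type names.
const BUILT_IN_ALIASES = [
  "bigquery-table", "schema", "overview",            // entry and aspect types
  "definition", "synonym", "related", "schema-join", // entry link types
];

function resolveType(nameOrAlias: string, custom: Record<string, string> = {}): string {
  if (nameOrAlias.indexOf(".") >= 0) return nameOrAlias; // already fully qualified
  if (Object.prototype.hasOwnProperty.call(custom, nameOrAlias)) return custom[nameOrAlias];
  if (BUILT_IN_ALIASES.indexOf(nameOrAlias) >= 0) {
    return `dataplex-types.global.${nameOrAlias}`; // assumed expansion rule
  }
  throw new Error(`unknown alias: ${nameOrAlias}`);
}
```

Checking custom aliases before built-ins lets a workspace override a predefined name. This ordering is a design choice of the sketch, not documented kcmd behavior.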

## Glossary scope

A Business Glossary can be the primary workspace scope (`glossary.<project>.<location>.<glossary-id>`). The local hierarchy mirrors the glossary tree under `catalog/glossaries/`:

```yaml title="Glossary term entry"
name: glossaries/Business Glossary (biz)/terms/customer-id
type: glossaryTerm
displayName: customer-id
description: Unique identifier for a customer record.
parent: projects/my-project/locations/global/glossaries/biz
```

Glossaries also work as `reference.scope`, so enrichment workspaces can ground on business vocabulary without owning glossary CRUD. Provision glossary resources out-of-band (Dataplex console or `gcloud dataplex glossaries create`) before the first push. After that, running `kcmd pull` followed by `kcmd push` manages metadata on the existing nodes.
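
The fail-fast rule can be sketched as a precheck that runs before any mutation. In this TypeScript sketch, `assertGlossaryNodesExist` and the injected `exists` lookup are hypothetical. A real implementation would query Dataplex for each Glossary, GlossaryCategory, or GlossaryTerm path.

```typescript
// Hypothetical sketch: refuse to push when any glossary node is missing remotely.
async function assertGlossaryNodesExist(
  localNodes: string[],                       // resource paths mirrored under catalog/glossaries/
  exists: (path: string) => Promise<boolean>, // stands in for a service lookup
): Promise<void> {
  const missing: string[] = [];
  for (const node of localNodes) {
    if (!(await exists(node))) missing.push(node);
  }
  if (missing.length > 0) {
    // Fail before any mutation; creating glossaries, categories, or terms is out of scope.
    throw new Error(
      `missing glossary resources (create them out-of-band first):\n${missing.join("\n")}`,
    );
  }
}
```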

<Tabs>
<Tab title="Single glossary">

```bash
kcmd init --glossary my-project.us-central1.my-glossary-id
```

</Tab>
<Tab title="Multiple glossaries">

```bash
kcmd init --glossary my-project.us-central1.glossary-a,glossary-b
```

</Tab>
<Tab title="Location mode">

```bash
kcmd init --glossary my-project.us-central1
```

</Tab>
</Tabs>
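
The three `--glossary` forms can be told apart by counting dot-separated parts. This TypeScript sketch assumes that project and location IDs contain no dots, so domain-scoped project IDs are not handled. It also does not model display-name lookup. `parseGlossaryFlag` is a hypothetical helper, not the kcmd argument parser.

```typescript
// Hypothetical sketch: classify a --glossary value into location or ID mode.
type GlossarySelection =
  | { mode: "location"; project: string; location: string }
  | { mode: "ids"; project: string; location: string; glossaryIds: string[] };

function parseGlossaryFlag(value: string): GlossarySelection {
  const parts = value.split(".");
  if (parts.length < 2 || parts.length > 3 || parts.some((p) => p === "")) {
    throw new Error(`expected <project>.<location>[.<id>[,<id>...]]: ${value}`);
  }
  const [project, location, ids] = parts;
  // Two parts: manage every glossary in the location.
  if (ids === undefined) return { mode: "location", project, location };
  // Three parts: one ID, or several separated by commas.
  const glossaryIds = ids.split(",").map((s) => s.trim()).filter((s) => s !== "");
  if (glossaryIds.length === 0) throw new Error("no glossary IDs given");
  return { mode: "ids", project, location, glossaryIds };
}
```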

## Agent integration

Metadata as Code artifacts are the interchange format for enrichment agents and human-in-the-loop review. Agents read and modify workspace files; `kcmd push` publishes approved changes. The built-in MCP server exposes `list-entries`, `lookup-entry`, and `modify-entry` tools bound to a workspace path, enabling agentic metadata workflows without coupling to a specific model provider.
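
For orientation, an MCP client invokes these tools with JSON-RPC 2.0 `tools/call` requests. The envelope in the sketch below follows the MCP `tools/call` method. The tool names come from this page. Every argument name passed to the kcmd tools is an assumption, and `toolCall` is a hypothetical helper.

```typescript
// Hypothetical sketch: JSON-RPC requests an MCP client could send to kcmd tools.
type KcmdTool = "list-entries" | "lookup-entry" | "modify-entry";

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: KcmdTool; arguments: Record<string, unknown> };
}

let nextId = 1;

function toolCall(name: KcmdTool, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// Read, then propose an edit; publishing stays a separate, reviewed `kcmd push`.
const session: ToolCallRequest[] = [
  toolCall("list-entries", {}),
  toolCall("lookup-entry", { entry: "orders" }),                                // hypothetical argument
  toolCall("modify-entry", { entry: "orders", overview: "Draft description" }), // hypothetical arguments
];

const wire = session.map((req) => JSON.stringify(req));
```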

<CardGroup>
<Card title="Sync catalog metadata" href="/sync-catalog-metadata">
Initialize workspaces per source type, pull snapshots, check status, and push edits back to Knowledge Catalog.
</Card>
<Card title="catalog.yaml reference" href="/catalog-manifest-reference">
Full manifest field reference: scope, snapshot, publishing, reference, aliases, and layout selection rules.
</Card>
<Card title="kcmd CLI reference" href="/kcmd-cli-reference">
Command flags for init, pull, push, reference, dry-run, force, and validate-only.
</Card>
</CardGroup>

## Related pages

<CardGroup>
<Card title="Overview" href="/overview">
Knowledge Catalog tooling surface and shortest paths to produce and publish metadata context.
</Card>
<Card title="Quickstart" href="/quickstart">
First successful runs: initialize a workspace, pull metadata, and inspect output.
</Card>
<Card title="Enrichment workflows" href="/enrichment-workflows">
How enrichment agents read source metadata, emit mdcode artifacts, and hand off to kcmd push.
</Card>
<Card title="Publish enriched metadata" href="/publish-enriched-metadata">
Push mdcode workspaces and reconcile entry links without modifying reference layers.
</Card>
<Card title="kcmd MCP reference" href="/kcmd-mcp-reference">
MCP server startup, workspace binding, and agent tools for pull, push, and modify-entry.
</Card>
</CardGroup>
