# Metadata as Code

> kcmd workspace model: catalog.yaml manifest, YAML and Markdown layouts, pull/push sync, reference layers, entry links, and glossary scope for Knowledge Catalog metadata.

- Repository: GoogleCloudPlatform/knowledge-catalog
- GitHub: https://github.com/GoogleCloudPlatform/knowledge-catalog
- Human docs: https://www.grok-wiki.com/public/docs/googlecloudplatform-knowledge-catalog-9cee6ee3cba5
- Complete Markdown: https://www.grok-wiki.com/public/docs/googlecloudplatform-knowledge-catalog-9cee6ee3cba5/llms-full.txt

## Source Files

- `agents/mdcode/README.md`
- `agents/mdcode/docs/concept.md`
- `agents/mdcode/src/libts/manifest.ts`
- `agents/mdcode/src/libts/snapshot.ts`
- `agents/mdcode/src/libts/sync.ts`
- `toolbox/mdcode/docs/concept.md`

---

Metadata as Code is the `kcmd` workspace model in `agents/mdcode`. A workspace is a directory rooted at `catalog.yaml` that mirrors Knowledge Catalog entries, aspects, and entry links as versionable YAML and Markdown files. The commands `kcmd pull`, `kcmd reference`, and `kcmd push` keep these editable snapshots in sync with Dataplex.

## Workspace model

A kcmd workspace is a filesystem directory that acts as the unit of synchronization with Knowledge Catalog. The manifest (`catalog.yaml`) declares four things: which GCP resources to manage, which metadata types to snapshot locally, which subsets to publish, and optional read-only reference scopes. Editable artifacts live under `catalog/`. The layout engine (`CatalogSnapshot` + `CatalogLayout`) maps service resources to local file paths based on scope type.
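Layout selection depends only on the scope prefix, so it can be sketched as a pure function. This is a minimal illustration that assumes the prefixes listed in the scope types table. `layoutFor` and the `Layout` type are hypothetical names and are not the `CatalogLayout` API.

```typescript
// Hypothetical sketch of scope-to-layout selection; not the CatalogLayout API.
type Layout = "standard" | "documents";

function layoutFor(scope: string | string[]): Layout {
  // Multi-dataset BigQuery workspaces declare scope as an array.
  const scopes = Array.isArray(scope) ? scope : [scope];
  const first = scopes[0];
  if (first === undefined) {
    throw new Error("scope must not be empty");
  }
  if (scopes.length > 1 && !scopes.every((s) => s.startsWith("bq-dataset."))) {
    throw new Error("only bq-dataset scopes may be combined in an array");
  }

  // The scope prefix is the text before the first ".".
  const prefix = first.split(".")[0] ?? "";
  if (prefix === "kb") {
    return "documents"; // one Markdown file per entry
  }
  if (
    prefix === "bq-dataset" ||
    prefix === "entryGroup" ||
    prefix === "glossary" ||
    prefix.startsWith("biglake-")
  ) {
    return "standard"; // one YAML file per entry plus Markdown sidecars
  }
  throw new Error(`unsupported scope prefix: ${prefix}`);
}

console.log(layoutFor("bq-dataset.my-project.my-dataset")); // standard
console.log(layoutFor("kb.my-project.us-central1.my-kb")); // documents
```

Rejecting unknown prefixes keeps the sketch in line with the rule that `kcmd init` accepts exactly one primary source type.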
```mermaid
flowchart TB
    subgraph workspace["kcmd workspace"]
        manifest["catalog.yaml"]
        catalog["catalog/"]
        ref["*.ref.yaml siblings"]
    end
    subgraph kcmd["kcmd CLI / MCP"]
        pull["pull"]
        reference["reference"]
        push["push"]
    end
    subgraph service["Knowledge Catalog / Dataplex"]
        entries["Entries + Aspects"]
        links["EntryLinks"]
        glossary["Glossary hierarchy"]
    end
    manifest --> pull
    manifest --> reference
    manifest --> push
    pull --> catalog
    reference --> ref
    catalog --> push
    pull <--> entries
    pull <--> links
    reference <--> entries
    push --> entries
    push --> links
    push --> glossary
```

Authentication uses gcloud Application Default Credentials (`gcloud auth application-default login`). The CLI and the MCP server share the same workspace binding.

## Scope types and layouts

`kcmd init` requires exactly one primary source type. The init flag writes `scope` into `catalog.yaml` and selects the on-disk layout automatically.

| Source type | Init flag | Scope prefix | Layout | Target resource |
| --- | --- | --- | --- | --- |
| BigQuery | `--bigquery-dataset` | `bq-dataset` | Standard (YAML) | Tables, views, and schemas in `@bigquery` |
| Knowledge base | `--kb` | `kb` | Documents (Markdown) | Wiki/doc entries in an Entry Group |
| Entry group | `--entry-group` | `entryGroup` | Standard (YAML) | Custom user-managed entries |
| BigLake (Iceberg) | `--biglake-namespace --iceberg` | `biglake-iceberg-namespace` | Standard (YAML) | Iceberg table metadata |
| Glossary | `--glossary` | `glossary` | Standard (YAML) | Business glossary terms and categories |

BigQuery mode accepts multiple datasets. You can repeat `--bigquery-dataset` or declare an array in `scope`. Glossary mode supports comma-separated IDs, display-name lookup, and location mode (`--glossary my-project.us-central1`), which manages all glossaries in a location.

### Standard layout (YAML + sidecars)

Used for `bq-dataset`, `entryGroup`, `biglake-*`, and `glossary` scopes. Each entry is a `.yaml` file.
Unstructured aspects (for example `overview`) are split into sidecar Markdown files named `..md` (for example `orders.overview.md`). Reference baselines are sibling `*.ref.yaml` files.

:::files
/
├── catalog.yaml
└── catalog/
    └── bigquery/
        └── my-project/
            ├── my-dataset.yaml
            └── my-dataset/
                ├── orders.yaml
                ├── orders.ref.yaml
                └── orders.overview.md
:::

### Documents layout (Markdown-first)

Used for `kb` scopes. Each entry is a single `.md` file. Structured metadata goes in the YAML frontmatter, and `overview.content` is promoted to the Markdown body.

:::files
/
├── catalog.yaml
└── catalog/
    └── my-namespace/
        └── my-project/
            └── my-location/
                ├── page1.md
                └── playbooks/mbr.md
:::

## catalog.yaml manifest

The manifest drives all sync behavior. `CatalogManifest.load` validates the scope, snapshot, and publishing blocks, plus the optional reference block.

- `scope`: Primary resource(s) to manage. Format: `.`. Examples: `bq-dataset.my-project.my-dataset`, `kb.my-project.us-central1.my-kb`, `glossary.my-project.global.my-glossary`. Multi-dataset scopes use a YAML array of `bq-dataset.*` entries.
- `aliases`: Optional alias map for aspect types, glossaries, and entry link types. Built-in Dataplex types already have predefined aliases (`bigquery-table`, `schema`, `overview`, `definition`, `synonym`, `related`, `schema-join`).
- `snapshot`: Entry, aspect, and entry link types to download locally. Required aspects of listed entry types are included implicitly. Declaring `entryLinks` makes pull call `lookupEntryLinks`.
- `publishing`: The subset of snapshot types that `kcmd push` writes back. Every publishing type must also appear in `snapshot`. Any type missing from `snapshot` causes a validation error.
- `reference`: Read-only scope for grounding. `reference.scope` can differ from the primary scope. For example, you can pull schemas from one dataset while publishing enrichments to another. `reference.snapshot` has the same structure as `snapshot`.
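The rule that `publishing` must be a subset of `snapshot` can be illustrated with a small validator. This is a sketch and not `CatalogManifest.load`. The `Manifest` and `TypeSets` interfaces and `validatePublishing` are hypothetical. The check compares type strings literally. It does not resolve aliases, and it does not add the implicitly required aspects.

```typescript
// Hypothetical shapes for the snapshot and publishing blocks of catalog.yaml.
interface TypeSets {
  entries?: string[];
  aspects?: string[];
  entryLinks?: string[];
}

interface Manifest {
  scope: string | string[];
  snapshot: TypeSets;
  publishing?: TypeSets;
}

const KINDS = ["entries", "aspects", "entryLinks"] as const;

// Returns one message for every publishing type that snapshot does not declare.
function validatePublishing(manifest: Manifest): string[] {
  const errors: string[] = [];
  for (const kind of KINDS) {
    const snapshotted = new Set(manifest.snapshot[kind] ?? []);
    for (const type of manifest.publishing?.[kind] ?? []) {
      if (!snapshotted.has(type)) {
        errors.push(`publishing.${kind}: ${type} is not in snapshot.${kind}`);
      }
    }
  }
  return errors;
}

const manifest: Manifest = {
  scope: "bq-dataset.my-project.my-dataset",
  snapshot: {
    entries: ["dataplex-types.global.bigquery-table"],
    aspects: ["dataplex-types.global.schema", "dataplex-types.global.overview"],
    entryLinks: ["definition", "synonym"],
  },
  publishing: {
    aspects: ["dataplex-types.global.overview"],
    entryLinks: ["definition", "related"], // "related" is not snapshotted
  },
};

for (const error of validatePublishing(manifest)) {
  console.log(error); // publishing.entryLinks: related is not in snapshot.entryLinks
}
```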
```yaml title="catalog.yaml: BigQuery enrichment workspace"
scope: bq-dataset.my-project.my-dataset
snapshot:
  entries:
    - dataplex-types.global.bigquery-table
  aspects:
    - dataplex-types.global.schema
    - dataplex-types.global.overview
  entryLinks:
    - definition
    - synonym
publishing:
  aspects:
    - dataplex-types.global.overview
  entryLinks:
    - definition
reference:
  scope: bq-dataset.my-project.my-dataset
  snapshot:
    entries:
      - dataplex-types.global.bigquery-table
    aspects:
      - dataplex-types.global.schema
    entryLinks:
      - definition
```

## Pull, reference, and push

### Pull editable metadata

`kcmd pull` lists entries from the scoped source, calls `lookupEntry` for each matching entry type, and writes files under `catalog/`. When `snapshot.entryLinks` is declared, pull also calls `lookupEntryLinks` and inlines the results into the entry YAML.

Initialize the workspace for your source type:

```bash
kcmd init --bigquery-dataset my-project.my-dataset
```

This writes `catalog.yaml` with the correct `scope` prefix and layout selection. Edit `catalog.yaml` to declare which entry types, aspects, and entry links to manage locally and which of them to publish. Then pull:

```bash
kcmd pull
```

Check that `.yaml` or `.md` files appear under `catalog/` and match your scope hierarchy.

### Pull reference layers

`kcmd reference` downloads the read-only metadata defined in the `reference:` block. It saves the files as `*.ref.yaml` siblings of the editable entries. Reference files are indexed separately and marked non-modifiable, and `push` skips them via `isModifiable`.

Reference layers are never pushed. Use them as authoritative baselines for enrichment agents: diff the live `.yaml` against its `.ref.yaml` to surface only your changes. When `reference.snapshot.entryLinks` is set, the reference pull captures the pre-edit link state. Diffs therefore do not count existing links as enrichment additions.

### Push local edits

`kcmd push` iterates over modifiable entries, converts local metadata to Dataplex API representations, and creates or updates entries and entry links.
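Before any API call, push has to choose what to send. A minimal sketch of that step skips reference files and keeps only publishable aspects. It assumes a simplified in-memory entry model, and `LocalEntry`, `planPush`, and `isEditableFile` are hypothetical. The `isModifiable` check is replaced here by a plain `*.ref.yaml` suffix test. That substitution is an assumption about how reference files are recognized.

```typescript
// Hypothetical, simplified view of an entry file loaded from catalog/.
interface LocalEntry {
  path: string; // workspace-relative file path
  aspects: Record<string, unknown>; // aspect type -> aspect payload
}

interface PushItem {
  path: string;
  aspects: Record<string, unknown>; // only publishable aspects
}

// Assumption: stands in for kcmd's isModifiable check with a suffix test.
function isEditableFile(path: string): boolean {
  return !path.endsWith(".ref.yaml");
}

function planPush(entries: LocalEntry[], publishingAspects: string[]): PushItem[] {
  const allowed = new Set(publishingAspects);
  const plan: PushItem[] = [];
  for (const entry of entries) {
    if (!isEditableFile(entry.path)) {
      continue; // reference layers are never pushed
    }
    const aspects: Record<string, unknown> = {};
    for (const type of Object.keys(entry.aspects)) {
      if (allowed.has(type)) {
        aspects[type] = entry.aspects[type]; // e.g. overview is kept, schema is dropped
      }
    }
    plan.push({ path: entry.path, aspects });
  }
  return plan;
}

const localEntries: LocalEntry[] = [
  {
    path: "catalog/bigquery/my-project/my-dataset/orders.yaml",
    aspects: {
      "dataplex-types.global.schema": { fields: [] },
      "dataplex-types.global.overview": { content: "Orders fact table." },
    },
  },
  {
    path: "catalog/bigquery/my-project/my-dataset/orders.ref.yaml",
    aspects: { "dataplex-types.global.schema": { fields: [] } },
  },
];

const pushPlan = planPush(localEntries, ["dataplex-types.global.overview"]);
console.log(pushPlan.length); // 1
```

The sketch does not model entry creation, conflict handling (`--force`), or link reconciliation.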
| Behavior | Detail |
| --- | --- |
| Auto-create entries | Push creates missing entries and parent Entry Groups (non-ingested scopes) |
| Aspect filtering | Push sends only aspects listed in `publishing.aspects` and skips required ingested aspects |
| Entry link reconciliation | When `publishing.entryLinks` is set, push compares local and remote links by normalized target and source path. It keeps matches, creates new links, and deletes unmatched remote links |
| Glossary tree | `kcmd push` never creates Glossary, GlossaryCategory, or GlossaryTerm resources, and it fails fast if any are missing |
| Glossary metadata updates | Push can update descriptions and labels on existing glossary resources |
| Flags | `--force` overwrites conflicts, `--validate-only` validates without pushing, and `--dry-run` logs planned mutations |

EntryLinks that reference glossary terms, such as `definition` links from a BigQuery column to a term, are catalog metadata. Push creates and deletes them normally. The no-create rule applies only to the glossary hierarchy itself.

## Entry links

Entry links are first-class artifacts in pull and push. Declare link types in `snapshot.entryLinks` to fetch them, and declare a subset in `publishing.entryLinks` to reconcile them on push. To read links without modifying them, omit `publishing.entryLinks`.

**Column-level links** carry a `Schema.` source path. On pull, they are inlined under `aspects.schema.fields[].links`. On push, the path is reconstructed as `Schema.${field.name}`.

**Entry-level links** have no schema path and appear under the top-level `links` block.

**Target resolution** shows glossary terms in a human-readable form (`...`) and keeps the full UID resource path in `id` so push can round-trip it. During reconciliation, matching unwraps `@dataplex` proxy entries and normalizes project IDs. This prevents spurious delete-and-recreate cycles.
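The reconciliation rule is: keep matches, create local-only links, and delete unmatched remote links. It can be sketched as a keyed set comparison. Several parts are assumptions. The `Link` shape, `reconcileLinks`, `linkKey`, and `normalizeTarget` are hypothetical. Normalization here only maps project numbers to project IDs through a caller-supplied table. `@dataplex` proxy unwrapping is not modeled. The key also includes the link type, which the text above does not state explicitly.

```typescript
// Hypothetical link shape; not the Dataplex EntryLink resource.
interface Link {
  type: string; // link type alias, e.g. "definition"
  target: string; // target entry resource name
  sourcePath?: string; // e.g. "Schema.customer_id" for column-level links
}

interface LinkPlan {
  keep: Link[];
  create: Link[];
  remove: Link[];
}

// Hypothetical normalization: rewrite a leading "projects/<number>/" to the
// project ID so both spellings of the same target compare equal.
function normalizeTarget(target: string, projectIds: Map<string, string>): string {
  return target.replace(/^projects\/([^/]+)\//, (_match: string, project: string) =>
    `projects/${projectIds.get(project) ?? project}/`,
  );
}

function linkKey(link: Link, projectIds: Map<string, string>): string {
  return [link.type, link.sourcePath ?? "", normalizeTarget(link.target, projectIds)].join("|");
}

function reconcileLinks(
  local: Link[],
  remote: Link[],
  projectIds = new Map<string, string>(),
): LinkPlan {
  const remoteByKey = new Map<string, Link>();
  for (const link of remote) {
    remoteByKey.set(linkKey(link, projectIds), link);
  }
  const plan: LinkPlan = { keep: [], create: [], remove: [] };
  const localKeys = new Set<string>();
  for (const link of local) {
    const key = linkKey(link, projectIds);
    if (localKeys.has(key)) continue; // ignore duplicate local links
    localKeys.add(key);
    const existing = remoteByKey.get(key);
    if (existing) {
      plan.keep.push(existing); // already present remotely: no mutation
    } else {
      plan.create.push(link); // new local link
    }
  }
  remoteByKey.forEach((link, key) => {
    if (!localKeys.has(key)) plan.remove.push(link); // remote link no longer declared locally
  });
  return plan;
}

const projectIds = new Map([["123456", "my-project"]]);
const localLinks: Link[] = [
  { type: "definition", sourcePath: "Schema.customer_id", target: "projects/my-project/locations/global/glossaries/biz/terms/customer-id" },
  { type: "definition", sourcePath: "Schema.order_id", target: "projects/my-project/locations/global/glossaries/biz/terms/order-id" },
];
const remoteLinks: Link[] = [
  { type: "definition", sourcePath: "Schema.customer_id", target: "projects/123456/locations/global/glossaries/biz/terms/customer-id" },
  { type: "definition", sourcePath: "Schema.status", target: "projects/123456/locations/global/glossaries/biz/terms/status" },
];

const linkPlan = reconcileLinks(localLinks, remoteLinks, projectIds);
// keep: customer_id (same target once 123456 maps to my-project)
// create: order_id; remove: status
```

Without the project-number mapping, the `customer_id` link would be deleted and then recreated. That is the churn the normalization step in `kcmd push` is described as avoiding.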
```yaml title="Column-level definition link (excerpt)"
aspects:
  schema:
    fields:
      - name: customer_id
        dataType: STRING
        mode: NULLABLE
        links:
          definition:
            - target: my-project.global.business-glossary.customer-id
              id: projects/my-project/locations/global/glossaries/biz/terms/customer-id
links:
  related:
    - target: my-other-project.us.docs-eg.runbook-page
```

Built-in entry link aliases include `definition`, `synonym`, `related`, and `schema-join`. Each maps to a `dataplex-types.global.*` link type.

## Glossary scope

A Business Glossary can be the primary workspace scope (`glossary...`). The local hierarchy mirrors the glossary tree under `catalog/glossaries/`:

```yaml title="Glossary term entry"
name: glossaries/Business Glossary (biz)/terms/customer-id
type: glossaryTerm
displayName: customer-id
description: Unique identifier for a customer record.
parent: projects/my-project/locations/global/glossaries/biz
```

Glossaries also work as `reference.scope`. Enrichment workspaces can then ground on business vocabulary without owning glossary CRUD. Provision glossary resources out-of-band before the first push, using the Dataplex console or `gcloud dataplex glossaries create`. After that, `kcmd pull` followed by `kcmd push` manages metadata on the existing nodes.

Manage a single glossary:

```bash
kcmd init --glossary my-project.us-central1.my-glossary-id
```

Manage several glossaries with comma-separated IDs:

```bash
kcmd init --glossary my-project.us-central1.glossary-a,glossary-b
```

Manage all glossaries in a location (location mode):

```bash
kcmd init --glossary my-project.us-central1
```

## Agent integration

Metadata as Code artifacts are the interchange format for enrichment agents and human-in-the-loop review. Agents read and modify workspace files, and `kcmd push` publishes the approved changes. The built-in MCP server exposes the `list-entries`, `lookup-entry`, and `modify-entry` tools, bound to a workspace path. This supports agentic metadata workflows without coupling to a specific model provider.
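Human-in-the-loop review relies on the reference-layer diff: compare an agent-edited entry against its `.ref.yaml` baseline. The sketch below compares two flattened entry views. `EntryView`, `diffAgainstReference`, and the string encoding of links are hypothetical. Aspects are compared with `JSON.stringify`, which is sensitive to key order, so a real implementation would need a canonical comparison.

```typescript
// Hypothetical flattened view of one entry file (live .yaml or .ref.yaml).
interface EntryView {
  aspects: Record<string, unknown>; // aspect alias -> payload
  links: string[]; // each link encoded as "type:sourcePath->target"
}

interface ReviewDiff {
  changedAspects: string[];
  addedLinks: string[];
  removedLinks: string[];
}

function diffAgainstReference(live: EntryView, ref: EntryView): ReviewDiff {
  // Union of aspect keys, so aspects added or dropped by the agent both show up.
  const keys = Object.keys(live.aspects).concat(Object.keys(ref.aspects));
  const types = keys.filter((k, i) => keys.indexOf(k) === i).sort();
  // JSON.stringify is key-order sensitive: fine for a sketch, not for production.
  const changedAspects = types.filter(
    (t) => JSON.stringify(live.aspects[t]) !== JSON.stringify(ref.aspects[t]),
  );
  const refLinks = new Set(ref.links);
  const liveLinks = new Set(live.links);
  return {
    changedAspects,
    addedLinks: live.links.filter((l) => !refLinks.has(l)),
    removedLinks: ref.links.filter((l) => !liveLinks.has(l)),
  };
}

const refView: EntryView = {
  aspects: { schema: { fields: ["customer_id"] } },
  links: ["definition:Schema.customer_id->biz/customer-id"],
};
const liveView: EntryView = {
  aspects: {
    schema: { fields: ["customer_id"] },
    overview: { content: "Orders placed by customers." }, // added by an agent
  },
  links: [
    "definition:Schema.customer_id->biz/customer-id", // pre-existing, not an addition
    "related:->docs/runbook-page",
  ],
};

console.log(diffAgainstReference(liveView, refView).changedAspects); // [ 'overview' ]
```

The `definition` link appears in both views, so it is not reported as an addition. This is the effect of including entry links in `reference.snapshot`.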