# Sync catalog metadata

> Initialize a kcmd workspace for BigQuery, knowledge base, entry group, BigLake, or glossary scope; pull snapshots; check status; and push local edits back to Knowledge Catalog.

- Repository: GoogleCloudPlatform/knowledge-catalog
- GitHub: https://github.com/GoogleCloudPlatform/knowledge-catalog
- Human docs: https://www.grok-wiki.com/public/docs/googlecloudplatform-knowledge-catalog-9cee6ee3cba5
- Complete Markdown: https://www.grok-wiki.com/public/docs/googlecloudplatform-knowledge-catalog-9cee6ee3cba5/llms-full.txt

## Source Files

- `agents/mdcode/README.md`
- `agents/mdcode/src/tool/commands.ts`
- `agents/mdcode/src/libts/sync.ts`
- `agents/mdcode/src/tool/main.ts`
- `toolbox/mdcode/README.md`
- `toolbox/mdcode/src/tool/commands.ts`

---


`kcmd` in `agents/mdcode` implements Metadata as Code sync against the Dataplex Catalog API: `init` writes `catalog.yaml`, `pull` and `reference` materialize remote metadata into `catalog/`, and `push` publishes editable local artifacts back. `CatalogSync` in the TypeScript library owns the pull/push engine; the CLI in `src/tool/main.ts` exposes `init`, `pull`, `push`, `reference`, and `mcp`.

<Note>
Install `kcmd` from `agents/mdcode` (`npm install` then `npm run build`) or run `npx kcmd`. Authentication uses gcloud Application Default Credentials.
</Note>

## Prerequisites

Before initializing a workspace:

1. Enable the Dataplex (Knowledge Catalog) APIs and grant IAM permissions to list, look up, and modify catalog entries.
2. Authenticate with ADC:

   ```bash
   gcloud auth application-default login
   ```

`ApiContext.default()` reads the active gcloud project, compute region, and access token. Missing values cause init or sync to fail immediately.
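A minimal sketch of that fail-fast check follows. It assumes the three values have already been read from gcloud and ADC, and omits that lookup. `ContextValues`, `ResolvedContext`, and `requireContext` are hypothetical names, not the `ApiContext` API. The error text matches the message listed under Common failures.

```typescript
// Hypothetical names; kcmd's ApiContext.default() gathers these values itself.
interface ContextValues {
  project?: string;  // active gcloud project
  location?: string; // compute region
  token?: string;    // ADC access token
}

interface ResolvedContext {
  project: string;
  location: string;
  token: string;
}

// Fail fast: any missing value aborts immediately.
function requireContext(values: ContextValues): ResolvedContext {
  const { project, location, token } = values;
  if (!project || !location || !token) {
    throw new Error("Unable to retrieve project, location, or token");
  }
  return { project, location, token };
}
```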

## Sync lifecycle

```mermaid
sequenceDiagram
  participant User
  participant kcmd as kcmd CLI
  participant Snapshot as CatalogSnapshot
  participant Sync as CatalogSync
  participant API as Dataplex Catalog API

  User->>kcmd: init --scope-flag
  kcmd->>Snapshot: write catalog.yaml
  User->>kcmd: pull
  kcmd->>Sync: pull()
  Sync->>API: lookupEntry / lookupEntryLinks
  API-->>Sync: entries + links
  Sync->>Snapshot: _storeResource → catalog/
  Note over User: edit files under catalog/
  User->>kcmd: push
  kcmd->>Sync: push()
  Sync->>Snapshot: listEntries / _fetchResource
  Sync->>API: createEntry / modifyEntry / reconcile EntryLinks
```

| Phase | Command | Local output | Remote effect |
|-------|---------|--------------|---------------|
| Bootstrap | `kcmd init` | `catalog.yaml` | None |
| Download | `kcmd pull` | `catalog/**/*.yaml` or `*.md` | Read-only |
| Grounding | `kcmd reference` | `catalog/**/*.ref.yaml` | Read-only |
| Publish | `kcmd push` | None (reads `catalog/`) | Creates/updates entries and links |

## Initialize a workspace

`kcmd init` requires exactly one primary source type. The flag determines the workspace mode, the `scope` value written to `catalog.yaml`, and the on-disk layout (YAML for data assets, Markdown for knowledge bases).

| Mode | Flag | ID format | Layout |
|------|------|-----------|--------|
| BigQuery | `--bigquery-dataset` | `project.dataset` (repeat flag for multiple datasets) | YAML |
| Knowledge base | `--kb` | `project.location.entry-group-id` | Markdown (`.md`) |
| Entry group | `--entry-group` | `project.location.entry-group-id` | YAML |
| BigLake (Iceberg) | `--biglake-namespace` + `--iceberg` | `project.catalog.namespace` | YAML |
| Glossary | `--glossary` | `project.location.glossary-id` (comma-separated or location-only) | YAML under `catalog/glossaries/` |
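The ID formats above can be validated and turned into a `scope` string roughly as follows. This is a hypothetical sketch. The `bq-dataset.` and `entryGroup.` prefixes match manifests shown on this page, but the exact `kb.` and `biglake-iceberg-namespace.` strings kcmd writes are assumptions. The sketch handles only the single-ID glossary form and assumes project IDs contain no dots.

```typescript
// Hypothetical sketch: maps an init flag value to a catalog.yaml scope string.
type InitMode = "bigquery" | "kb" | "entryGroup" | "biglakeIceberg" | "glossary";

// Expected dot-separated segment count per mode (from the ID format table).
const SEGMENTS: Record<InitMode, number> = {
  bigquery: 2,       // project.dataset
  kb: 3,             // project.location.entry-group-id
  entryGroup: 3,     // project.location.entry-group-id
  biglakeIceberg: 3, // project.catalog.namespace
  glossary: 3,       // project.location.glossary-id (single-ID form only)
};

// Assumed prefixes; kb and biglakeIceberg strings are not confirmed by the source.
const PREFIX: Record<InitMode, string> = {
  bigquery: "bq-dataset",
  kb: "kb",
  entryGroup: "entryGroup",
  biglakeIceberg: "biglake-iceberg-namespace",
  glossary: "glossary",
};

function toScope(mode: InitMode, id: string): string {
  const parts = id.split(".");
  if (parts.length !== SEGMENTS[mode] || parts.some((p) => p === "")) {
    throw new Error(`Expected ${SEGMENTS[mode]} dot-separated segments for ${mode}, got "${id}"`);
  }
  return `${PREFIX[mode]}.${id}`;
}

// Repeating --bigquery-dataset yields one scope per dataset.
function bigQueryScopes(ids: string[]): string | string[] {
  const scopes = ids.map((id) => toScope("bigquery", id));
  return scopes.length === 1 ? scopes[0] : scopes;
}
```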

<Steps>
<Step title="Create the workspace directory">

```bash
mkdir my-catalog-workspace && cd my-catalog-workspace
```

</Step>
<Step title="Run init for your scope">

<CodeGroup>
```bash BigQuery
kcmd init --bigquery-dataset my-project.my_dataset
```

```bash Knowledge base
kcmd init --kb my-project.us-central1.my-kb-id
```

```bash Entry group
kcmd init --entry-group my-project.us-central1.my-entry-group
```

```bash BigLake Iceberg
kcmd init --biglake-namespace my-project.my-catalog.my-namespace --iceberg
```

```bash Glossary
kcmd init --glossary my-project.us-central1.my-glossary-id
```
</CodeGroup>

Add `--pull` to initialize and immediately download metadata:

```bash
kcmd init --bigquery-dataset my-project.my_dataset --pull
```

</Step>
<Step title="Verify catalog.yaml">

Init prints the generated manifest. A BigQuery workspace produces a scope like `bq-dataset.my-project.my_dataset`. Customize `snapshot`, `publishing`, and optional `reference` blocks before the first pull.

</Step>
</Steps>

<ParamField body="--bigquery-dataset" type="string[]">
One or more dataset IDs as `project.dataset`. Multiple flags merge into a single multi-dataset workspace.
</ParamField>

<ParamField body="--kb" type="string">
Knowledge base entry group as `project.location.entry-group-id`. Uses Markdown layout.
</ParamField>

<ParamField body="--entry-group" type="string">
Custom Dataplex entry group as `project.location.entry-group-id`.
</ParamField>

<ParamField body="--biglake-namespace" type="string">
BigLake namespace as `project.catalog.namespace`. Requires `--iceberg`; other metastores are rejected.
</ParamField>

<ParamField body="--glossary" type="string">
Glossary scope: single ID, comma-separated IDs, display name, or location-only (`project.location`) for all glossaries in a location.
</ParamField>

<ParamField body="--pull" type="boolean">
Run `kcmd pull` immediately after writing `catalog.yaml`.
</ParamField>
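The rule that `init` takes exactly one source flag, and that BigLake requires `--iceberg`, can be sketched as a small validator. `InitOptions` and `validateInitOptions` are hypothetical. Only the `--iceberg` error text is taken from Common failures; the other message is illustrative.

```typescript
// Hypothetical sketch of init's flag checks; fields mirror the CLI flags.
interface InitOptions {
  bigqueryDataset?: string[];
  kb?: string;
  entryGroup?: string;
  biglakeNamespace?: string;
  iceberg?: boolean;
  glossary?: string;
  pull?: boolean;
}

function validateInitOptions(opts: InitOptions): void {
  // Count how many primary source flags were supplied.
  const provided = [
    (opts.bigqueryDataset?.length ?? 0) > 0,
    Boolean(opts.kb),
    Boolean(opts.entryGroup),
    Boolean(opts.biglakeNamespace),
    Boolean(opts.glossary),
  ].filter(Boolean).length;

  if (provided !== 1) {
    // Illustrative wording, not kcmd's exact message.
    throw new Error(`Expected exactly one source flag, got ${provided}`);
  }
  if (opts.biglakeNamespace && !opts.iceberg) {
    throw new Error("Must specify --iceberg when initializing a BigLake namespace");
  }
}
```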

## Pull editable metadata

`kcmd pull` loads `catalog.yaml`, enumerates entries in the scoped source, calls `lookupEntry` for each entry whose type matches the snapshot configuration, and writes the results under `catalog/`.

```bash
kcmd pull
```

<ParamField body="--dry-run" type="boolean">
Log `[DRY-RUN] Pull Resource: …` without writing files.
</ParamField>
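The selection step can be pictured as a filter over the enumerated entries. This hypothetical sketch assumes entry types are compared in the short form used in `catalog.yaml` (for example `dataplex-types.global.bigquery-table`). It does not call the Dataplex API, and `planPull` is an illustrative name.

```typescript
// Hypothetical sketch of pull's selection step; no API calls are made.
interface RemoteEntry {
  name: string;
  entryType: string; // short form, e.g. "dataplex-types.global.bigquery-table"
}

interface SnapshotConfig {
  entries: string[]; // entry types to pull; empty means all entries
  aspects: string[]; // aspect filter passed to lookupEntry
}

interface LookupPlan {
  entryName: string;
  aspectTypes: string[];
}

function planPull(remote: RemoteEntry[], snapshot: SnapshotConfig): LookupPlan[] {
  const wanted = new Set(snapshot.entries);
  return remote
    .filter((e) => wanted.size === 0 || wanted.has(e.entryType))
    .map((e) => ({ entryName: e.name, aspectTypes: snapshot.aspects.slice() }));
}
```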

### What pull fetches

- **Entries** listed in `snapshot.entries` (or all entries when the list is empty).
- **Aspects** named in `snapshot.aspects`, passed to `lookupEntry` as the aspect filter.
- **Entry links** when `snapshot.entryLinks` is set: `lookupEntryLinks` runs per entry. Column-level links with a `Schema.<field>` source path land under `aspects.schema.fields[].links`; entry-level links appear under top-level `links`. Omit `snapshot.entryLinks` to skip link download.
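The following sketch shows where pulled links land in the local document. The shapes are simplified assumptions, not kcmd's file schema. A `Schema.<field>` link whose field is absent locally falls back to the entry level here, which is also an assumption.

```typescript
// Hypothetical, simplified shapes; not the Dataplex EntryLink resource.
interface LinkRef {
  type: string;   // e.g. "definition"
  target: string; // linked entry name
}

interface PulledLink extends LinkRef {
  sourcePath?: string; // e.g. "Schema.customer_id" for a column-level link
}

interface SchemaField {
  name: string;
  links?: LinkRef[];
}

interface LocalEntryDoc {
  links?: LinkRef[];
  aspects: { schema?: { fields: SchemaField[] } };
}

const COLUMN_PREFIX = "Schema.";

function placeLinks(doc: LocalEntryDoc, links: PulledLink[]): LocalEntryDoc {
  for (const link of links) {
    const slim: LinkRef = { type: link.type, target: link.target };
    const path = link.sourcePath;
    const column =
      path !== undefined && path.startsWith(COLUMN_PREFIX) ? path.slice(COLUMN_PREFIX.length) : undefined;
    const field =
      column === undefined ? undefined : doc.aspects.schema?.fields.find((f) => f.name === column);
    if (field) {
      field.links = [...(field.links ?? []), slim]; // column-level: aspects.schema.fields[].links
    } else {
      doc.links = [...(doc.links ?? []), slim]; // entry-level: top-level links (assumed fallback)
    }
  }
  return doc;
}
```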

### Layout-specific output

| Scope layout | File pattern | Example |
|--------------|--------------|---------|
| Standard (BQ, entry group, BigLake, glossary) | `catalog/<namespace>/…/<entry-id>.yaml` | `catalog/bigquery/my-project/my_dataset/orders.yaml` |
| Documents (knowledge base) | `catalog/<entry-id>.md` with YAML frontmatter | `catalog/getting-started.md` |

Long-form aspect text can be detached into sidecar Markdown files next to the entry file, such as `orders.dataplex-types.global.overview.md`.
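The path conventions in the table can be expressed as two helpers. This is a hypothetical sketch: the caller passes the intermediate directories (the `…` segment in the table) because this page does not cover how kcmd derives them for each source.

```typescript
// Hypothetical helpers illustrating the file naming shown above.
type Layout = "standard" | "documents";

function entryFilePath(layout: Layout, dirs: string[], entryId: string): string {
  if (layout === "documents") {
    return `catalog/${entryId}.md`; // knowledge base: Markdown with YAML frontmatter
  }
  return ["catalog", ...dirs, `${entryId}.yaml`].join("/");
}

// Sidecar for long-form aspect text: <entry-id>.<aspect-key>.md next to the entry file.
function sidecarPath(entryYamlPath: string, aspectKey: string): string {
  return entryYamlPath.replace(/\.yaml$/, `.${aspectKey}.md`);
}
```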

## Pull reference layers

`kcmd reference` downloads read-only metadata declared in the manifest's `reference:` block. Reference files are written next to the editable files with a `.ref.yaml` suffix and are never pushed.

```bash
kcmd reference
```

Typical `catalog.yaml` reference block:

```yaml
scope: entryGroup.my-project.us-central1.my-eg
reference:
  scope: bq-dataset.my-project.my_dataset
  snapshot:
    entries:
      - dataplex-types.global.bigquery-table
    aspects:
      - dataplex-types.global.schema
    entryLinks:
      - definition
```

Reference pull honors `reference.snapshot.entryLinks` the same way `pull` honors `snapshot.entryLinks`, so diffs between an editable `.yaml` file and its `.ref.yaml` sibling show only enrichment deltas.

<Warning>
Files ending in `.ref.yaml` are skipped during `push`. `isModifiable()` returns false when an entry has only a reference path.
</Warning>
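The pairing of editable and reference files can be sketched as follows. `indexEntries` and `canPush` are hypothetical stand-ins for `isModifiable()`, which operates on kcmd's own snapshot records. The input is assumed to contain only `.yaml` paths, so Markdown sidecars are excluded.

```typescript
// Hypothetical sketch; kcmd's isModifiable() works on its own snapshot records.
const REF_SUFFIX = ".ref.yaml";
const YAML_SUFFIX = ".yaml";

interface EntryPaths {
  editablePath?: string;  // e.g. orders.yaml
  referencePath?: string; // e.g. orders.ref.yaml
}

// Groups YAML files by entry so orders.yaml and orders.ref.yaml share one record.
function indexEntries(yamlFiles: string[]): Map<string, EntryPaths> {
  const byEntry = new Map<string, EntryPaths>();
  for (const file of yamlFiles) {
    const isRef = file.endsWith(REF_SUFFIX);
    const key = file.slice(0, file.length - (isRef ? REF_SUFFIX : YAML_SUFFIX).length);
    const paths: EntryPaths = byEntry.get(key) ?? {};
    if (isRef) {
      paths.referencePath = file;
    } else {
      paths.editablePath = file;
    }
    byEntry.set(key, paths);
  }
  return byEntry;
}

// An entry with only a reference path is never pushed.
function canPush(entry: EntryPaths): boolean {
  return entry.editablePath !== undefined;
}
```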

## Check local changes

The design doc and toolbox README describe `kcmd status` for detecting local modifications against a saved checksum state. In the current `agents/mdcode` implementation:

- `CatalogSync.status()` throws `Not yet implemented`.
- `src/tool/main.ts` does not register a `status` subcommand.

Until `status` ships, inspect changes with version control or filesystem diff:

```bash
git diff catalog/
git status catalog/
```
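For illustration only, a checksum-based status as the design doc describes it might compare checksums saved at pull time with current file contents. FNV-1a (over UTF-16 code units) stands in for whatever checksum kcmd would store, and the path-to-checksum state shape is an assumption. None of this is part of the current CLI.

```typescript
// Hypothetical sketch; not implemented in kcmd today.
// 32-bit FNV-1a over UTF-16 code units, as a stand-in checksum.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

type FileStatus = "modified" | "added" | "deleted";

function localStatus(
  saved: Record<string, string>,   // path -> checksum recorded at pull time
  current: Record<string, string>, // path -> current file contents
): Record<string, FileStatus> {
  const result: Record<string, FileStatus> = {};
  for (const [path, text] of Object.entries(current)) {
    if (!(path in saved)) result[path] = "added";
    else if (saved[path] !== fnv1a(text)) result[path] = "modified";
  }
  for (const path of Object.keys(saved)) {
    if (!(path in current)) result[path] = "deleted";
  }
  return result;
}
```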

For enrichment workflows, compare editable files against `.ref.yaml` baselines to isolate agent-added metadata.
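One way to isolate agent-added metadata is a structural diff of the parsed files. This hypothetical sketch assumes both files are already parsed into plain objects (YAML parsing is out of scope). It compares arrays and scalars as whole values and does not report keys removed locally.

```typescript
// Hypothetical sketch: dotted paths that are new or changed in the editable copy.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function isObject(v: Json | undefined): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function enrichmentDelta(editable: Json, reference: Json | undefined, prefix = ""): string[] {
  if (isObject(editable)) {
    const obj = editable; // narrowed to an object
    const ref: { [key: string]: Json } = isObject(reference) ? reference : {};
    return Object.keys(obj).flatMap((key) =>
      enrichmentDelta(obj[key], ref[key], prefix ? `${prefix}.${key}` : key),
    );
  }
  // Leaves and arrays are compared as whole values.
  return JSON.stringify(editable) === JSON.stringify(reference) ? [] : [prefix];
}
```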

## Push local edits

`kcmd push` walks modifiable entries in `catalog/`, converts local YAML or Markdown to Dataplex API payloads, and applies creates or updates.

```bash
kcmd push
```

<ParamField body="--dry-run" type="boolean">
Log planned creates, updates, and EntryLink mutations without calling the API.
</ParamField>

<ParamField body="--force" type="boolean">
Accepted by the CLI, but conflict override is not yet wired into `CatalogSync.push`.
</ParamField>

<ParamField body="--validate-only" type="boolean">
Accepted by the CLI, but pre-push validation is not yet wired into `CatalogSync.push`.
</ParamField>

### Push behavior by resource type

**Catalog entries**

- Missing entries: `push` creates the parent entry group if needed, then creates the entry, deriving its identity from the local file path.
- Existing entries: `modifyEntry` updates `aspects` and, for non-ingested sources, `entry_source`.
- Only aspects listed in `publishing.aspects` are written back.
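The create-or-modify decision can be sketched as a planning pass. The types are simplified stand-ins for Dataplex payloads, `entry_source` handling is omitted, and `planPush` is a hypothetical name.

```typescript
// Hypothetical sketch of push planning; not Dataplex API payloads.
interface LocalEntry {
  name: string;
  entryGroup: string;
  aspects: Record<string, unknown>;
}

type PushAction =
  | { kind: "createEntryGroup"; entryGroup: string }
  | { kind: "createEntry"; name: string; aspects: Record<string, unknown> }
  | { kind: "modifyEntry"; name: string; aspects: Record<string, unknown> };

function planPush(
  local: LocalEntry[],
  remoteEntries: string[],
  remoteGroups: string[],
  publishedAspects: string[], // publishing.aspects
): PushAction[] {
  const actions: PushAction[] = [];
  const groups = new Set(remoteGroups);
  for (const entry of local) {
    // Only aspects listed in publishing.aspects are written back.
    const aspects = Object.fromEntries(
      Object.entries(entry.aspects).filter(([key]) => publishedAspects.includes(key)),
    );
    if (remoteEntries.includes(entry.name)) {
      actions.push({ kind: "modifyEntry", name: entry.name, aspects });
      continue;
    }
    if (!groups.has(entry.entryGroup)) {
      actions.push({ kind: "createEntryGroup", entryGroup: entry.entryGroup });
      groups.add(entry.entryGroup); // create each missing group once
    }
    actions.push({ kind: "createEntry", name: entry.name, aspects });
  }
  return actions;
}
```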

**Entry links**

When `publishing.entryLinks` is set, `push` reconciles local vs remote links per entry:

1. Normalize both sides (unwrap `@dataplex` proxies, canonicalize project IDs).
2. Keep matching links in place.
3. Delete remote links of configured types with no local match.
4. Create local links missing remotely.

Omit or leave `publishing.entryLinks` empty to disable link mutations.
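The four reconcile steps map onto a set comparison. In this hypothetical sketch, normalization only rewrites project numbers to project IDs through a caller-supplied table; `@dataplex` proxy unwrapping is omitted because its format is not documented here. Limiting creates to configured link types is also an assumption.

```typescript
// Hypothetical sketch of EntryLink reconciliation on simplified link records.
interface Link {
  type: string;   // entry link type, e.g. "definition"
  source: string; // resource name of the source entry
  target: string; // resource name of the target entry
}

// Rewrites a leading "projects/<number>/" to "projects/<id>/" when the mapping is known.
function canonicalize(name: string, projectIds: Record<string, string>): string {
  return name.replace(/^projects\/([^/]+)\//, (match, project: string) =>
    projectIds[project] ? `projects/${projectIds[project]}/` : match,
  );
}

function linkKey(link: Link, projectIds: Record<string, string>): string {
  return [link.type, canonicalize(link.source, projectIds), canonicalize(link.target, projectIds)].join("|");
}

function reconcileLinks(
  local: Link[],
  remote: Link[],
  managedTypes: string[], // publishing.entryLinks
  projectIds: Record<string, string>,
): { toCreate: Link[]; toDelete: Link[] } {
  if (managedTypes.length === 0) return { toCreate: [], toDelete: [] }; // link mutations disabled
  const managed = (l: Link) => managedTypes.includes(l.type);
  const localKeys = new Set(local.filter(managed).map((l) => linkKey(l, projectIds)));
  const remoteKeys = new Set(remote.filter(managed).map((l) => linkKey(l, projectIds)));
  return {
    // Remote links of configured types with no local match are deleted.
    toDelete: remote.filter((l) => managed(l) && !localKeys.has(linkKey(l, projectIds))),
    // Local links missing remotely are created; matching links stay in place.
    toCreate: local.filter((l) => managed(l) && !remoteKeys.has(linkKey(l, projectIds))),
  };
}
```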

**Glossary hierarchy**

<Warning>
`kcmd push` does not create `Glossary`, `GlossaryCategory`, or `GlossaryTerm` resources. Missing glossary nodes cause push to fail fast with an explicit error. Provision glossaries via the Dataplex console or `gcloud dataplex glossaries create` first, then `pull` and `push` to update descriptions and labels on existing nodes. EntryLinks that reference glossary terms are created and deleted normally.
</Warning>

### Auto-creation rules

| Resource | Created by push? |
|----------|------------------|
| Entry group | Yes, when missing |
| Catalog entry | Yes, when missing |
| Entry link | Yes (when `publishing.entryLinks` configured) |
| Glossary / category / term | No — must exist before push |
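A hypothetical preflight that mirrors this rule: every local glossary node must already exist remotely, otherwise push fails fast. The message wording and the `GlossaryNode` shape are illustrative, not kcmd's.

```typescript
// Hypothetical preflight; push never creates glossary resources.
type GlossaryKind = "Glossary" | "GlossaryCategory" | "GlossaryTerm";

interface GlossaryNode {
  kind: GlossaryKind;
  name: string;
}

function assertGlossaryNodesExist(local: GlossaryNode[], remoteNames: string[]): void {
  const missing = local.filter((n) => !remoteNames.includes(n.name));
  if (missing.length > 0) {
    const list = missing.map((n) => `${n.kind} ${n.name}`).join(", ");
    throw new Error(`Provision these glossary nodes before push: ${list}`);
  }
}
```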

## Configure sync scope in catalog.yaml

The manifest drives every sync operation. Key fields:

<ResponseField name="scope" type="string | string[]">
Primary source of truth. Supported prefixes: `bq-dataset.*`, `entryGroup.*`, `kb.*`, `biglake-namespace.*`, `biglake-iceberg-namespace.*`, `glossary.<project>.<location>.<id>`.
</ResponseField>

<ResponseField name="snapshot" type="object">
`entries`, `aspects`, and optional `entryLinks` to download on `pull`.
</ResponseField>

<ResponseField name="publishing" type="object">
Aspects and `entryLinks` written on `push`, as a subset of `snapshot`. Publishing entry link types must appear in `snapshot.entryLinks`.
</ResponseField>

<ResponseField name="reference" type="object">
Optional read-only scope and `reference.snapshot` for `kcmd reference`.
</ResponseField>

Example manifest for a BigQuery dataset with link sync:

```yaml
scope: bq-dataset.my-project.my_dataset

snapshot:
  entries:
    - dataplex-types.global.bigquery-table
  aspects:
    - dataplex-types.global.schema
    - dataplex-types.global.overview
  entryLinks:
    - definition

publishing:
  aspects:
    - dataplex-types.global.overview
  entryLinks:
    - definition
```
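The manifest rules above can be checked before any sync. This hypothetical sketch assumes `catalog.yaml` is already parsed. It checks the scope prefixes listed under `scope` and verifies that publishing aspects and link types are a subset of `snapshot`.

```typescript
// Hypothetical manifest checks; field names follow catalog.yaml.
interface Manifest {
  scope: string | string[];
  snapshot: { entries: string[]; aspects: string[]; entryLinks?: string[] };
  publishing?: { aspects?: string[]; entryLinks?: string[] };
}

const SCOPE_PREFIXES = [
  "bq-dataset.",
  "entryGroup.",
  "kb.",
  "biglake-namespace.",
  "biglake-iceberg-namespace.",
  "glossary.",
];

function validateManifest(m: Manifest): string[] {
  const errors: string[] = [];
  const scopes = Array.isArray(m.scope) ? m.scope : [m.scope];
  for (const s of scopes) {
    if (!SCOPE_PREFIXES.some((p) => s.startsWith(p))) errors.push(`Unknown scope prefix: ${s}`);
  }
  const snapshotLinks = m.snapshot.entryLinks ?? [];
  for (const a of m.publishing?.aspects ?? []) {
    if (!m.snapshot.aspects.includes(a)) errors.push(`publishing aspect not in snapshot: ${a}`);
  }
  for (const t of m.publishing?.entryLinks ?? []) {
    if (!snapshotLinks.includes(t)) errors.push(`publishing entry link type not in snapshot: ${t}`);
  }
  return errors;
}
```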

## Workspace layout

:::files
my-workspace/
├── catalog.yaml
└── catalog/
    └── bigquery/
        └── my-project/
            ├── my_dataset.yaml
            └── my_dataset/
                ├── orders.yaml
                ├── orders.ref.yaml
                └── orders.dataplex-types.global.overview.md
:::

Knowledge base workspaces replace `.yaml` entry files with `.md` documents. Glossary workspaces mirror the glossary tree under `catalog/glossaries/`.

## Agent-driven sync

Register the kcmd MCP server in your agent client's MCP configuration so agents can read and modify the local snapshot:

```json
{
  "mcpServers": {
    "kcmd": {
      "command": "npx",
      "args": ["-y", "kcmd", "mcp", "--path", "/absolute/path/to/workspace"]
    }
  }
}
```

MCP tools: `list-entries`, `lookup-entry`, `modify-entry`. Run `kcmd pull` and `kcmd push` from the CLI (or enrichment pipelines) to sync with the remote catalog after agent edits.

## Common failures

| Symptom | Likely cause | Verification |
|---------|--------------|--------------|
| `Unable to retrieve project, location, or token` | Missing gcloud ADC or config | `gcloud auth application-default login` |
| `Must provide either --entry-group, --bigquery-dataset, …` | No source flag passed to `init` | Pass exactly one source flag |
| `Must specify --iceberg when initializing a BigLake namespace` | BigLake without `--iceberg` | Add `--iceberg` |
| Pull skips entries silently | 403 on `lookupEntry` (missing resource or IAM) | Confirm entry exists in console; check permissions |
| `Glossary term '…' does not exist` on push | Glossary node not provisioned | Create via console/gcloud, then `pull` |
| `Failed to create entry group` | IAM lacks `entryGroups.create` | Grant Dataplex admin or entry-group create role |
| Entry links deleted and recreated every push | Project ID vs number mismatch | Ensure normalization; avoid hand-editing `id` fields |

## End-to-end workflow

<Steps>
<Step title="Initialize and pull">

```bash
kcmd init --bigquery-dataset my-project.ecommerce --pull
```

</Step>
<Step title="Optional: pull reference baselines">

Add a `reference:` block to `catalog.yaml`, then:

```bash
kcmd reference
```

</Step>
<Step title="Edit metadata locally">

Update aspect YAML, sidecar Markdown, or entry link targets under `catalog/`.

</Step>
<Step title="Preview push">

```bash
kcmd push --dry-run
```

</Step>
<Step title="Publish">

```bash
kcmd push
```

Expect `Successfully pushed catalog entries.` on success.

</Step>
</Steps>

## Related pages

<CardGroup>
<Card title="Metadata as Code" href="/metadata-as-code">
Workspace model, manifest fields, reference layers, and entry link semantics.
</Card>
<Card title="Quickstart" href="/quickstart">
First successful init, pull, and inspect workflow.
</Card>
<Card title="kcmd CLI reference" href="/kcmd-cli-reference">
Full command and flag reference for init, pull, push, and reference.
</Card>
<Card title="catalog.yaml manifest reference" href="/catalog-manifest-reference">
Scope, snapshot, publishing, and entryLinks reconciliation rules.
</Card>
<Card title="Publish enriched metadata" href="/publish-enriched-metadata">
Push enrichment output and reconcile entry links after agent runs.
</Card>
<Card title="Troubleshooting" href="/troubleshooting">
Auth, billing, push conflict, and glossary provisioning failures.
</Card>
</CardGroup>
