# Publish enriched metadata

> Push mdcode workspaces with kcmd, publish sample enrichment output via catalog APIs, and reconcile entry links and aspects without modifying read-only reference layers.

- Repository: GoogleCloudPlatform/knowledge-catalog
- GitHub: https://github.com/GoogleCloudPlatform/knowledge-catalog
- Human docs: https://www.grok-wiki.com/public/docs/googlecloudplatform-knowledge-catalog-9cee6ee3cba5
- Complete Markdown: https://www.grok-wiki.com/public/docs/googlecloudplatform-knowledge-catalog-9cee6ee3cba5/llms-full.txt

## Source Files

- `agents/enrichment/README.md`
- `agents/mdcode/src/tool/commands.ts`
- `samples/enrichment/src/enrichment/publish.py`
- `samples/enrichment/src/enrichment/enrich.py`
- `toolbox/enrichment/README.md`
- `agents/mdcode/README.md`

---


Enrichment agents in this repository **generate** local mdcode artifacts (`catalog.yaml`, `catalog/` entries, Markdown sidecars, and optional `*.ref.yaml` reference layers) but do not call Dataplex publish APIs themselves. Publication is a separate step with two paths: run `kcmd push` from the workspace root for the primary Metadata as Code path, or run `python -m enrichment.publish` from `samples/enrichment` to update the overview aspect directly through the Dataplex Catalog API.

## Publication paths

| Path | When to use | What gets published | Reference layers |
|------|-------------|---------------------|------------------|
| `kcmd push` | Enrichment agent output, toolbox demo, any mdcode workspace | Aspects and entry types listed under `publishing` in `catalog.yaml`; optional `entryLinks` reconciliation | `*.ref.yaml` and `*.ref.*.md` are **never** pushed |
| `python -m enrichment.publish` | `samples/enrichment` workflow only | `overview` aspect on existing `@bigquery` table entries | N/A (flat `*.md` snapshot, not mdcode) |

<Info>
The catalog enrichment agent (`agents/enrichment`) shells out only to `kcmd init`, `kcmd pull`, and `kcmd reference`, none of which write to the catalog. You run `kcmd push` yourself after reviewing or evaluating the generated tree.
</Info>

```mermaid
sequenceDiagram
  participant Agent as enrichment agent
  participant WS as mdcode workspace
  participant KCMD as kcmd push
  participant DP as Dataplex Catalog API

  Agent->>WS: write catalog.yaml + catalog/ + sidecars
  Agent->>WS: kcmd reference (optional .ref.yaml)
  Note over WS: *.ref.yaml = read-only grounding
  WS->>KCMD: kcmd push [--dry-run]
  KCMD->>DP: modifyEntry (publishing.aspects)
  KCMD->>DP: create/delete EntryLink (publishing.entryLinks)
  Note over KCMD,DP: .ref.yaml entries skipped (no local editable path)
```

## Prerequisites

- **Authentication**: Application Default Credentials via `gcloud auth application-default login`. Set `CLOUDSDK_CORE_PROJECT` (and optionally `CLOUDSDK_COMPUTE_REGION`) when pushing from an enrichment output directory.
- **Built `kcmd`**: `cd agents/mdcode && npm install && npm run build` produces `agents/mdcode/dist/kcmd`. Add `dist/` to `PATH` or invoke the binary directly.
- **Complete manifest**: Enrichment modes write a full `catalog.yaml` with `snapshot` and `publishing` blocks. A bare `scope:` line from `kcmd init` alone causes `kcmd push` to load no entry types and silently no-op.
- **Pre-existing resources** (mode-dependent):
  - **Doc mode**: target entry group must exist before enrichment (`gcloud dataplex entry-groups create …`).
  - **Context-overlay mode**: editable entry group must exist; live `@bigquery` entries are read-only via `kcmd reference`.
  - **Glossary terms**: `kcmd push` updates metadata on existing glossary terms but does not create glossaries, categories, or terms.

<Warning>
`kcmd push` fails fast when a referenced glossary term, category, or glossary does not exist. Bootstrap glossary structure out-of-band (console or `gcloud dataplex glossaries create` / `glossary-terms create`), then `kcmd pull` before pushing link metadata.
</Warning>

## Publish with kcmd push

### Workspace layout after enrichment

Editable files live beside read-only reference mirrors:

```text
output_dir/
├── catalog.yaml
└── catalog/
    └── bigquery/<project>/<dataset>/
        ├── orders.yaml              # editable — pushed
        ├── orders.overview.md       # sidecar — merged on push
        ├── orders.queries.md        # sidecar (table / overlay modes)
        ├── orders.ref.yaml          # reference — skipped on push
        └── orders.ref.overview.md   # reference sidecar — skipped
```

`CatalogSnapshot.isModifiable` returns true only when an entry has a **local** (non-`.ref.yaml`) path. Reference entries are indexed under their `.ref.yaml` path but are excluded from push iteration.
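
To make the editable/reference split concrete, here is a minimal Python sketch that partitions workspace files by the naming convention shown in the layout above. It is an illustration only: `is_reference_file` and `split_workspace` are hypothetical helpers, not part of `kcmd`, and the real check lives in the TypeScript `CatalogSnapshot` class.

```python
from pathlib import PurePosixPath


def is_reference_file(path: str) -> bool:
    """Return True for read-only reference files (*.ref.yaml, *.ref.*.md)."""
    name = PurePosixPath(path).name
    if name.endswith(".ref.yaml"):
        return True
    # Reference sidecars are named <entry>.ref.<aspect>.md
    return name.endswith(".md") and ".ref." in name


def split_workspace(paths):
    """Partition file paths into (editable, reference), preserving order."""
    editable, reference = [], []
    for path in paths:
        (reference if is_reference_file(path) else editable).append(path)
    return editable, reference


files = [
    "catalog/bigquery/p/d/orders.yaml",
    "catalog/bigquery/p/d/orders.overview.md",
    "catalog/bigquery/p/d/orders.ref.yaml",
    "catalog/bigquery/p/d/orders.ref.overview.md",
]
editable, reference = split_workspace(files)
print("editable:", editable)
print("reference:", reference)
```

A quick listing like this before a push can help confirm that only the files you expect are on the editable side.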

### Push flags

<ParamField body="--dry-run" type="boolean">
  Log create, modify, and delete operations without calling the Catalog API.
</ParamField>

<ParamField body="--validate-only" type="boolean">
  Validate the local snapshot against the service without applying changes.
</ParamField>

<ParamField body="--force" type="boolean">
  Overwrite service metadata, ignoring potential conflicts.
</ParamField>
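
If you drive pushes from a script, the flags above map naturally onto a small wrapper. The following is a sketch under stated assumptions: `build_push_command` and `run_push` are hypothetical names, `kcmd` is assumed to be on `PATH`, and whether the flags may be combined is not documented here, so the wrapper simply passes through whatever you request.

```python
import os
import subprocess


def build_push_command(kcmd="kcmd", dry_run=False, validate_only=False, force=False):
    """Assemble the argv list for `kcmd push` with the requested flags."""
    cmd = [kcmd, "push"]
    if dry_run:
        cmd.append("--dry-run")
    if validate_only:
        cmd.append("--validate-only")
    if force:
        cmd.append("--force")
    return cmd


def run_push(workspace, project, region=None, **flags):
    """Run kcmd push from the workspace root with the gcloud-style env vars set."""
    env = dict(os.environ, CLOUDSDK_CORE_PROJECT=project)
    if region:
        env["CLOUDSDK_COMPUTE_REGION"] = region
    # check=True raises CalledProcessError when kcmd exits non-zero.
    return subprocess.run(build_push_command(**flags), cwd=workspace, env=env, check=True)


print(build_push_command(dry_run=True))
```

For example, `run_push("/tmp/enrich_out", "my-gcp-project", dry_run=True)` would run a dry-run push from that directory.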

### Standard push workflow

<Steps>
<Step title="Review local artifacts">

Inspect the enrichment output before publishing:

```bash
find /tmp/enrich_out -type f | head -50
diff -u orders.ref.overview.md orders.overview.md   # overlay mode: compare baseline vs enriched
```

Optionally score the run with the dynamic evaluator (`python -m eval --output-dir /tmp/enrich_out`) before push.

</Step>

<Step title="Dry-run push">

From the workspace root (directory containing `catalog.yaml`):

```bash
cd /tmp/enrich_out
CLOUDSDK_CORE_PROJECT=<project> \
  kcmd push --dry-run   # assumes kcmd is on PATH; otherwise use the absolute path to agents/mdcode/dist/kcmd
```

Confirm the log shows only intended `Modify Entry`, `Create Entry`, and `EntryLink` operations. No `[DRY-RUN]` lines should target `*.ref.yaml` entries.

</Step>

<Step title="Push to the catalog">

```bash
CLOUDSDK_CORE_PROJECT=<project> CLOUDSDK_COMPUTE_REGION=<region> \
  kcmd push
```

On success the CLI prints `Successfully pushed catalog entries.`

</Step>

<Step title="Verify in the catalog">

Re-pull the affected scope and diff against your pre-push tree, or inspect entries in the Dataplex console. For table mode, confirm `overview` and `queries` aspects updated on `@bigquery` entries. For context-overlay mode, confirm new generic entries exist in your entry group while `@bigquery` entries are unchanged.

</Step>
</Steps>
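
For the verification step, one possible approach is to copy the workspace before pushing, re-pull into a second directory afterwards, and list the files that differ. The sketch below uses only the Python standard library; `changed_files` is a hypothetical helper, and the demo uses throwaway directories rather than a real workspace.

```python
import filecmp
import tempfile
from pathlib import Path


def list_files(root):
    """Relative POSIX paths of every file under root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def changed_files(before_dir, after_dir):
    """Return sorted relative paths that were added, removed, or modified."""
    before, after = Path(before_dir), Path(after_dir)
    before_files, after_files = list_files(before), list_files(after)
    changed = before_files ^ after_files  # present on one side only
    for name in before_files & after_files:
        # shallow=False compares file contents, not just size and mtime.
        if not filecmp.cmp(before / name, after / name, shallow=False):
            changed.add(name)
    return sorted(changed)


# Tiny self-contained demo with throwaway directories.
with tempfile.TemporaryDirectory() as tmp:
    before, after = Path(tmp, "before"), Path(tmp, "after")
    for root in (before, after):
        root.mkdir()
        (root / "orders.yaml").write_text("name: orders\n")
    (before / "orders.overview.md").write_text("old overview\n")
    (after / "orders.overview.md").write_text("enriched overview\n")
    result = changed_files(before, after)

print(result)
```

In table mode you would expect the changed set to contain only the sidecars and entry YAML you enriched.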

<RequestExample>

```bash
cd /tmp/enrich_out
CLOUDSDK_CORE_PROJECT=my-gcp-project CLOUDSDK_COMPUTE_REGION=us-central1 \
  kcmd push --dry-run
```

</RequestExample>

<ResponseExample>

```text
Pushing catalog entries...
[DRY-RUN] Modify Entry projects/my-gcp-project/locations/us/entryGroups/@bigquery/entries/bigquery.googleapis.com/projects/my-gcp-project/datasets/analytics/tables/orders (updateMask: aspects, aspects: 655216118709.global.overview,...)
Successfully pushed catalog entries.
```

</ResponseExample>
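
A dry-run log can be summarized before you commit to a real push. The sketch below assumes every dry-run line follows the shape of the `Modify Entry` line in the response example above (a `[DRY-RUN]` prefix, an operation name, then a resource name starting with `projects/`). Only that one line shape is attested here, so treat the pattern as an assumption for other operations.

```python
import re
from collections import Counter

# Assumed shape: "[DRY-RUN] <operation> <projects/... resource> (details)"
DRY_RUN_LINE = re.compile(r"^\[DRY-RUN\]\s+(?P<op>.+?)\s+(?P<resource>projects/\S+)")


def summarize_dry_run(log_text):
    """Count planned operations and collect the resources they target."""
    ops = Counter()
    resources = []
    for line in log_text.splitlines():
        match = DRY_RUN_LINE.match(line.strip())
        if match:
            ops[match.group("op")] += 1
            resources.append(match.group("resource"))
    return ops, resources


sample_log = """Pushing catalog entries...
[DRY-RUN] Modify Entry projects/my-gcp-project/locations/us/entryGroups/@bigquery/entries/bigquery.googleapis.com/projects/my-gcp-project/datasets/analytics/tables/orders (updateMask: aspects, aspects: 655216118709.global.overview,...)
Successfully pushed catalog entries.
"""

ops, resources = summarize_dry_run(sample_log)
print(dict(ops))
print(resources)
```

Reviewing the resource list makes it easier to spot an operation aimed at an entry you did not intend to change.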

## Publishing by enrichment mode

### Table mode (`--mode=table`)

The agent runs `kcmd init --bigquery-dataset` + `kcmd pull`, enriches each table's `overview` (and optionally `queries`) aspect, then writes sidecar Markdown. Push targets live `@bigquery` table entries.

Default manifest aspects published:

- `dataplex-types.global.overview`
- `dataplex-types.global.queries`

With `--glossaries`, the manifest also declares `snapshot.entryLinks: [definition, synonym]` and `publishing.entryLinks: [definition]`. The linking step injects column-level `links.definition` into `<table>.yaml`; `kcmd push` reconciles those links to Dataplex.

### Doc mode (`--mode=doc`)

The agent creates knowledge-base entries (generic entry type + `overview` aspect) under your `--entry_group`. Push may auto-create missing entries and entry groups when they do not exist remotely.

### Context-overlay mode (`--mode=context_overlay`)

Read-only 1P BigQuery metadata is pulled via `kcmd reference` into `*.ref.yaml`. The agent writes **new** overlay entries (`<table>.yaml` + `<table>.overview.md`) in your editable entry group. Only overlay pairs are pushed; the `.ref.*` mirror of the live table is never modified or published.

## Manifest controls

The `publishing` block in `catalog.yaml` determines what `kcmd push` writes. The `reference` scope is pull-only.

| Key | Role on push |
|-----|--------------|
| `publishing.aspects` | Aspect types uploaded via `modifyEntry` |
| `publishing.entries` | Entry types eligible for create/update (doc and overlay modes) |
| `publishing.entryLinks` | Link types reconciled per entry; must be a subset of `snapshot.entryLinks` |
| `reference.scope` | Read-only pull via `kcmd reference`; never pushed |

Example publishing block for table enrichment with glossary links:

```yaml
publishing:
  aspects:
    - dataplex-types.global.overview
    - dataplex-types.global.queries
  entryLinks:
    - definition
```

Omit `publishing.entryLinks` (or leave it empty) to disable link mutations entirely. This is useful when you want to read links without taking responsibility for reconciling them.
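
The manifest rules above can be checked before a push. This is a minimal sketch, not `kcmd`'s own validation: `check_manifest` is a hypothetical helper that takes the manifest as an already-parsed dictionary (for instance the result of `yaml.safe_load` from PyYAML) and reports an empty `publishing` block or `publishing.entryLinks` values missing from `snapshot.entryLinks`.

```python
def check_manifest(manifest):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    publishing = manifest.get("publishing") or {}
    snapshot = manifest.get("snapshot") or {}

    # An empty publishing block means push has nothing to write.
    if not publishing.get("aspects") and not publishing.get("entries"):
        problems.append("publishing declares no aspects or entries")

    # publishing.entryLinks must be a subset of snapshot.entryLinks.
    extra = set(publishing.get("entryLinks") or []) - set(snapshot.get("entryLinks") or [])
    if extra:
        problems.append(f"publishing.entryLinks not in snapshot.entryLinks: {sorted(extra)}")
    return problems


good = {
    "snapshot": {"entryLinks": ["definition", "synonym"]},
    "publishing": {
        "aspects": ["dataplex-types.global.overview", "dataplex-types.global.queries"],
        "entryLinks": ["definition"],
    },
}
bare = {"scope": "placeholder"}  # stands in for a manifest with no publishing block

print(check_manifest(good))
print(check_manifest(bare))
```

The first manifest mirrors the table-mode example with glossary links; the second stands in for a bare `scope:` manifest that would make `kcmd push` a no-op.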

## Entry link reconciliation

When `publishing.entryLinks` is set, `CatalogSync.push` compares local links from entry YAML against remote `lookupEntryLinks` results for the configured types.

Reconciliation rules:

1. **Match**: links are compared on normalized target plus source path. Project ID and project number are treated as equivalent, and the `@dataplex` proxy is unwrapped. Remote links with a local match are kept.
2. **Create**: local links with no remote match are created.
3. **Delete**: remote links of the configured types with no local match are deleted.

Column-level links (source path `Schema.<field>`) are stored under `aspects.schema.fields[].links` in entry YAML. Entry-level links appear under the top-level `links` block.
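
The three reconciliation rules amount to a keyed set comparison. The sketch below is a simplified Python model, not the `CatalogSync.push` implementation: links are reduced to `(source_path, target)` tuples already filtered to the configured link types, the target names are illustrative, `@dataplex` proxy unwrapping is not modeled, and project normalization is approximated with a caller-supplied number-to-ID map (an assumption).

```python
import re


def normalize(resource, number_to_id):
    """Rewrite projects/<number> to projects/<id> so both forms compare equal."""
    def swap(match):
        project = match.group(1)
        return "projects/" + number_to_id.get(project, project)
    return re.sub(r"projects/([^/]+)", swap, resource)


def reconcile(local_links, remote_links, number_to_id):
    """Return (keep, create, delete) lists of (source_path, target) links."""
    def key(link):
        return (link[0], normalize(link[1], number_to_id))

    local = {key(link): link for link in local_links}
    remote = {key(link): link for link in remote_links}
    keep = [remote[k] for k in remote if k in local]        # rule 1: match
    create = [local[k] for k in local if k not in remote]   # rule 2: create
    delete = [remote[k] for k in remote if k not in local]  # rule 3: delete
    return keep, create, delete


local_links = [
    ("Schema.order_id", "projects/my-gcp-project/locations/us/glossaries/sales/terms/order-id"),
    ("Schema.status", "projects/my-gcp-project/locations/us/glossaries/sales/terms/order-status"),
]
remote_links = [
    ("Schema.order_id", "projects/123456789/locations/us/glossaries/sales/terms/order-id"),
    ("Schema.region", "projects/123456789/locations/us/glossaries/sales/terms/region"),
]

keep, create, delete = reconcile(local_links, remote_links, {"123456789": "my-gcp-project"})
print("keep:", keep)
print("create:", create)
print("delete:", delete)
```

Without the normalization step, the `order_id` link would be deleted and recreated on every push, which is the spurious cycle the matching rule is meant to prevent.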

<AccordionGroup>
<Accordion title="Diff reference vs enriched links">

When `reference.snapshot.entryLinks` is declared, `kcmd reference` includes pre-edit link state in `*.ref.yaml`. Compare the editable `<table>.yaml` against `<table>.ref.yaml` to see only what enrichment added or removed before pushing.

</Accordion>

<Accordion title="Glossary definition links">

Glossary terms pulled as reference (`catalog/glossaries/.../*.ref.yaml`) ground the LinkingAgent but are not pushed. Only the `definition` links injected into editable table YAML are reconciled on push.

</Accordion>
</AccordionGroup>

## Sample enrichment API publish

The `samples/enrichment` package demonstrates a lighter-weight path that bypasses mdcode. It downloads table overviews into flat Markdown files, enriches them, then publishes via `dataplex.CatalogServiceClient.update_entry` with an aspects-only field mask.

<Steps>
<Step title="Download snapshot">

```bash
python3 -m enrichment.download \
  --dir ../sample/metadata.initial \
  --dataset ${CLOUD_PROJECT}.kc_enrich_sample_data
```

</Step>

<Step title="Enrich">

```bash
python3 -m enrichment.enrich \
  --dir ../sample/metadata.initial \
  --output-dir ../sample/metadata.new \
  --config-dir ../sample/config
```

</Step>

<Step title="Review diff">

```bash
git diff --no-index ../sample/metadata.initial ../sample/metadata.new
```

</Step>

<Step title="Publish">

```bash
python3 -m enrichment.publish --dir ../sample/metadata.new
```

Each `*.md` file is converted back to a Dataplex `Entry` protobuf and `update_entry` is called with `update_mask.paths=['aspects']` and `aspect_keys` set to the overview aspect key.

</Step>
</Steps>

<Note>
The sample publish path updates only the `overview` aspect on existing entries. It does not reconcile entry links, create entries, or manage reference layers. For production enrichment from `agents/enrichment` or `toolbox/enrichment`, use `kcmd push`.
</Note>
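
The shape of the publish call described in the last step can be sketched as follows. This is not the code in `publish.py`: `build_update_request` and `publish_overview` are hypothetical helpers, the `content` field name inside the overview aspect data is an assumption, and the request is written as a plain dictionary that the `google-cloud-dataplex` client is expected to accept in place of an `UpdateEntryRequest`.

```python
def build_update_request(entry_name, overview_key, markdown):
    """Build a dict shaped like UpdateEntryRequest for an overview-only update."""
    return {
        "entry": {
            "name": entry_name,
            # Assumption: the overview aspect stores its text under "content".
            "aspects": {overview_key: {"data": {"content": markdown}}},
        },
        # Restrict the update to aspects, and to this one aspect key.
        "update_mask": {"paths": ["aspects"]},
        "aspect_keys": [overview_key],
    }


def publish_overview(entry_name, overview_key, markdown):
    """Send the update; requires google-cloud-dataplex and ADC credentials."""
    # Imported lazily so the request builder works without the client library.
    from google.cloud import dataplex_v1

    client = dataplex_v1.CatalogServiceClient()
    return client.update_entry(request=build_update_request(entry_name, overview_key, markdown))


request = build_update_request(
    "projects/my-gcp-project/locations/us/entryGroups/@bigquery/entries/example",
    "655216118709.global.overview",
    "# Orders\nOne row per customer order.",
)
print(request["update_mask"], request["aspect_keys"])
```

Limiting both `update_mask` and `aspect_keys` is what keeps the update scoped to the overview aspect and leaves every other aspect on the entry untouched.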

## Toolbox demo publish

The TypeScript toolbox demo (`toolbox/enrichment`) follows the mdcode path: `kcmd init` + `kcmd pull`, `kcagent enrich`, then push from the demo workspace:

```bash
cd demo
../../mdcode/dist/kcmd pull
../dist/kcagent enrich --catalog-path . --tools-path tools --prompt-path prompt.md
../../mdcode/dist/kcmd push
```

## Agent-driven publish (MCP)

Agents can publish through the kcmd MCP server (`kcmd mcp --path <workspace>`), which exposes `modify-entry` and related tools. The push semantics are the same as the CLI: only modifiable local entries and manifest-declared publishing types are affected. Reference layers remain read-only.

## Troubleshooting

| Symptom | Likely cause | Mitigation |
|---------|--------------|------------|
| Push succeeds but nothing changes | `publishing` block missing or empty; bare `scope:` manifest | Ensure enrichment wrote a complete manifest with `publishing.aspects` (and `publishing.entries` for doc/overlay modes) |
| `Glossary term does not exist` | Term referenced in links but not provisioned | Create term via `gcloud dataplex glossary-terms create`, then `kcmd pull` |
| `Failed to create entry group` | IAM or quota on Dataplex entry groups | Verify `dataplex.entryGroups.create` permission; create group manually |
| Spurious link delete/create cycles | Project number vs ID mismatch in link targets | Rely on built-in normalization (fixed in `CatalogSync.push`); ensure targets use consistent FQN form from `kcmd pull` |
| Context overlay modified live BQ entry | Pushed wrong files or wrong scope | Confirm `scope:` points at your entry group, not `bq-dataset`; verify only `<table>.yaml` (not `<table>.ref.yaml`) changed locally |
| `kcmd not found` | Binary not built or not on PATH | `cd agents/mdcode && npm install && npm run build`; set `KCMD_BIN` or add `dist/` to PATH |

## Next

<CardGroup>
<Card title="Sync catalog metadata" href="/sync-catalog-metadata">
  Initialize workspaces, pull snapshots, and understand the full pull/push lifecycle before publishing.
</Card>
<Card title="Catalog manifest reference" href="/catalog-manifest-reference">
  Configure `snapshot`, `publishing`, `reference`, and `entryLinks` reconciliation rules in `catalog.yaml`.
</Card>
<Card title="kcmd CLI reference" href="/kcmd-cli-reference">
  Full `kcmd push` flags, init modes, and authentication via gcloud ADC.
</Card>
<Card title="Evaluate enrichment output" href="/evaluate-enrichment-output">
  Score structural validity and grounding before you push enriched metadata.
</Card>
<Card title="Troubleshooting" href="/troubleshooting">
  Auth, billing, push conflict, and glossary provisioning failures.
</Card>
</CardGroup>
