# Quickstart

> First successful runs: initialize a kcmd workspace and pull metadata, produce an OKF bundle from BigQuery, or run the catalog enrichment agent and inspect output.

- Repository: GoogleCloudPlatform/knowledge-catalog
- GitHub: https://github.com/GoogleCloudPlatform/knowledge-catalog
- Human docs: https://www.grok-wiki.com/public/docs/googlecloudplatform-knowledge-catalog-9cee6ee3cba5
- Complete Markdown: https://www.grok-wiki.com/public/docs/googlecloudplatform-knowledge-catalog-9cee6ee3cba5/llms-full.txt

## Source Files

- `agents/mdcode/README.md`
- `okf/README.md`
- `okf/src/enrichment_agent/cli.py`
- `agents/enrichment/src/agent_runner.py`
- `toolbox/mdcode/README.md`
- `samples/enrichment/README.md`

---

Knowledge Catalog tooling in this repository exposes three independent first-run surfaces: the `kcmd` CLI for Metadata as Code sync (`agents/mdcode`), the OKF enrichment agent for vendor-neutral markdown bundles (`okf/`), and the catalog enrichment agent that emits mdcode artifacts for `kcmd push` (`agents/enrichment/`). Each path below completes with inspectable filesystem output and a concrete verification command.

<Note>
Complete prerequisite setup (Node.js, Python, package installs, and credential configuration) is documented on the [Installation](/installation) page. This page assumes Application Default Credentials via `gcloud auth application-default login`.
</Note>

## Choose a path

| Path | CLI entrypoint | Primary output | Typical next step |
|------|----------------|----------------|-------------------|
| Sync catalog metadata | `kcmd` | `catalog.yaml` + `catalog/` YAML or Markdown entries | Edit locally, then `kcmd push` |
| Produce an OKF bundle | `python -m enrichment_agent enrich` | Directory of OKF concept `.md` files + `index.md` | `visualize` or version in git |
| Run catalog enrichment | `python3 agents/enrichment/src/agent_runner.py` | mdcode tree with overview sidecars + `trajectory.json` | Review, then `kcmd push` |

```mermaid
flowchart LR
  subgraph kcmd_path ["kcmd workspace"]
    init["kcmd init"]
    pull["kcmd pull"]
    catalog["catalog/ entries"]
    init --> pull --> catalog
  end
  subgraph okf_path ["OKF enrichment"]
    enrich["enrichment_agent enrich"]
    bundle["OKF bundle/"]
    enrich --> bundle
  end
  subgraph agent_path ["Catalog enrichment agent"]
    runner["agent_runner.py"]
    mdcode["mdcode output_dir/"]
    runner --> mdcode
  end
  catalog --> push["kcmd push"]
  mdcode --> push
```

## Shared prerequisites

<Steps>
<Step title="Authenticate to Google Cloud">

```bash
gcloud auth application-default login
gcloud config set project <your-gcp-project-id>
```

`kcmd` and the BigQuery-backed agents authenticate with Application Default Credentials (ADC). The catalog enrichment agent also requires Vertex AI access, which it configures from `--project` and `--model`.

</Step>
<Step title="Build kcmd (required for paths 1 and 3)">

```bash
cd agents/mdcode
npm install
npm run build
export PATH="$(pwd)/dist:$PATH"
which kcmd
```

The catalog enrichment agent shells out to `agents/mdcode/dist/kcmd` automatically; adding `dist/` to `PATH` lets you run `kcmd push` from any output directory.

</Step>
</Steps>
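
Before starting any path, it can help to confirm both prerequisites in one place. The following is a minimal sketch, not part of the repository: `preflight` is a hypothetical helper, and it assumes the gcloud well-known ADC location used on Linux and macOS (`~/.config/gcloud/application_default_credentials.json`) or an explicit `GOOGLE_APPLICATION_CREDENTIALS` key file.

```python
import os
import shutil
from pathlib import Path


def preflight(env=None, home=None):
    """Return a list of problems; an empty list means both checks passed.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    problems = []

    # kcmd must be discoverable on PATH for paths 1 and 3.
    if shutil.which("kcmd", path=env.get("PATH", "")) is None:
        problems.append("kcmd not found on PATH (export agents/mdcode/dist first)")

    # ADC: an explicit key file wins, otherwise look for the gcloud well-known file.
    # Assumption: Linux/macOS layout; Windows stores the file under %APPDATA%.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    if explicit:
        if not Path(explicit).is_file():
            problems.append(f"GOOGLE_APPLICATION_CREDENTIALS is not a file: {explicit}")
    elif not well_known.is_file():
        problems.append("no ADC file found; run: gcloud auth application-default login")
    return problems
```

Calling `preflight()` with no arguments inspects the current environment. A present ADC file does not prove the credentials are still valid, so treat an empty result as a quick sanity check rather than a guarantee.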

---

## Path 1: Initialize a kcmd workspace and pull metadata

`kcmd init` scaffolds `catalog.yaml` and selects a workspace mode. `kcmd pull` downloads editable metadata into `catalog/`.

<Steps>
<Step title="Create a workspace directory">

```bash
mkdir -p ~/kc-workspace && cd ~/kc-workspace
```

</Step>
<Step title="Initialize for a BigQuery dataset">

<ParamField body="--bigquery-dataset" type="string" required>
Dataset identifier as `project-id.dataset-id`. Repeat the flag to include multiple datasets in one workspace.
</ParamField>

```bash
kcmd init --bigquery-dataset <project-id>.<dataset-id>
```

Other init modes: `--kb` (Markdown knowledge base), `--entry-group`, `--biglake-namespace` (with `--iceberg`), or `--glossary`.

</Step>
<Step title="Pull a metadata snapshot">

```bash
kcmd pull
```

Pull writes entry files under `catalog/` (`.yaml` for data assets, `.md` for knowledge-base mode), plus optional `.ref.yaml` reference layers when declared in the manifest.

</Step>
<Step title="Verify the snapshot">

```bash
kcmd status
ls -R catalog/
```

<Check>
Success signals: `catalog.yaml` exists at the workspace root; `catalog/bigquery/<project>/<dataset>/` contains one `.yaml` file per table or view; `kcmd status` reports the local snapshot state without auth errors.
</Check>

</Step>
</Steps>

<RequestExample>

```bash title="BigQuery workspace init and pull"
mkdir -p ~/kc-bq-demo && cd ~/kc-bq-demo
kcmd init --bigquery-dataset my-project.analytics
kcmd pull
kcmd status
```

</RequestExample>

<ResponseExample>

```text title="Expected layout after pull"
catalog.yaml
catalog/
└── bigquery/
    └── my-project/
        └── analytics/
            ├── my-project.analytics.yaml
            └── analytics/
                ├── orders.yaml
                └── customers.yaml
```

</ResponseExample>
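
The filesystem success signals above can also be checked programmatically. This is a rough sketch under stated assumptions: `summarize_workspace` is a hypothetical helper, it only inspects file names (it never parses `catalog.yaml`), and it treats any file ending in `.ref.yaml` as a reference layer.

```python
from pathlib import Path


def summarize_workspace(root):
    """Report the manifest and entry files found in a kcmd workspace.

    Hypothetical helper: checks file presence only, not file contents.
    """
    root = Path(root)
    catalog_dir = root / "catalog"
    entries, reference_layers = [], []
    if catalog_dir.is_dir():
        for path in sorted(catalog_dir.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(root).as_posix()
            if path.name.endswith(".ref.yaml"):
                # Optional reference layers are listed separately from entries.
                reference_layers.append(rel)
            elif path.suffix in {".yaml", ".md"}:
                entries.append(rel)
    return {
        "has_manifest": (root / "catalog.yaml").is_file(),
        "entries": entries,
        "reference_layers": reference_layers,
    }
```

A workspace that pulled successfully should report `has_manifest` as true and a non-empty `entries` list. Auth problems are still best diagnosed with `kcmd status`, which this sketch does not replace.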

---

## Path 2: Produce an OKF bundle from BigQuery

The OKF enrichment agent (`enrichment_agent`) runs a BigQuery pass that writes one OKF concept document per advertised concept, then an optional web pass that enriches from seed URLs.

<Steps>
<Step title="Install the OKF agent">

```bash
cd okf
python3 -m venv .venv
# Quote the extras spec so shells such as zsh do not treat [dev] as a glob pattern.
.venv/bin/pip install --index-url https://pypi.org/simple/ -e ".[dev]"
```

</Step>
<Step title="Configure model credentials">

<Tabs>
<Tab title="Vertex AI">

```bash
export GOOGLE_GENAI_USE_VERTEXAI=true
export GOOGLE_CLOUD_PROJECT=<your-gcp-project-id>
export GOOGLE_CLOUD_LOCATION=<region>
```

</Tab>
<Tab title="AI Studio">

```bash
export GEMINI_API_KEY=<your-api-key>
```

</Tab>
</Tabs>

The agent reads public BigQuery datasets using ADC; the bytes its queries scan are billed to your configured project.
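
A quick way to catch a half-configured shell is to check the variables from the two tabs before running `enrich`. The sketch below is illustrative only: `missing_model_credentials` is a hypothetical helper, and the precedence it applies (Vertex AI when `GOOGLE_GENAI_USE_VERTEXAI` is `true`, otherwise the API key) is an assumption, not a statement about how the agent or its SDK resolves credentials.

```python
import os

# Variables listed in the Vertex AI tab, besides the on/off switch itself.
VERTEX_VARS = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION")


def missing_model_credentials(env=None):
    """Return the names of documented variables that are still unset.

    Hypothetical helper; the Vertex-first precedence is an assumption.
    """
    env = os.environ if env is None else env
    if env.get("GOOGLE_GENAI_USE_VERTEXAI", "").lower() == "true":
        return [name for name in VERTEX_VARS if not env.get(name)]
    if env.get("GEMINI_API_KEY"):
        return []
    return ["GEMINI_API_KEY"]
```

An empty list means one of the two documented configurations is complete. It does not verify that the key or project actually has model access.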

</Step>
<Step title="Run enrichment against a public dataset">

<CodeGroup>

```bash title="BQ-only (fastest first run)"
.venv/bin/python -m enrichment_agent enrich \
  --source bq \
  --dataset bigquery-public-data.ga4_obfuscated_sample_ecommerce \
  --no-web \
  --out ./bundles/my-first-bundle
```

```bash title="BQ + web pass (seeded docs)"
.venv/bin/python -m enrichment_agent enrich \
  --source bq \
  --dataset bigquery-public-data.ga4_obfuscated_sample_ecommerce \
  --web-seed-file samples/ga4_merch_store/seeds.txt \
  --out ./bundles/ga4_merch_store
```

</CodeGroup>

<ParamField body="--source" type="string" required>
Source adapter. Currently `bq` (BigQuery).
</ParamField>

<ParamField body="--dataset" type="string" required>
BigQuery dataset as `project.dataset`.
</ParamField>

<ParamField body="--out" type="path" required>
Bundle root directory to create or update.
</ParamField>

<ParamField body="--no-web" type="flag">
Skip the web crawl pass entirely.
</ParamField>

<ParamField body="--concept" type="string">
Enrich a single concept id (e.g. `tables/events_`). Repeatable.
</ParamField>

</Step>
<Step title="Inspect the bundle">

```bash
find ./bundles/my-first-bundle -name '*.md' | head -20
cat ./bundles/my-first-bundle/index.md
```

<Check>
Success signals: stderr reports `Enriched N concept(s) into <out>`; the bundle contains `index.md` at each directory level, concept files under paths like `datasets/` and `tables/`, and YAML frontmatter with `type`, `title`, and `resource` fields on each concept.
</Check>

</Step>
<Step title="Generate an interactive graph viewer (optional)">

```bash
.venv/bin/python -m enrichment_agent visualize \
  --bundle ./bundles/my-first-bundle
open ./bundles/my-first-bundle/viz.html  # macOS; on Linux use xdg-open
```

</Step>
</Steps>

:::files
path/to/bundle/
├── index.md
├── datasets/
│   └── ga4_obfuscated_sample_ecommerce.md
├── tables/
│   ├── index.md
│   └── events_.md
└── references/          # present when web pass runs
    └── metrics/
        └── event_count.md
:::
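
To check the frontmatter signal across a whole bundle, a small script can scan every concept file for the `type`, `title`, and `resource` keys. This is a minimal sketch with several assumptions: `find_incomplete_concepts` and `frontmatter_keys` are hypothetical helpers, frontmatter is taken to be a leading block delimited by `---` lines, only top-level keys are detected (no real YAML parsing), and `index.md` files are skipped on the assumption that they are navigation pages rather than concepts.

```python
from pathlib import Path

REQUIRED_KEYS = ("type", "title", "resource")


def frontmatter_keys(text):
    """Return top-level keys of a leading '---' block, or None if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Top-level keys start at column 0; nested, list, and comment lines are skipped.
        if line and not line[0].isspace() and not line.startswith(("#", "-")) and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return None  # opening delimiter without a closing one


def find_incomplete_concepts(bundle_root):
    """Map each concept file (relative path) to its missing required keys."""
    bundle_root = Path(bundle_root)
    problems = {}
    for path in sorted(bundle_root.rglob("*.md")):
        if path.name == "index.md":
            continue  # assumption: index files are not concepts
        keys = frontmatter_keys(path.read_text(encoding="utf-8"))
        if keys is None:
            missing = list(REQUIRED_KEYS)
        else:
            missing = [key for key in REQUIRED_KEYS if key not in keys]
        if missing:
            problems[path.relative_to(bundle_root).as_posix()] = missing
    return problems
```

An empty result for `find_incomplete_concepts("./bundles/my-first-bundle")` suggests every concept carries the three fields. For anything stricter, a real YAML parser is the better tool.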

---

## Path 3: Run the catalog enrichment agent and inspect output

`agent_runner.py` dispatches to `table`, `doc`, or `context_overlay` modes. For a first run without Google Drive, use **table mode** with a local Markdown corpus and a BigQuery dataset the agent discovers via `kcmd init` + `kcmd pull`.

<Steps>
<Step title="Install Python dependencies">

```bash
python3 -m venv ~/.venv/kc-enrich
source ~/.venv/kc-enrich/bin/activate
pip install -r agents/enrichment/src/requirements.txt
export PYTHONPATH=agents/enrichment/src
```

</Step>
<Step title="Run table-mode enrichment">

<ParamField body="--mode" type="enum">
`table`, `doc`, or `context_overlay`. Omit to infer: `--dataset` present implies `table`.
</ParamField>

<ParamField body="--project" type="string" required>
GCP project hosting the Vertex AI model.
</ParamField>

<ParamField body="--model" type="string" required>
Vertex model id, e.g. `gemini-2.5-pro`.
</ParamField>

<ParamField body="--dataset" type="string" required>
BigQuery dataset as `project.dataset`.
</ParamField>

<ParamField body="--output_dir" type="path" required>
Local directory for the generated mdcode tree.
</ParamField>

<ParamField body="--folders" type="string">
Comma-separated Google Drive folder URLs/IDs and/or local directories of `.md` files used as grounding context.
</ParamField>

```bash
python3 agents/enrichment/src/agent_runner.py \
  --mode=table \
  --dataset=<project>.<dataset> \
  --folders=agents/enrichment/eval/corpora/thelook_ecommerce \
  --topic="E-commerce analytics metadata" \
  --project=<your-gcp-project> \
  --location=us-central1 \
  --model=gemini-2.5-pro \
  --output_dir=/tmp/enrich_out
```

The agent runs read-only `kcmd init` and `kcmd pull` internally, then writes `<table>.overview.md` sidecars next to pulled entry YAML files.
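
If you launch table-mode runs from a script, assembling the argument list in one place keeps the flags consistent between runs. The sketch below is an assumption-laden convenience, not repository code: `build_runner_command` is a hypothetical helper that only uses the flags shown on this page and returns the command without executing it.

```python
def build_runner_command(dataset, project, model, output_dir,
                         folders=(), topic=None, location=None, interactive=False):
    """Build the argv for a table-mode agent_runner.py run (hypothetical helper)."""
    cmd = [
        "python3",
        "agents/enrichment/src/agent_runner.py",
        "--mode=table",
        f"--dataset={dataset}",
        f"--project={project}",
        f"--model={model}",
        f"--output_dir={output_dir}",
    ]
    if folders:
        # --folders takes a single comma-separated value.
        cmd.append("--folders=" + ",".join(folders))
    if topic:
        cmd.append(f"--topic={topic}")
    if location:
        cmd.append(f"--location={location}")
    if interactive:
        cmd.append("--interactive")
    return cmd
```

Passing the returned list to `subprocess.run(cmd, check=True)` from the repository root would start the run. Building a list rather than a shell string avoids quoting problems in values such as `--topic`.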

</Step>
<Step title="Inspect generated artifacts">

```bash
find /tmp/enrich_out -type f | sort
head -40 /tmp/enrich_out/trajectory.json
ls /tmp/enrich_out/catalog/
```

<Check>
Success signals: `catalog.yaml` and `catalog/<project>.<dataset>/` exist; each enriched table has a `<table>.yaml` entry and a `<table>.overview.md` sidecar; `trajectory.json` records tool calls (`read_local_md`, `fetch_gdoc`, etc.) for downstream evaluation.
</Check>
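
For a quick look at which tools a run used, the trajectory can be scanned without knowing its exact layout. The schema of `trajectory.json` is not described on this page, so this sketch deliberately avoids assuming one: the hypothetical `count_tool_mentions` walks the whole JSON tree and counts string values equal to a known tool name. Only the two tool names mentioned above are used as defaults.

```python
import json
from collections import Counter


def count_tool_mentions(node, tool_names):
    """Count string values anywhere in a JSON tree that match a tool name.

    Schema-agnostic by design; dictionary keys are not counted.
    """
    counts = Counter()
    stack = [node]
    while stack:
        current = stack.pop()
        if isinstance(current, dict):
            stack.extend(current.values())
        elif isinstance(current, list):
            stack.extend(current)
        elif isinstance(current, str) and current in tool_names:
            counts[current] += 1
    return counts


def summarize_trajectory(path, tool_names=("read_local_md", "fetch_gdoc")):
    """Load a trajectory file and count mentions of the given tool names."""
    with open(path, encoding="utf-8") as handle:
        return count_tool_mentions(json.load(handle), set(tool_names))
```

Because it matches values rather than a specific field, a tool name that also appears in free text as a whole string value would be counted too. Once you have inspected the real file, narrowing the walk to the actual tool-call field would give a more precise count.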

</Step>
<Step title="Review a table overview">

```bash
# Replace with an actual table name from your dataset
cat /tmp/enrich_out/catalog/<project>.<dataset>/<table>.overview.md
```

Overview sidecars carry the enriched prose; entry YAML retains the pulled schema as the source of truth.
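
Since each enriched table is expected to have both an entry YAML and an overview sidecar, pairing them up is a cheap completeness check. A minimal sketch, assuming the sidecar sits in the same directory as its entry and is named by swapping `.yaml` for `.overview.md`; `tables_missing_overview` is a hypothetical helper.

```python
from pathlib import Path


def tables_missing_overview(output_dir):
    """List entry YAML files under catalog/ that have no .overview.md sidecar."""
    output_dir = Path(output_dir)
    catalog = output_dir / "catalog"
    missing = []
    if not catalog.is_dir():
        return missing
    for entry in sorted(catalog.rglob("*.yaml")):
        if entry.name.endswith(".ref.yaml"):
            continue  # reference layers are not entries
        # orders.yaml -> orders.overview.md in the same directory (assumed naming).
        sidecar = entry.with_name(entry.stem + ".overview.md")
        if not sidecar.is_file():
            missing.append(entry.relative_to(output_dir).as_posix())
    return missing
```

Entries that are not tables, such as a dataset-level YAML file, may legitimately appear in the result, so read the list as candidates to review rather than failures.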

</Step>
<Step title="Optional interactive refinement">

```bash
python3 agents/enrichment/src/agent_runner.py \
  --mode=table \
  --dataset=<project>.<dataset> \
  --folders=agents/enrichment/eval/corpora/thelook_ecommerce \
  --project=<your-gcp-project> \
  --model=gemini-2.5-pro \
  --output_dir=/tmp/enrich_out \
  --interactive
```

The `refine>` REPL reuses loaded context without re-pulling the dataset.

</Step>
</Steps>

<ResponseExample>

```text title="Table-mode log excerpt"
[kcmd] 🔎 Discovering my-project.analytics via kcmd init + pull ...
[kcmd] OK: ...
[kcmd] 📑 orders (12 cols)
[kcmd] 📑 customers (8 cols)
```

</ResponseExample>

<Warning>
The enrichment agent only generates mdcode and runs read-only `kcmd` commands. Publishing the enriched metadata to Knowledge Catalog is a separate step: run `kcmd push` from the `--output_dir` directory.
</Warning>

---

## Compare outputs

| Artifact | kcmd pull | OKF bundle | Catalog enrichment agent |
|----------|-----------|------------|--------------------------|
| Manifest | `catalog.yaml` | — | `catalog.yaml` (from `kcmd init`) |
| Entry format | YAML (`.yaml`) or Markdown (`.md`) | OKF concept `.md` + frontmatter | YAML entry + `.overview.md` sidecar |
| Schema source | Pulled from catalog | Embedded in concept body | Pulled via `kcmd pull` (not rewritten) |
| Run log | CLI stdout | stderr summary line | `trajectory.json` |
| Publish path | `kcmd push` | Exchange as files; no `kcmd` step | `kcmd push` from `--output_dir` |

OKF bundles are vendor-neutral and portable across tools. mdcode output from the catalog enrichment agent is designed for direct `kcmd push` into Dataplex.

---

## Quick troubleshooting

<AccordionGroup>
<Accordion title="kcmd pull returns auth or permission errors">

Re-run `gcloud auth application-default login` and confirm `gcloud config get-value project` matches the dataset project. Verify Dataplex/Knowledge Catalog API access for the target project.

</Accordion>
<Accordion title="OKF enrich exits on missing --dataset">

`--dataset` is required when `--source` is `bq`. Use the fully qualified form `project.dataset`.

</Accordion>
<Accordion title="Catalog enrichment agent reports no tables pulled">

Confirm `--dataset` uses `project.dataset` format, ADC is valid, and the dataset has readable `@bigquery` catalog entries. Check `[kcmd]` log lines for the underlying `kcmd init` + `kcmd pull` result.

</Accordion>
<Accordion title="agent_runner.py requires --project and --model">

Both flags are mandatory in every mode. The agent configures Vertex AI from `--project`, `--location` (default `global`), and `--model`.

</Accordion>
</AccordionGroup>
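
Several of the failures above trace back to a dataset identifier that is not in `project.dataset` form. A small validation step in your own wrapper scripts can catch that before any CLI runs. This is a sketch only: `split_dataset_id` is a hypothetical helper that checks structure (a non-empty project and dataset separated by the last dot) and does not validate naming rules or confirm that the dataset exists.

```python
def split_dataset_id(value):
    """Split 'project.dataset' into (project, dataset) or raise ValueError.

    Hypothetical helper: structural check only.
    """
    # Split on the last dot so the dataset part never contains one.
    project, sep, dataset = value.strip().rpartition(".")
    if not sep or not project or not dataset:
        raise ValueError(f"expected project.dataset, got {value!r}")
    return project, dataset
```

For example, `split_dataset_id("bigquery-public-data.ga4_obfuscated_sample_ecommerce")` returns the project and dataset as a tuple, while a bare `analytics` raises `ValueError`.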

## Next

<CardGroup>
<Card title="Installation" href="/installation">
Prerequisites, Python and Node.js setup, package installs, and credential configuration.
</Card>
<Card title="Sync catalog metadata" href="/sync-catalog-metadata">
Deeper kcmd pull/push workflows, reference layers, and glossary scope.
</Card>
<Card title="Produce OKF bundles" href="/produce-okf-bundles">
Two-pass BQ-then-web enrichment, concept scoping, and web crawl constraints.
</Card>
<Card title="Run catalog enrichment agent" href="/run-catalog-enrichment-agent">
All three modes, Drive and GitHub inputs, glossary linking, and refinement.
</Card>
<Card title="Publish enriched metadata" href="/publish-enriched-metadata">
Push mdcode workspaces with `kcmd push` and reconcile entry links.
</Card>
<Card title="Troubleshooting" href="/troubleshooting">
Auth, billing, push conflict, and model credential failures.
</Card>
</CardGroup>
