# Quickstart

> First successful runs: initialize a kcmd workspace and pull metadata, produce an OKF bundle from BigQuery, or run the catalog enrichment agent and inspect output.

- Repository: GoogleCloudPlatform/knowledge-catalog
- GitHub: https://github.com/GoogleCloudPlatform/knowledge-catalog
- Human docs: https://www.grok-wiki.com/public/docs/googlecloudplatform-knowledge-catalog-9cee6ee3cba5
- Complete Markdown: https://www.grok-wiki.com/public/docs/googlecloudplatform-knowledge-catalog-9cee6ee3cba5/llms-full.txt

## Source Files

- `agents/mdcode/README.md`
- `okf/README.md`
- `okf/src/enrichment_agent/cli.py`
- `agents/enrichment/src/agent_runner.py`
- `toolbox/mdcode/README.md`
- `samples/enrichment/README.md`

---

Knowledge Catalog tooling in this repository exposes three independent first-run surfaces: the `kcmd` CLI for Metadata as Code sync (`agents/mdcode`), the OKF enrichment agent for vendor-neutral markdown bundles (`okf/`), and the catalog enrichment agent that emits mdcode artifacts for `kcmd push` (`agents/enrichment/`). Each path below completes with inspectable filesystem output and a concrete verification command.

Complete prerequisite setup (Node.js, Python, package installs, and credential configuration) is documented on the [Installation](/installation) page. This page assumes Application Default Credentials via `gcloud auth application-default login`.
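Since every path below assumes Application Default Credentials, it can help to confirm that a credential file is actually present before starting. The following is a minimal sketch, not part of the repository: `find_adc_file` is a hypothetical helper, and it only checks the `GOOGLE_APPLICATION_CREDENTIALS` override and the default gcloud location on Linux/macOS. A custom gcloud config directory or a Windows profile is not handled.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_adc_file(env: Mapping[str, str], home: Path) -> Optional[Path]:
    """Return the ADC credential file if it exists, else None.

    Hypothetical helper for illustration. It checks the
    GOOGLE_APPLICATION_CREDENTIALS override first, then the default
    gcloud location used on Linux and macOS.
    """
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        path = Path(explicit)
        return path if path.is_file() else None
    default = home / ".config" / "gcloud" / "application_default_credentials.json"
    return default if default.is_file() else None


if __name__ == "__main__":
    found = find_adc_file(os.environ, Path.home())
    if found:
        print(f"ADC file: {found}")
    else:
        print("No ADC file found; run: gcloud auth application-default login")
```

A file being present does not prove the credentials are still valid or that the account has access to the target project; it only rules out the most common first-run omission.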
## Choose a path

| Path | CLI entrypoint | Primary output | Typical next step |
|------|----------------|----------------|-------------------|
| Sync catalog metadata | `kcmd` | `catalog.yaml` + `catalog/` YAML or Markdown entries | Edit locally, then `kcmd push` |
| Produce an OKF bundle | `python -m enrichment_agent enrich` | Directory of OKF concept `.md` files + `index.md` | `visualize` or version in git |
| Run catalog enrichment | `python3 agents/enrichment/src/agent_runner.py` | mdcode tree with overview sidecars + `trajectory.json` | Review, then `kcmd push` |

```mermaid
flowchart LR
    subgraph kcmd_path ["kcmd workspace"]
        init["kcmd init"]
        pull["kcmd pull"]
        catalog["catalog/ entries"]
        init --> pull --> catalog
    end
    subgraph okf_path ["OKF enrichment"]
        enrich["enrichment_agent enrich"]
        bundle["OKF bundle/"]
        enrich --> bundle
    end
    subgraph agent_path ["Catalog enrichment agent"]
        runner["agent_runner.py"]
        mdcode["mdcode output_dir/"]
        runner --> mdcode
    end
    catalog --> push["kcmd push"]
    mdcode --> push
```

## Shared prerequisites

```bash
gcloud auth application-default login
gcloud config set project
```

`kcmd` and BigQuery-backed agents use Application Default Credentials. Table-mode enrichment also requires Vertex AI access via `--project` and `--model`.

```bash
cd agents/mdcode
npm install
npm run build
export PATH="$(pwd)/dist:$PATH"
which kcmd
```

The catalog enrichment agent shells out to `agents/mdcode/dist/kcmd` automatically; adding `dist/` to `PATH` lets you run `kcmd push` from any output directory.

---

## Path 1: Initialize a kcmd workspace and pull metadata

`kcmd init` scaffolds `catalog.yaml` and selects a workspace mode. `kcmd pull` downloads editable metadata into `catalog/`.

```bash
mkdir -p ~/kc-workspace && cd ~/kc-workspace
```

`--bigquery-dataset` takes a dataset identifier of the form `project-id.dataset-id`. Repeat the flag to include multiple datasets in one workspace.

```bash
kcmd init --bigquery-dataset .
```

Other init modes: `--kb` (Markdown knowledge base), `--entry-group`, `--biglake-namespace` (with `--iceberg`), or `--glossary`.

```bash
kcmd pull
```

Pull writes entry files under `catalog/` (`.yaml` for data assets or `.md` for knowledge-base mode), plus optional `.ref.yaml` reference layers when declared in the manifest.

```bash
kcmd status
ls -R catalog/
```

Success signals:

- `catalog.yaml` exists at the workspace root.
- `catalog/bigquery///` contains one `.yaml` file per table or view.
- `kcmd status` reports the local snapshot state without auth errors.

```bash title="BigQuery workspace init and pull"
mkdir -p ~/kc-bq-demo && cd ~/kc-bq-demo
kcmd init --bigquery-dataset my-project.analytics
kcmd pull
kcmd status
```

```text title="Expected layout after pull"
catalog.yaml
catalog/
└── bigquery/
    └── my-project/
        └── analytics/
            ├── my-project.analytics.yaml
            └── analytics/
                ├── orders.yaml
                └── customers.yaml
```

---

## Path 2: Produce an OKF bundle from BigQuery

The OKF enrichment agent (`enrichment_agent`) runs a BigQuery pass that writes one OKF concept document per advertised concept, then an optional web pass that enriches from seed URLs.

```bash
cd okf
python3 -m venv .venv
.venv/bin/pip install --index-url https://pypi.org/simple/ -e '.[dev]'
```

Configure model access through Vertex AI:

```bash
export GOOGLE_GENAI_USE_VERTEXAI=true
export GOOGLE_CLOUD_PROJECT=
export GOOGLE_CLOUD_LOCATION=
```

Or use a Gemini API key instead:

```bash
export GEMINI_API_KEY=
```

BigQuery reads public datasets with ADC; query bytes bill to your configured project.

```bash title="BQ-only (fastest first run)"
.venv/bin/python -m enrichment_agent enrich \
  --source bq \
  --dataset bigquery-public-data.ga4_obfuscated_sample_ecommerce \
  --no-web \
  --out ./bundles/my-first-bundle
```

```bash title="BQ + web pass (seeded docs)"
.venv/bin/python -m enrichment_agent enrich \
  --source bq \
  --dataset bigquery-public-data.ga4_obfuscated_sample_ecommerce \
  --web-seed-file samples/ga4_merch_store/seeds.txt \
  --out ./bundles/ga4_merch_store
```

- `--source`: Source adapter. Currently `bq` (BigQuery).
- `--dataset`: BigQuery dataset as `project.dataset`.
- `--out`: Bundle root directory to create or update.
- `--no-web`: Skip the web crawl pass entirely.
- Concept filter (repeatable): Enrich a single concept id (e.g. `tables/events_`).

```bash
find ./bundles/my-first-bundle -name '*.md' | head -20
cat ./bundles/my-first-bundle/index.md
```

Success signals:

- stderr reports `Enriched N concept(s) into `.
- The bundle contains `index.md` at each directory level and concept files under paths like `datasets/` and `tables/`.
- Each concept carries YAML frontmatter with `type`, `title`, and `resource` fields.

```bash
.venv/bin/python -m enrichment_agent visualize \
  --bundle ./bundles/my-first-bundle
open ./bundles/my-first-bundle/viz.html
```

:::files
path/to/bundle/
├── index.md
├── datasets/
│   └── ga4_obfuscated_sample_ecommerce.md
├── tables/
│   ├── index.md
│   └── events_.md
└── references/          # present when web pass runs
    └── metrics/
        └── event_count.md
:::

---

## Path 3: Run the catalog enrichment agent and inspect output

`agent_runner.py` dispatches to `table`, `doc`, or `context_overlay` modes. For a first run without Google Drive, use **table mode** with a local Markdown corpus and a BigQuery dataset the agent discovers via `kcmd init` + `kcmd pull`.

```bash
python3 -m venv ~/.venv/kc-enrich
source ~/.venv/kc-enrich/bin/activate
pip install -r agents/enrichment/src/requirements.txt
export PYTHONPATH=agents/enrichment/src
```

- `--mode`: `table`, `doc`, or `context_overlay`. Omit to infer: `--dataset` present implies `table`.
- `--project`: GCP project hosting the Vertex AI model.
- `--model`: Vertex model id, e.g. `gemini-2.5-pro`.
- `--dataset`: BigQuery dataset as `project.dataset`.
- `--output_dir`: Local directory for the generated mdcode tree.
- `--folders`: Comma-separated Google Drive folder URLs/IDs and/or local directories of `.md` files used as grounding context.

```bash
python3 agents/enrichment/src/agent_runner.py \
  --mode=table \
  --dataset=. \
  --folders=agents/enrichment/eval/corpora/thelook_ecommerce \
  --topic="E-commerce analytics metadata" \
  --project= \
  --location=us-central1 \
  --model=gemini-2.5-pro \
  --output_dir=/tmp/enrich_out
```

The agent runs read-only `kcmd init` and `kcmd pull` internally, then writes `.overview.md` sidecars next to pulled entry YAML files.

```bash
find /tmp/enrich_out -type f | sort
head -40 /tmp/enrich_out/trajectory.json
ls /tmp/enrich_out/catalog/
```

Success signals:

- `catalog.yaml` and `catalog/./` exist.
- Each enriched table has a `.yaml` entry and a `.overview.md` sidecar.
- `trajectory.json` records tool calls (`read_local_md`, `fetch_gdoc`, etc.) for downstream evaluation.

```bash
# Replace with an actual table name from your dataset
cat /tmp/enrich_out/catalog/./.overview.md
```

Overview sidecars carry the enriched prose; entry YAML retains the pulled schema as the source of truth.

To iterate on the output, add `--interactive`:

```bash
python3 agents/enrichment/src/agent_runner.py \
  --mode=table \
  --dataset=. \
  --folders=agents/enrichment/eval/corpora/thelook_ecommerce \
  --project= \
  --model=gemini-2.5-pro \
  --output_dir=/tmp/enrich_out \
  --interactive
```

The `refine>` REPL reuses loaded context without re-pulling the dataset.

```text title="Table-mode log excerpt"
[kcmd] 🔎 Discovering my-project.analytics via kcmd init + pull ...
[kcmd] OK: ...
[kcmd] 📑 orders (12 cols)
[kcmd] 📑 customers (8 cols)
```

The enrichment agent only generates mdcode and runs read-only `kcmd` commands. Publishing enriched metadata to Knowledge Catalog is a separate `kcmd push` step run from `--output_dir`.

---

## Compare outputs

| Artifact | kcmd pull | OKF bundle | Catalog enrichment agent |
|----------|-----------|------------|--------------------------|
| Manifest | `catalog.yaml` | None | `catalog.yaml` (from `kcmd init`) |
| Entry format | YAML (`.yaml`) or Markdown (`.md`) | OKF concept `.md` + frontmatter | YAML entry + `.overview.md` sidecar |
| Schema source | Pulled from catalog | Embedded in concept body | Pulled via `kcmd pull` (not rewritten) |
| Run log | CLI stdout | stderr summary line | `trajectory.json` |
| Publish path | `kcmd push` | Exchange as files; no `kcmd` step | `kcmd push` from `--output_dir` |

OKF bundles are vendor-neutral and portable across tools. mdcode output from the catalog enrichment agent is designed for direct `kcmd push` into Dataplex.

---

## Quick troubleshooting

Re-run `gcloud auth application-default login` and confirm `gcloud config get-value project` matches the dataset project. Verify Dataplex/Knowledge Catalog API access for the target project.

`--dataset` is required when `--source bq`. Use the fully qualified form `project.dataset`.
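Because several failures trace back to a malformed `--dataset` value, a small pre-flight check can catch them before any cloud call is made. This is a sketch only: `split_dataset_id` is a hypothetical helper, not something either agent exposes, and it checks only the shape of the string (two non-empty parts around the last dot), not whether the dataset exists or is readable.

```python
from typing import Tuple


def split_dataset_id(value: str) -> Tuple[str, str]:
    """Split 'project.dataset' into (project, dataset) or raise ValueError.

    Hypothetical helper for illustration. It splits on the last dot and
    requires both parts to be non-empty; it does not validate the
    characters allowed in project or dataset names.
    """
    project, sep, dataset = value.strip().rpartition(".")
    if not sep or not project or not dataset:
        raise ValueError(f"expected project.dataset, got {value!r}")
    return project, dataset


if __name__ == "__main__":
    for candidate in ("my-project.analytics", "analytics"):
        try:
            print(candidate, "->", split_dataset_id(candidate))
        except ValueError as err:
            print(candidate, "-> rejected:", err)
```

Splitting on the last dot rather than the first is a deliberate choice here, so that a project identifier containing a dot is not mistaken for the dataset boundary.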
Confirm `--dataset` uses `project.dataset` format, ADC is valid, and the dataset has readable `@bigquery` catalog entries. Check `[kcmd]` log lines for the underlying `kcmd init` + `kcmd pull` result.

`--project` and `--model` are mandatory in every mode. The agent configures Vertex AI from `--project`, `--location` (default `global`), and `--model`.

## Next

- Prerequisites, Python and Node.js setup, package installs, and credential configuration.
- Deeper kcmd pull/push workflows, reference layers, and glossary scope.
- Two-pass BQ-then-web enrichment, concept scoping, and web crawl constraints.
- All three modes, Drive and GitHub inputs, glossary linking, and refinement.
- Push mdcode workspaces with `kcmd push` and reconcile entry links.
- Auth, billing, push conflict, and model credential failures.
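As a closing aid, the Path 2 success signal about frontmatter (`type`, `title`, and `resource` on each concept) can be spot-checked offline. The sketch below is an illustration under stated assumptions, not the project's own validator: `frontmatter_keys` and `missing_frontmatter` are hypothetical names, frontmatter is assumed to be a leading block delimited by `---` lines with top-level `key: value` pairs, and `index.md` files are assumed to be listings rather than concepts.

```python
from pathlib import Path
from typing import Dict, List

REQUIRED_KEYS = ("type", "title", "resource")


def frontmatter_keys(text: str) -> List[str]:
    """Return top-level keys of a leading '---' delimited frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys: List[str] = []
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Top-level keys start at column 0 and contain a colon.
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return []  # no closing delimiter: treat as missing frontmatter


def missing_frontmatter(bundle: Path) -> Dict[str, List[str]]:
    """Map each concept file (relative path) to the required keys it lacks."""
    problems: Dict[str, List[str]] = {}
    for path in sorted(bundle.rglob("*.md")):
        if path.name == "index.md":
            continue  # assumption: index files are listings, not concepts
        keys = frontmatter_keys(path.read_text(encoding="utf-8"))
        missing = [key for key in REQUIRED_KEYS if key not in keys]
        if missing:
            problems[path.relative_to(bundle).as_posix()] = missing
    return problems


if __name__ == "__main__":
    report = missing_frontmatter(Path("./bundles/my-first-bundle"))
    print(report if report else "all concept files carry type, title, resource")
```

This deliberately avoids a YAML dependency and only looks at key names, so it will not catch empty or malformed values.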