# Installation

> Prerequisites, Python and Node.js setup, package installs, and credential configuration for BigQuery, Vertex AI or Gemini, and gcloud Application Default Credentials.

- Repository: GoogleCloudPlatform/knowledge-catalog
- GitHub: https://github.com/GoogleCloudPlatform/knowledge-catalog
- Human docs: https://www.grok-wiki.com/public/docs/googlecloudplatform-knowledge-catalog-9cee6ee3cba5
- Complete Markdown: https://www.grok-wiki.com/public/docs/googlecloudplatform-knowledge-catalog-9cee6ee3cba5/llms-full.txt

## Source Files

- `okf/pyproject.toml`
- `okf/README.md`
- `agents/mdcode/package.json`
- `toolbox/enrichment/package.json`
- `agents/enrichment/src/requirements.txt`
- `samples/discovery/requirements.txt`
- `samples/enrichment/src/env.sh`

---


Knowledge Catalog tooling in this repository ships as several independent install surfaces: the `kcmd` binary (TypeScript compiled with Bun), Python enrichment agents, and sample workflows. Each surface has its own package manifest. All of them rely on `gcloud` Application Default Credentials (ADC) for GCP APIs, and the LLM-backed agents also need a separate model credential (Vertex AI or a Gemini API key).

## Install surfaces

| Surface | Path | Runtime | Primary binary / entrypoint |
| --- | --- | --- | --- |
| Metadata as Code (`kcmd`) | `agents/mdcode/` (mirror: `toolbox/mdcode/`) | Node.js, npm, Bun | `dist/kcmd` |
| Catalog enrichment agent | `agents/enrichment/` | Python 3.11+ | `agents/enrichment/src/agent_runner.py` |
| OKF enrichment agent | `okf/` | Python ≥ 3.11 | `enrichment-agent` / `python -m enrichment_agent` |
| Toolbox enrichment harness | `toolbox/enrichment/` | Node.js, npm, Bun | `dist/kcagent`, `dist/md-fileset` |
| Discovery agent sample | `samples/discovery/` | Python 3.11+ | ADK CLI against `agent.py` |
| Enrichment sample | `samples/enrichment/` | Python 3.11+ | `python3 -m enrichment.*` |

<Note>
Install only the surfaces you plan to run. The catalog enrichment agent shells out to a built `kcmd` binary; the toolbox `kcagent` package depends on the sibling `toolbox/mdcode` build.
</Note>

## Prerequisites

| Requirement | Used by | Verification |
| --- | --- | --- |
| **gcloud CLI** | `kcmd`, BigQuery, Knowledge Catalog APIs, ADC | `gcloud --version` |
| **Python 3.11+** | OKF agent (`requires-python = ">=3.11"`), catalog enrichment agent, samples | `python3 --version` |
| **Node.js + npm** (recent LTS) | `kcmd`, `kcagent` builds | `node --version`, `npm --version` |
| **Bun** (via `npm install`) | Compiles standalone `kcmd` and `kcagent` binaries | Installed as a devDependency; invoked by `npm run build` |

For discovery-agent deployment you also need a GCP project with Knowledge Catalog (`dataplex.googleapis.com`), Vertex AI (`aiplatform.googleapis.com`), and Service Usage (`serviceusage.googleapis.com`) APIs enabled, plus IAM roles that grant `dataplex.projects.search`, `aiplatform.endpoints.predict`, and `serviceusage.services.use`.
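As a sketch, the API prerequisites above can be checked and enabled with `gcloud services list --enabled` and `gcloud services enable`. The function name `ensure_discovery_apis` is hypothetical, and it only covers the APIs; the IAM permissions still have to be granted separately.

```bash
# Hypothetical helper: enable whichever of the three discovery-agent APIs
# are not yet enabled in the given project. Needs an authenticated gcloud CLI.
# Knowledge Catalog, Vertex AI, and Service Usage, in that order.
REQUIRED_APIS=(
  dataplex.googleapis.com
  aiplatform.googleapis.com
  serviceusage.googleapis.com
)

ensure_discovery_apis() {
  local project="$1"
  if [[ -z "$project" ]]; then
    echo "usage: ensure_discovery_apis <project-id>" >&2
    return 2
  fi

  # One enabled service name per line, e.g. "dataplex.googleapis.com".
  local enabled
  enabled="$(gcloud services list --enabled --project="$project" \
    --format='value(config.name)')" || return 1

  local missing=() api
  for api in "${REQUIRED_APIS[@]}"; do
    # -x: whole-line match, -F: treat the dots literally.
    grep -qxF "$api" <<<"$enabled" || missing+=("$api")
  done

  if ((${#missing[@]} == 0)); then
    echo "All required APIs are enabled in $project"
    return 0
  fi
  echo "Enabling: ${missing[*]}"
  gcloud services enable "${missing[@]}" --project="$project"
}

# Usage: ensure_discovery_apis <project-id>
```

Listing first and enabling only the missing services keeps the output readable when most APIs are already on; `gcloud services enable` itself is safe to re-run on enabled services.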

## Install `kcmd`

`kcmd` is the Metadata as Code CLI and MCP server. Build it from `agents/mdcode` (or the equivalent `toolbox/mdcode` tree).

<Steps>
<Step title="Install Node dependencies">

```bash
cd agents/mdcode
npm install
```

</Step>

<Step title="Build the standalone binary">

```bash
npm run build
```

Produces `agents/mdcode/dist/kcmd`. The build compiles TypeScript (`build:libts`) then uses Bun to emit a single executable (`build:tool`).

</Step>

<Step title="Optional: add kcmd to PATH">

```bash
# Run from agents/mdcode; append to ~/.bashrc instead if your shell is bash
echo "export PATH=\"$(pwd)/dist:\$PATH\"" >> ~/.zshrc
source ~/.zshrc
which kcmd
```

The catalog enrichment agent resolves `agents/mdcode/dist/kcmd` automatically; PATH is only required when you invoke `kcmd` yourself (for example `kcmd push`).

</Step>
</Steps>

<Tabs>
<Tab title="agents/mdcode">

Canonical path referenced by `agents/enrichment/src/tools/kcmd_tools.py`.

```bash
cd agents/mdcode && npm install && npm run build
```

</Tab>

<Tab title="toolbox/mdcode">

Used by the toolbox enrichment demo and `kcagent` package (`kcmd: file:../mdcode`).

```bash
cd toolbox/mdcode && npm install && npm run build
```

</Tab>
</Tabs>
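Both tabs run the same two npm scripts. The sketch below (the function name `build_kcmd` is hypothetical) runs them in a subshell so your working directory is unchanged, then confirms that `dist/kcmd` exists and is executable.

```bash
# Hypothetical helper: build kcmd in agents/mdcode or toolbox/mdcode and
# verify that the Bun-compiled binary landed in dist/.
build_kcmd() {
  local dir="${1:-agents/mdcode}"
  # Subshell: the cd does not leak into the caller's shell.
  ( cd "$dir" && npm install && npm run build ) || {
    echo "kcmd build failed in $dir" >&2
    return 1
  }
  if [[ ! -x "$dir/dist/kcmd" ]]; then
    echo "expected executable $dir/dist/kcmd after build" >&2
    return 1
  fi
  echo "built $dir/dist/kcmd"
}

# Usage (from the repository root):
#   build_kcmd agents/mdcode
#   build_kcmd toolbox/mdcode
```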

## Install Python agents

### Catalog enrichment agent

<Steps>
<Step title="Build kcmd first">

Follow the `kcmd` steps above. The agent never calls the Dataplex API directly; it shells out to `kcmd init`, `kcmd pull`, and `kcmd reference`, so the binary must be built before the first run.

</Step>

<Step title="Create a virtual environment and install dependencies">

```bash
python3 -m venv ~/.venv/kc-enrich
source ~/.venv/kc-enrich/bin/activate
pip install -r agents/enrichment/src/requirements.txt
```

Core packages: `google-adk`, `google-genai`, `google-api-python-client`, `google-auth`, `google-cloud-bigquery`, `pyyaml`, `requests`, `absl-py`. Install `mcp` only when `--repo` uses a local stdio GitHub MCP server; the default hosted remote works without it.

</Step>

<Step title="Set PYTHONPATH for direct invocation">

```bash
# Run from the repository root
export PYTHONPATH=agents/enrichment/src
```

</Step>
</Steps>

### OKF enrichment agent

From the `okf/` directory:

```bash
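# Any Python 3.11+ interpreter works; 3.13 is shown here
python3.13 -m venv .venv
# Quote .[dev] so zsh does not treat the brackets as a glob pattern
.venv/bin/pip install --index-url https://pypi.org/simple/ -e '.[dev]'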
```

The package (`enrichment-agent`) requires Python ≥ 3.11 and installs `google-adk`, `google-cloud-bigquery`, `pyyaml`, `pydantic`, and `markdownify`. The `enrichment-agent` console script maps to `enrichment_agent.cli:main`.

Run tests after install:

```bash
.venv/bin/pytest
```

### Toolbox `kcagent`

```bash
cd toolbox/enrichment
npm install
npm run build
```

Produces `dist/kcagent` and `dist/md-fileset`. Build `toolbox/mdcode` first: `kcagent` expects a built `kcmd` at the relative path `../../mdcode/dist/kcmd` (that is, `toolbox/mdcode/dist/kcmd`).
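Because `kcagent` depends on the sibling `kcmd` build, the order matters. A minimal sketch of a build-in-order helper, assuming it is run from the repository root; `build_toolbox_enrichment` is a hypothetical name, not a script in the repository.

```bash
# Hypothetical helper: build toolbox/mdcode (only if its kcmd is missing),
# then toolbox/enrichment, then check that both toolbox binaries exist.
build_toolbox_enrichment() {
  local root="${1:-.}"   # repository root
  local kcmd="$root/toolbox/mdcode/dist/kcmd"

  if [[ ! -x "$kcmd" ]]; then
    echo "kcmd not found at $kcmd; building toolbox/mdcode first"
    ( cd "$root/toolbox/mdcode" && npm install && npm run build ) || return 1
  fi

  ( cd "$root/toolbox/enrichment" && npm install && npm run build ) || return 1

  local bin
  for bin in kcagent md-fileset; do
    [[ -x "$root/toolbox/enrichment/dist/$bin" ]] || {
      echo "missing $root/toolbox/enrichment/dist/$bin" >&2
      return 1
    }
  done
  echo "toolbox binaries ready"
}

# Usage: build_toolbox_enrichment .
```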

### Samples

<Tabs>
<Tab title="Discovery sample">

```bash
python3 -m venv /tmp/kcsearch
source /tmp/kcsearch/bin/activate
cd samples/discovery
pip3 install -r requirements.txt
```

Dependencies: `google-adk`, `google-cloud-dataplex`, `google-api-core`.

</Tab>

<Tab title="Enrichment sample">

```bash
git clone https://github.com/GoogleCloudPlatform/knowledge-catalog.git
cd knowledge-catalog/samples/enrichment/src
source env.sh --install
```

`env.sh --install` creates `.venv` and runs `pip install -r requirements.txt`. It exports `GOOGLE_GENAI_USE_VERTEXAI=True` and reads the active gcloud project into `KC_ENRICH_SAMPLE_PROJECT`.

</Tab>
</Tabs>
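If you would rather not source `env.sh`, the sketch below approximates the two values it is described as exporting. It is a hypothetical function (`kc_sample_env`), not the script's actual contents, and it skips the `--install` virtual-environment step.

```bash
# Approximation of the environment env.sh sets up; not the real script.
# Call it in the current shell (not in $(...)) so the exports persist.
kc_sample_env() {
  export GOOGLE_GENAI_USE_VERTEXAI=True
  local project
  project="$(gcloud config get-value project 2>/dev/null)"
  if [[ -z "$project" ]]; then
    echo "no active gcloud project; run: gcloud config set project <project-id>" >&2
    return 1
  fi
  export KC_ENRICH_SAMPLE_PROJECT="$project"
}
```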

### Evaluation tooling (optional)

To score enrichment output with `python -m eval --run`, install both requirement files:

```bash
pip install -r agents/enrichment/src/requirements.txt \
            -r agents/enrichment/eval/requirements.txt
```

## Credential configuration

```mermaid
flowchart LR
  subgraph gcloud["gcloud CLI"]
    ADC["ADC token"]
    Proj["config project"]
    Region["compute/region"]
  end
  subgraph consumers["Consumers"]
    kcmd["kcmd / MCP"]
    BQ["BigQuery client"]
    Catalog["Knowledge Catalog HTTP"]
  end
  subgraph llm["LLM backends"]
    Vertex["Vertex AI"]
    Studio["Gemini API key"]
  end
  ADC --> kcmd
  ADC --> BQ
  ADC --> Catalog
  Proj --> kcmd
  Region --> kcmd
  Vertex --> OKF["OKF agent"]
  Vertex --> CatalogAgent["Catalog enrichment agent"]
  Studio --> OKF
```

### Application Default Credentials

`kcmd` obtains tokens by shelling out to gcloud:

```bash
gcloud auth application-default login
gcloud config set project <project-id>
gcloud config set compute/region <region>
```

`ApiContext.default()` reads the active project (`gcloud config get-value project`), compute region (`gcloud config get-value compute/region`), and ADC access token (`gcloud auth application-default print-access-token`). All three must be non-empty or `kcmd` fails fast. Tokens refresh automatically on HTTP 401 via `gcloud auth application-default print-access-token`.
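To debug a failing `kcmd` run, the same three values can be checked from the shell. This is a bash sketch of the fail-fast behavior described above, not kcmd's TypeScript code; `kcmd_context_check` is a hypothetical name.

```bash
# Sketch of the checks ApiContext.default() is described as performing:
# project, compute region, and ADC token must all be non-empty.
kcmd_context_check() {
  local project region token
  project="$(gcloud config get-value project 2>/dev/null)"
  region="$(gcloud config get-value compute/region 2>/dev/null)"
  token="$(gcloud auth application-default print-access-token 2>/dev/null)"

  local failed=0
  [[ -n "$project" ]] || { echo "missing: gcloud config set project <project-id>" >&2; failed=1; }
  [[ -n "$region" ]]  || { echo "missing: gcloud config set compute/region <region>" >&2; failed=1; }
  [[ -n "$token" ]]   || { echo "missing: gcloud auth application-default login" >&2; failed=1; }
  ((failed == 0)) || return 1

  # Print only a token prefix so the credential is not leaked to logs.
  echo "project=$project region=$region token=${token:0:8}..."
}
```

Unlike `kcmd`, the sketch reports every missing value at once rather than stopping at the first, which is usually more convenient when setting up a new machine.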

<Warning>
`kcmd` requires a configured compute region, not just a project. Set `gcloud config set compute/region` before running `kcmd pull` or `kcmd push`.
</Warning>

For catalog enrichment with Google Drive sources, request Drive read scope at login:

```bash
gcloud auth application-default login \
  --scopes='openid,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/drive.readonly'
```

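The enrichment sample also sets an ADC quota project, the project used for quota and billing on API calls made with your user credentials:

```bash
# CLOUD_PROJECT must hold your project ID
gcloud auth application-default set-quota-project "$CLOUD_PROJECT"
```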

### BigQuery

BigQuery clients use ADC. Public datasets (for example `bigquery-public-data.*`) are readable, but query bytes bill against the caller's project:

```bash
gcloud auth application-default login
gcloud config set project <your-billing-project>
```

The OKF agent accepts an optional `--billing-project` flag; when omitted, the BigQuery client uses the ADC default project.
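A BigQuery dry run is a cheap way to confirm that a project can act as the billing project before pointing an agent at public data: it validates the query and estimates bytes processed without running or billing it. The sketch uses the `bq` CLI from the gcloud SDK, which authenticates with your gcloud CLI credentials rather than ADC, so it checks the project and its permissions, not the ADC login. The function name `bq_billing_dry_run` is hypothetical, and the `bigquery-public-data.samples.shakespeare` table is just a convenient public sample.

```bash
# Hypothetical helper: dry-run a query over a public table, with the job
# created in (and any cost attributed to) the given billing project.
bq_billing_dry_run() {
  local billing_project="$1"
  [[ -n "$billing_project" ]] || {
    echo "usage: bq_billing_dry_run <project-id>" >&2
    return 2
  }
  # --project_id is a global bq flag, so it goes before the subcommand.
  bq --project_id="$billing_project" query \
    --use_legacy_sql=false \
    --dry_run \
    'SELECT word FROM `bigquery-public-data.samples.shakespeare` LIMIT 10'
}

# Usage: bq_billing_dry_run <your-billing-project>
```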

### Vertex AI and Gemini

Model credentials are separate from catalog ADC. Choose one backend:

<Tabs>
<Tab title="Vertex AI (catalog enrichment agent)">

The catalog enrichment agent always sets Vertex AI mode from its CLI flags, so you do not need to export any environment variables before running it:

<ParamField body="--project" type="string" required>
GCP project for the Vertex AI model. Also sets `GOOGLE_CLOUD_PROJECT`.
</ParamField>

<ParamField body="--location" type="string">
Vertex AI region. Default: `global`.
</ParamField>

<ParamField body="--model" type="string" required>
Model ID, for example `gemini-2.5-pro`.
</ParamField>

```bash
python3 agents/enrichment/src/agent_runner.py \
  --mode=table \
  --dataset=<project>.<dataset> \
  --project=<your_gcp_project> \
  --location=us-central1 \
  --model=gemini-2.5-pro \
  --output_dir=<local_output_dir>
```

</Tab>

<Tab title="Vertex AI (OKF / discovery samples)">

```bash
export GOOGLE_GENAI_USE_VERTEXAI=true
export GOOGLE_CLOUD_PROJECT=<project-id>
export GOOGLE_CLOUD_LOCATION=<region>
```

The discovery sample sets `GOOGLE_GENAI_USE_VERTEXAI=True` and `GOOGLE_CLOUD_PROJECT` before running the agent with the ADK CLI.

</Tab>

<Tab title="Gemini API key (OKF agent)">

For AI Studio instead of Vertex:

```bash
export GEMINI_API_KEY=<your-api-key>
```

Do not set `GOOGLE_GENAI_USE_VERTEXAI` when using an API key. The OKF agent default model is `gemini-flash-latest` (override with `--model`).

</Tab>
</Tabs>
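For the OKF agent and samples, which read the environment rather than flags, a quick preflight can catch a mixed or empty configuration. This is a hypothetical bash helper (`genai_backend`); treating `1` as well as `true` as "Vertex on" is an assumption about how `google-genai` parses the variable.

```bash
# Hypothetical preflight: report which model backend the environment selects
# and reject the cases the tabs above warn against.
genai_backend() {
  local vertex
  vertex="$(printf '%s' "${GOOGLE_GENAI_USE_VERTEXAI:-}" | tr '[:upper:]' '[:lower:]')"

  if [[ "$vertex" == "true" || "$vertex" == "1" ]]; then
    if [[ -n "${GEMINI_API_KEY:-}" ]]; then
      echo "both GOOGLE_GENAI_USE_VERTEXAI and GEMINI_API_KEY are set; unset one" >&2
      return 1
    fi
    if [[ -z "${GOOGLE_CLOUD_PROJECT:-}" ]]; then
      echo "Vertex AI selected but GOOGLE_CLOUD_PROJECT is empty" >&2
      return 1
    fi
    echo "vertex project=$GOOGLE_CLOUD_PROJECT location=${GOOGLE_CLOUD_LOCATION:-<unset>}"
  elif [[ -n "${GEMINI_API_KEY:-}" ]]; then
    echo "gemini-api-key"
  else
    echo "no model credentials: set the Vertex AI variables or GEMINI_API_KEY" >&2
    return 1
  fi
}
```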

### Optional credentials

| Variable / secret | When needed |
| --- | --- |
| `KCMD_BIN` | Override auto-resolved `agents/mdcode/dist/kcmd` path |
| `GITHUB_PERSONAL_ACCESS_TOKEN` | `--repo` GitHub source via GitHub MCP server |
| `KC_ENRICH_MCP_CONFIG` | Custom MCP server configuration for GitHub tools |
| `KC_ENRICH_SAMPLE_PROJECT` | Set automatically by `samples/enrichment/src/env.sh` from gcloud config |
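The `KCMD_BIN` override and the automatic `agents/mdcode/dist/kcmd` lookup can be mirrored in a small resolver when you script around the agent. This is a hypothetical sketch (`resolve_kcmd`); the authoritative logic lives in `agents/enrichment/src/tools/kcmd_tools.py` and may check more or differ in detail.

```bash
# Hypothetical resolver: KCMD_BIN wins when set; otherwise fall back to the
# repo build at agents/mdcode/dist/kcmd under the given repository root.
resolve_kcmd() {
  local repo_root="${1:-.}"
  if [[ -n "${KCMD_BIN:-}" ]]; then
    [[ -x "$KCMD_BIN" ]] || {
      echo "KCMD_BIN is not executable: $KCMD_BIN" >&2
      return 1
    }
    echo "$KCMD_BIN"
  elif [[ -x "$repo_root/agents/mdcode/dist/kcmd" ]]; then
    echo "$repo_root/agents/mdcode/dist/kcmd"
  else
    echo "kcmd not found; build agents/mdcode or set KCMD_BIN" >&2
    return 1
  fi
}

# Usage: "$(resolve_kcmd .)" --help
```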

## Verify installation

<Steps>
<Step title="Confirm gcloud ADC">

```bash
gcloud auth application-default print-access-token | head -c 20
gcloud config get-value project
gcloud config get-value compute/region
```

Each command should return a non-empty value.

</Step>

<Step title="Confirm kcmd binary">

```bash
agents/mdcode/dist/kcmd --help
# or, if on PATH:
kcmd --help
```

</Step>

<Step title="Confirm Python agent imports">

```bash
source ~/.venv/kc-enrich/bin/activate
export PYTHONPATH=agents/enrichment/src
python3 -c "import engine; print('ok')"
```

</Step>

<Step title="Confirm OKF agent CLI">

```bash
cd okf && .venv/bin/enrichment-agent --help
```

</Step>

<Step title="Confirm toolbox binaries (if built)">

```bash
toolbox/enrichment/dist/kcagent --help
toolbox/mdcode/dist/kcmd --help
```

</Step>
</Steps>
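The binary checks above can be rolled into one summary. A minimal sketch, assuming it runs from the repository root and that each binary exits 0 on `--help`; `check_binaries` is a hypothetical name, and binaries that were never built are reported as skipped rather than failed.

```bash
# Hypothetical summary of the binary checks: ok / skip (not built) / FAIL.
# Returns non-zero if any built binary fails to answer --help.
check_binaries() {
  local root="${1:-.}" bin status=0
  for bin in agents/mdcode/dist/kcmd \
             toolbox/mdcode/dist/kcmd \
             toolbox/enrichment/dist/kcagent; do
    if [[ ! -e "$root/$bin" ]]; then
      echo "skip  $bin (not built)"
    elif "$root/$bin" --help >/dev/null 2>&1; then
      echo "ok    $bin"
    else
      echo "FAIL  $bin"
      status=1
    fi
  done
  return "$status"
}

# Usage: check_binaries .
```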

<Check>
A successful install returns help text from each built binary and a non-empty ADC token. The catalog enrichment agent additionally requires `--project` and `--model` at run time; missing values raise `UsageError` before any enrichment work starts.
</Check>

## Environment variable reference

| Variable | Set by | Purpose |
| --- | --- | --- |
| `GOOGLE_GENAI_USE_VERTEXAI` | User or agent (`agent_runner.py`, `env.sh`) | Route `google-genai` calls through Vertex AI |
| `GOOGLE_CLOUD_PROJECT` | User, flags, or `env.sh` | Vertex project and genai client project |
| `GOOGLE_CLOUD_LOCATION` | User or `--location` flag | Vertex region (default `global` in catalog agent) |
| `GEMINI_API_KEY` | User | AI Studio authentication for OKF agent |
| `KCMD_BIN` | User | Explicit path to `kcmd` binary |
| `GITHUB_PERSONAL_ACCESS_TOKEN` | User | GitHub MCP server PAT for `--repo` |
| `GCP_LOG` | User | Enable verbose HTTP logging in `kcmd` `ApiContext` |
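When reporting a problem, it helps to show which of these variables are set without exposing secrets. A hypothetical bash helper (`show_kc_env`, using bash indirect expansion) that masks the two credential variables from the table:

```bash
# Hypothetical diagnostic: print each variable from the table as set/unset,
# showing only the length of the two secrets.
show_kc_env() {
  local var val
  for var in GOOGLE_GENAI_USE_VERTEXAI GOOGLE_CLOUD_PROJECT GOOGLE_CLOUD_LOCATION \
             GEMINI_API_KEY KCMD_BIN GITHUB_PERSONAL_ACCESS_TOKEN GCP_LOG; do
    val="${!var:-}"   # indirect expansion: the value of the variable named $var
    if [[ -z "$val" ]]; then
      printf '%-30s <unset>\n' "$var"
    elif [[ "$var" == GEMINI_API_KEY || "$var" == GITHUB_PERSONAL_ACCESS_TOKEN ]]; then
      printf '%-30s <set, %d chars>\n' "$var" "${#val}"
    else
      printf '%-30s %s\n' "$var" "$val"
    fi
  done
}
```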

## Next

<CardGroup>
<Card title="Quickstart" href="/quickstart">
First successful runs: initialize a kcmd workspace, produce an OKF bundle, or run the catalog enrichment agent.
</Card>
<Card title="kcmd CLI reference" href="/kcmd-cli-reference">
Commands, init flags per source type, pull/push options, and ADC authentication behavior.
</Card>
<Card title="Troubleshooting" href="/troubleshooting">
Auth, billing, push conflict, and model credential failures with verification signals.
</Card>
</CardGroup>
