# Run the discovery agent

> Deploy the Knowledge Catalog discovery agent with ADK: required GCP APIs and IAM roles, environment variables, and root-agent or AgentTool integration patterns.

- Repository: GoogleCloudPlatform/knowledge-catalog
- GitHub: https://github.com/GoogleCloudPlatform/knowledge-catalog
- Human docs: https://www.grok-wiki.com/public/docs/googlecloudplatform-knowledge-catalog-9cee6ee3cba5
- Complete Markdown: https://www.grok-wiki.com/public/docs/googlecloudplatform-knowledge-catalog-9cee6ee3cba5/llms-full.txt

## Source Files

- `samples/discovery/README.md`
- `samples/discovery/agent.py`
- `samples/discovery/tools.py`
- `samples/discovery/utils.py`
- `samples/discovery/SKILL.md`
- `samples/discovery/requirements.txt`

---


The Knowledge Catalog discovery agent in `samples/discovery/` is a Google ADK `llm_agent.Agent` that answers natural-language questions by calling Knowledge Catalog semantic search through `dataplex_v1.CatalogServiceClient.search_entries`. The agent loads its instruction from `SKILL.md`, runs `gemini-3-flash-preview` on Vertex AI, and exposes a single tool, `knowledge_catalog_search`, that returns catalog entry metadata. Following `SKILL.md`, the LLM decomposes the question, batches searches, and merges and reranks the results.

## Architecture

```mermaid
flowchart TB
  subgraph adk["ADK runtime"]
    CLI["adk run parent folder"]
    Root["root_agent or parent Agent"]
    Disc["discovery_agent / knowledge_catalog_discovery_agent"]
    Skill["SKILL.md instruction"]
    Tool["knowledge_catalog_search"]
  end

  subgraph gcp["Google Cloud"]
    Vertex["Vertex AI Gemini"]
    KC["Knowledge Catalog Search API"]
  end

  CLI --> Root
  Root -->|"AgentTool (optional)"| Disc
  Disc --> Skill
  Disc --> Vertex
  Disc --> Tool
  Tool -->|"search_entries semantic_search=true"| KC
```

| Component | Module | Responsibility |
| --- | --- | --- |
| Agent definition | `samples/discovery/agent.py` | Builds `discovery_agent` with model, name, description, instruction, and tools |
| Search tool | `samples/discovery/tools.py` | Calls `CatalogServiceClient.search_entries` against `projects/{project}/locations/global` |
| Project resolution | `samples/discovery/utils.py` | Reads `GOOGLE_CLOUD_PROJECT` for consumer project and model path |
| Agent instruction | `samples/discovery/SKILL.md` | Semantic decomposition, predicate rules, parallel search batching, result merging |

## Prerequisites

### Required GCP APIs

| API | Service name |
| --- | --- |
| Knowledge Catalog | `dataplex.googleapis.com` |
| Vertex AI | `aiplatform.googleapis.com` |
| Service Usage | `serviceusage.googleapis.com` |

### Required IAM permissions

| Permission | Typical role |
| --- | --- |
| `dataplex.projects.search` | `roles/dataplex.viewer` |
| `aiplatform.endpoints.predict` | `roles/aiplatform.user` |
| `serviceusage.services.use` | `roles/serviceusage.serviceUsageConsumer` |

<Note>
Configure Application Default Credentials before running the agent. See [Installation](/installation) for `gcloud auth application-default login` and project setup.
</Note>

## Install dependencies

<Steps>
<Step title="Clone and enter the sample">

```bash
git clone https://github.com/GoogleCloudPlatform/knowledge-catalog.git
cd knowledge-catalog/samples/discovery
```

</Step>
<Step title="Create a virtual environment and install packages">

```bash
python3 -m venv /tmp/kcsearch
source /tmp/kcsearch/bin/activate
pip3 install -r requirements.txt
```

Packages from `requirements.txt`:

| Package | Purpose |
| --- | --- |
| `google-adk` | ADK agent runtime, `llm_agent.Agent`, ADK CLI |
| `google-cloud-dataplex` | `CatalogServiceClient` for Knowledge Catalog search |
| `google-api-core` | API error types (`PermissionDenied`) |

</Step>
</Steps>

## Environment variables

<ParamField body="GOOGLE_CLOUD_PROJECT" type="string" required>
Consumer GCP project ID. `get_consumer_project()` in `utils.py` reads it to build the Vertex model path and the Knowledge Catalog search parent `projects/{id}/locations/global`, and raises `ValueError` if the variable is unset.
</ParamField>

<ParamField body="GOOGLE_GENAI_USE_VERTEXAI" type="boolean" required>
Set to `True` so ADK routes Gemini calls through Vertex AI instead of the Gemini API.
</ParamField>

<RequestExample>

```bash
export GOOGLE_CLOUD_PROJECT=my-consumer-project
export GOOGLE_GENAI_USE_VERTEXAI=True
```

</RequestExample>

The agent resolves the model at startup:

```python
GEMINI_MODEL = f"projects/{consumer_project}/locations/global/publishers/google/models/gemini-3-flash-preview"
```
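
As a minimal sketch of this resolution step, assuming `utils.py` does little more than read the environment variable: the helper below mirrors the documented `get_consumer_project()` behavior and error message, while `gemini_model_path` is a hypothetical name introduced here for illustration only.

```python
import os


def get_consumer_project() -> str:
    """Return the consumer project ID, or raise if the variable is unset or empty."""
    project = os.environ.get("GOOGLE_CLOUD_PROJECT")
    if not project:
        raise ValueError("GOOGLE_CLOUD_PROJECT environment variable is required")
    return project


def gemini_model_path(project: str) -> str:
    """Hypothetical helper: build the Vertex AI model path shown above."""
    return (
        f"projects/{project}/locations/global"
        "/publishers/google/models/gemini-3-flash-preview"
    )
```

Failing fast on a missing project ID keeps the error close to its cause instead of surfacing later as a malformed resource path.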

## Deployment patterns

The sample supports two ADK integration paths. Both use `adk run` against the **parent folder** that contains the agent package directory.

### Pattern 1: Root agent

Use when the discovery agent is the only agent in the deployment.

1. In `samples/discovery/agent.py`, rename `discovery_agent` to `root_agent`.
2. Run ADK against the parent of the agent source directory.

For the stock sample layout, the agent source lives in `samples/discovery/`, so the parent is `samples/`. From the repository root, run:

```bash
adk run samples
```

<Warning>
ADK requires the exported symbol to be named `root_agent`. The sample ships with `discovery_agent` so it can also be imported as a sub-agent.
</Warning>

### Pattern 2: Sub-agent via AgentTool

Use when a custom orchestrator delegates catalog search to the discovery agent. Copy the discovery package into your parent agent folder:

:::files
my_custom_agent/
├── agent.py
└── knowledge_catalog_discovery_agent/
    ├── SKILL.md
    ├── agent.py
    ├── tools.py
    └── utils.py
:::

Import `discovery_agent` from the copied package and wrap it with ADK `AgentTool` per the [ADK multi-agent docs](https://adk.dev/agents/multi-agents/#c-explicit-invocation-agenttool). Run against the parent folder:

```bash
adk run my_custom_agent
```

<Info>
The sub-agent is registered as `knowledge_catalog_discovery_agent` (see the `name=` argument in `agent.py` and the `SKILL.md` frontmatter). Parent agents invoke it explicitly through `AgentTool` rather than as the default root.
</Info>

## Agent definition

The agent is constructed in `agent.py`:

| Field | Value |
| --- | --- |
| `name` | `knowledge_catalog_discovery_agent` |
| `description` | Searches Knowledge Catalog for data entries based on natural-language user queries |
| `model` | `google_llm.Gemini(model=GEMINI_MODEL)` |
| `instruction` | Contents of `SKILL.md` loaded by `load_instruction()` |
| `tools` | `[tools.knowledge_catalog_search]` |

## Search tool reference

`knowledge_catalog_search(query: str)` in `tools.py` calls the Knowledge Catalog Search API.

| Request field | Value |
| --- | --- |
| `name` | `projects/{GOOGLE_CLOUD_PROJECT}/locations/global` |
| `query` | Natural-language or predicate-qualified search string |
| `page_size` | `50` |
| `semantic_search` | `True` |
| API endpoint | `dataplex.googleapis.com` |
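
To exercise the same request outside the agent, a sketch along these lines may help. It assumes `google-cloud-dataplex` is installed in a release recent enough to accept the `semantic_search` field. `build_search_request` and `search_entry_names` are hypothetical helper names, not functions from `tools.py`.

```python
def build_search_request(project: str, query: str) -> dict:
    """Assemble the request fields listed in the table above."""
    return {
        "name": f"projects/{project}/locations/global",
        "query": query,
        "page_size": 50,
        "semantic_search": True,
    }


def search_entry_names(project: str, query: str) -> list:
    """Call the Search API directly and return the matching entry names."""
    # Imported lazily so build_search_request works without the client library.
    from google.cloud import dataplex_v1

    client = dataplex_v1.CatalogServiceClient()
    pager = client.search_entries(request=build_search_request(project, query))
    return [result.dataplex_entry.name for result in pager]
```

Calling `search_entry_names("my-consumer-project", "type=table")` with valid Application Default Credentials is one way to check API access and IAM separately from the model.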

<ResponseField name="results" type="array">
On success, a list of objects with `entry_name`, `system`, `resource_id`, and `display_name` extracted from `result.dataplex_entry`.
</ResponseField>

<ResponseExample>

```json
{
  "results": [
    {
      "entry_name": "projects/my-project/locations/global/entryGroups/@bigquery/entries/my-table",
      "system": "BIGQUERY",
      "resource_id": "my-project.my_dataset.my_table",
      "display_name": "my_table"
    }
  ]
}
```

</ResponseExample>

Error shapes returned to the LLM:

| Returned value | Condition |
| --- | --- |
| `{"Error obtaining consumer project": "..."}` | `GOOGLE_CLOUD_PROJECT` missing |
| `{"error": "Permission denied: ..."}` | `PermissionDenied` from the API |
| `{"error": "An unexpected error occurred: ..."}` | Other exceptions |
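
The error shapes above can be sketched as a wrapper that converts failures into dictionaries instead of raising. This is an illustration under stated assumptions, not the code in `tools.py`: `safe_search` and its injected callables are hypothetical, and the local `PermissionDenied` class stands in for `google.api_core.exceptions.PermissionDenied` so the example runs without the client libraries.

```python
from typing import Any, Callable


class PermissionDenied(Exception):
    """Local stand-in for google.api_core.exceptions.PermissionDenied."""


def safe_search(
    query: str,
    get_project: Callable[[], str],
    search: Callable[[str, str], list],
) -> dict:
    """Run a search and return either results or an error dict for the LLM."""
    try:
        project = get_project()
    except ValueError as exc:
        return {"Error obtaining consumer project": str(exc)}
    try:
        return {"results": search(project, query)}
    except PermissionDenied as exc:
        return {"error": f"Permission denied: {exc}"}
    except Exception as exc:  # broad on purpose: the agent loop should not crash
        return {"error": f"An unexpected error occurred: {exc}"}
```

Returning errors as data lets the model read the failure and explain it to the user, which matches the behavior described under Verification.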

## Agent search behavior

`SKILL.md` drives multi-step retrieval beyond a single API call:

1. **Understand the query**: preserve user-supplied predicates such as `type=table`.
2. **Semantic decomposition**: break business questions into data-engineering terms; generate up to three distinct query variations plus a **baseline search** (the verbatim user request).
3. **Predicate extraction**: map keywords to official predicates; embed `projectid=` constraints inside the `query` string argument.
4. **Parallel search**: batch searches to minimize round trips.
5. **Merge and rank**: deduplicate by `entry_name`, filter irrelevant hits, sort by relevance, return full entry names.
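
In the sample, the merge step is carried out by the LLM following `SKILL.md`, not by Python code. Purely to illustrate the deduplication rule in step 5, the sketch below merges several tool responses by `entry_name`; `merge_results` is a hypothetical helper, and relevance filtering and ranking are left out because they depend on the model's judgment.

```python
from typing import Any, Iterable


def merge_results(batches: Iterable[dict]) -> list:
    """Merge tool responses, keeping the first hit seen for each entry_name."""
    merged: dict = {}
    for batch in batches:
        # Error responses carry no "results" key, so they contribute nothing.
        for hit in batch.get("results", []):
            merged.setdefault(hit["entry_name"], hit)
    # Dicts preserve insertion order, so output follows first-seen order.
    return list(merged.values())
```

Keeping first-seen order means hits from the baseline search stay ahead of hits from later variations when the baseline response is passed first.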

<AccordionGroup>
<Accordion title="Official search predicates">

| Predicate | Operators | Common triggers |
| --- | --- | --- |
| `type` | `=` | `table`, `dataset` |
| `system` | `=` | `bigquery`, `cloud_sql`, `dataplex` |
| `description` | `=` | `description` (only when user explicitly refers to description) |
| `name` | `:`, `=`, `!=` | `name` (only when user explicitly refers to resource name) |
| `displayname` | `:`, `=`, `!=` | `display name` |
| `projectid` | `=`, `:` | `project`, `project id` |
| `parent` | `=`, `:` | `parent` |

Logical operators `AND` and `OR` must be uppercase. Negation uses a leading hyphen (for example `-name:foo`). Knowledge Catalog search does not interpret double quotes in free text.

</Accordion>
<Accordion title="Example predicate queries">

| Natural language | Search query |
| --- | --- |
| BigQuery tables containing foo in project bar | `system=bigquery AND type=table AND name:foo AND projectid=bar` |
| Tables not containing foo | `type=table AND -name:foo` |
| Tables from project foo-1 or bar-1 | `type=table AND (projectid:foo-1 OR projectid:bar-1)` |
| All datasets | `type=dataset` |

</Accordion>
</AccordionGroup>
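
The predicate rules above can be expressed as a few string helpers. This is a sketch for composing query strings by hand; `all_of`, `any_of`, and `negate` are hypothetical names and are not part of the sample, where the LLM writes the query string itself.

```python
def all_of(*clauses: str) -> str:
    """Join clauses with the uppercase AND operator."""
    return " AND ".join(clauses)


def any_of(*clauses: str) -> str:
    """Join clauses with uppercase OR, parenthesized for use inside all_of."""
    return "(" + " OR ".join(clauses) + ")"


def negate(clause: str) -> str:
    """Negate a clause with a leading hyphen."""
    return f"-{clause}"


# Reproduces two of the example queries from the table above.
tables_in_either_project = all_of(
    "type=table", any_of("projectid:foo-1", "projectid:bar-1")
)
tables_without_foo = all_of("type=table", negate("name:foo"))
```

The resulting string is passed as the single `query` argument of `knowledge_catalog_search`, since project constraints are embedded in the query rather than sent as separate parameters.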

## Verification

After `adk run`, send a natural-language query such as *"Show me BigQuery tables in project my-project"*.

Expected signals:

- The agent issues one or more `knowledge_catalog_search` calls with predicates embedded in the query string.
- Successful responses include `results` with `entry_name`, `system`, `resource_id`, and `display_name`.
- Permission failures surface the `Permission denied` error string from the tool rather than crashing the agent loop.

## Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| `GOOGLE_CLOUD_PROJECT environment variable is required` | Missing env var | Export `GOOGLE_CLOUD_PROJECT` before `adk run` |
| `Permission denied` in tool output | Missing `dataplex.projects.search` | Grant `roles/dataplex.viewer` or equivalent on the consumer project |
| Vertex AI auth errors | Missing ADC or `GOOGLE_GENAI_USE_VERTEXAI` | Run `gcloud auth application-default login`; set `GOOGLE_GENAI_USE_VERTEXAI=True` |
| `adk run` cannot find agent | Wrong folder or symbol name | Use parent folder path; ensure `root_agent` is exported for standalone mode |
| Empty or irrelevant results | Query lacks predicates or uses double quotes | Follow `SKILL.md` predicate rules; avoid quoted free text |

See [Troubleshooting](/troubleshooting) for cross-cutting auth and billing issues.

## Related pages

<CardGroup>
<Card title="Overview" href="/overview">
Knowledge Catalog tooling surface: discovery agents, enrichment agents, OKF bundles, and kcmd workspaces.
</Card>
<Card title="Installation" href="/installation">
Python setup, package installs, and Application Default Credentials for Vertex AI and BigQuery.
</Card>
<Card title="Enrichment workflows" href="/enrichment-workflows">
How enrichment agents produce metadata context that discovery agents later search.
</Card>
<Card title="Sync catalog metadata" href="/sync-catalog-metadata">
Pull catalog entries into a kcmd workspace so enrichment and discovery share the same metadata layer.
</Card>
</CardGroup>
