# Enable RAG

> Optional vector retrieval over OKF bundles: `pip install hermes-okf[rag]`, LangChain `DirectoryLoader` + `MarkdownHeaderTextSplitter`, Chroma persistence, `HERMES_OKF_ENABLE_RAG` config flag, and provider-neutral embedding model selection.

- Repository: EliaszDev/hermes-okf
- GitHub: https://github.com/EliaszDev/hermes-okf
- Human docs: https://www.grok-wiki.com/public/docs/eliaszdev-hermes-okf-b71befaafe02
- Complete Markdown: https://www.grok-wiki.com/public/docs/eliaszdev-hermes-okf-b71befaafe02/llms-full.txt

## Source Files

- `examples/rag_integration.py`
- `pyproject.toml`
- `README.md`
- `src/hermes_okf/hermes_integration.py`
- `docs/HERMES_USERS.md`


Vector retrieval in `hermes-okf` is an optional layer on top of the filesystem OKF bundle. The core package depends only on `pyyaml` and ships inverted-index full-text search via `SearchIndex`. Installing `hermes-okf[rag]` adds LangChain and ChromaDB so you can embed bundle markdown, persist vectors locally, and run semantic queries. OKF markdown remains the source of truth; Chroma is a derived index you rebuild when the bundle changes.

## When to use RAG vs full-text search

| Capability | Mechanism | Best for |
|------------|-----------|----------|
| Keyword / token match | `SearchIndex` (stdlib inverted index) | Exact terms, concept IDs, tags |
| `hermes-okf search` / `provider.search()` | Same `SearchIndex` | CLI and provider keyword recall |
| Semantic similarity | Chroma + embeddings via `[rag]` extra | Paraphrased queries, conceptually related content |

<Note>
RAG does not replace OKF storage or the built-in search index. It adds a vector index derived from the same `.md` files. For hybrid recall, combine `provider.search()` (keywords) with `provider.rag_search()` (semantics).
</Note>
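This page does not say how to merge keyword and semantic hits. One common option is reciprocal rank fusion (RRF). The sketch below assumes each hit can be reduced to a stable key, such as its source file path. `reciprocal_rank_fusion` is a hypothetical helper, and `k=60` is a conventional RRF default, not a `hermes-okf` setting.

```python
from collections import defaultdict
from typing import Iterable


def reciprocal_rank_fusion(ranked_lists: Iterable[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of result keys into one ranking.

    Each key scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Hypothetical helper, not part of hermes-okf.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, key in enumerate(ranking, start=1):
            scores[key] += 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=lambda key: scores[key], reverse=True)


keyword_hits = ["decisions/3_replicas.md", "notes/gpu.md"]    # illustrative keys
semantic_hits = ["notes/gpu.md", "decisions/rollout.md"]      # illustrative keys
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
# ['notes/gpu.md', 'decisions/3_replicas.md', 'decisions/rollout.md']
```

In practice, one list would come from `provider.search()` and the other from `[r["source"] for r in provider.rag_search(query)]`. Check the shape of the keyword results before mapping them to keys.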

## Install the RAG extra

<Steps>
<Step title="Install optional dependencies">

```bash
# Quote the extra so shells such as zsh do not expand the brackets
pip install "hermes-okf[rag]"
```

This pulls in:

| Package | Role |
|---------|------|
| `langchain` | Text splitting utilities |
| `langchain-community` | `DirectoryLoader`, `TextLoader` |
| `langchain-chroma` | Chroma vector store |
| `langchain-openai` | Default `OpenAIEmbeddings` adapter |

</Step>

<Step title="Set embedding provider credentials">

The built-in provider path uses `OpenAIEmbeddings`. Export the API key your embedding provider expects (for OpenAI-compatible endpoints, typically `OPENAI_API_KEY`). Custom pipelines can swap in any LangChain `Embeddings` implementation without changing OKF files.

</Step>

<Step title="Verify import">

```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
```

If this raises `ImportError`, the `[rag]` extra is not installed in the active environment.

</Step>
</Steps>
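You can also check for the extra without triggering an `ImportError`, for example to decide at startup whether to expose semantic search. The minimal sketch below probes the modules with `importlib.util.find_spec`, which locates a module without importing it. The module names are an assumption based on the package table above, and `rag_available` is a hypothetical helper.

```python
import importlib.util

# Top-level import names assumed for the packages the [rag] extra installs.
RAG_MODULES = ("langchain", "langchain_community", "langchain_chroma", "langchain_openai")


def rag_available(modules: tuple[str, ...] = RAG_MODULES) -> bool:
    """Return True if every module can be found on the current sys.path."""
    return all(importlib.util.find_spec(name) is not None for name in modules)


if not rag_available():
    print('RAG extra missing; run: pip install "hermes-okf[rag]"')
```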

## Architecture

```mermaid
flowchart LR
  subgraph OKF["OKF bundle (source of truth)"]
    MD["**/*.md concepts"]
  end

  subgraph LC["LangChain ingestion"]
    DL["DirectoryLoader + TextLoader"]
    SP["MarkdownHeaderTextSplitter"]
  end

  subgraph VS["Vector store"]
    CH["Chroma persist dir"]
  end

  subgraph EMB["Embedding provider (swappable)"]
    OE["OpenAIEmbeddings (default)"]
  end

  MD --> DL --> SP --> CH
  OE --> CH
  CH --> RET["retriever.invoke() / rag_search()"]
```

Chroma persistence locations differ by integration path:

| Path | Persist directory |
|------|-------------------|
| `HermesOKFProvider.rag_search()` | `{bundle_path}/.chroma` |
| `examples/rag_integration.py` | `./chroma_okf_db` (caller-defined) |

## Provider API: `HermesOKFProvider.rag_search()`

`HermesOKFProvider` in `hermes_okf.hermes_integration` exposes semantic search when the `[rag]` extra is installed.

<RequestExample>

```python
from hermes_okf import HermesOKFProvider

provider = HermesOKFProvider()

results = provider.rag_search("deployment strategies for Python services", top_k=5)
for r in results:
    print(f"{r['source']}: {r['content'][:100]}")
```

</RequestExample>

<ResponseExample>

```python
[
    {"source": "/path/to/bundle/hermes/decisions/3_replicas.md", "content": "## Decision\nUse 3 replicas for..."},
    # ...
]
```

</ResponseExample>

### Index build behavior

On the first `rag_search()` call, if `{bundle_path}/.chroma` does not exist, the provider runs `_build_rag_index()`:

1. `DirectoryLoader` loads all `**/*.md` under `bundle_path` with UTF-8 `TextLoader`.
2. `MarkdownHeaderTextSplitter` splits on `#` (Header 1) and `##` (Header 2).
3. `Chroma.from_documents()` embeds splits and writes to `{bundle_path}/.chroma`.

Subsequent calls load the existing Chroma directory. The index is **not** automatically rebuilt when bundle markdown changes.

<Warning>
After adding or editing concepts, delete `{bundle_path}/.chroma` (or call your own rebuild logic) to refresh vectors. Stale indexes return outdated chunks.
</Warning>
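Nothing refreshes `{bundle_path}/.chroma` automatically. A small pre-flight check can delete it whenever a concept file has changed since the index was built, so the next `rag_search()` rebuilds from the current markdown. This is a sketch under assumptions: `invalidate_if_stale` is a hypothetical helper, and comparing modification times is a heuristic. It misses deleted files and trusts filesystem timestamps.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def invalidate_if_stale(bundle_path: str | Path) -> bool:
    """Remove {bundle_path}/.chroma when a bundle .md file is newer than it.

    Heuristic: compares each markdown file's mtime with the mtime of the
    .chroma directory. Deleted concepts are not detected. Returns True if
    the index was removed. Hypothetical helper, not part of hermes-okf.
    """
    root = Path(bundle_path).expanduser()
    chroma_dir = root / ".chroma"
    if not chroma_dir.is_dir():
        return False  # no index yet; the next rag_search() builds one
    index_mtime = chroma_dir.stat().st_mtime
    for md_file in root.rglob("*.md"):
        if chroma_dir in md_file.parents:
            continue  # skip anything stored inside the index directory
        if md_file.stat().st_mtime > index_mtime:
            shutil.rmtree(chroma_dir)  # the next rag_search() rebuilds it
            return True
    return False
```

Call it with the same `bundle_path` the provider uses (for example `~/.hermes/okf_memory` from the YAML configuration on this page) before calling `provider.rag_search()`.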

### Missing extra error

If `[rag]` is not installed, `rag_search()` raises:

```
ImportError: RAG requires hermes-okf[rag]. Install: pip install hermes-okf[rag]
```
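If semantic search is optional in your deployment, this `ImportError` can trigger a fallback to keyword search instead of failing the request. The sketch below takes both searches as callables because this page does not document the signature of `provider.search()`. `semantic_or_keyword` is a hypothetical helper. Catching `ImportError` also hides import failures unrelated to the `[rag]` extra.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def semantic_or_keyword(
    query: str,
    semantic: Callable[[str], T],
    keyword: Callable[[str], T],
) -> T:
    """Try semantic search first; fall back to keyword search when the
    [rag] extra is missing and the semantic call raises ImportError."""
    try:
        return semantic(query)
    except ImportError:
        return keyword(query)


# Hypothetical wiring; confirm provider.search()'s real signature first:
# results = semantic_or_keyword(
#     "deployment strategies",
#     semantic=lambda q: provider.rag_search(q, top_k=5),
#     keyword=lambda q: provider.search(q),
# )
```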

## Custom pipeline: `examples/rag_integration.py`

For full control over persist path, chunking, and embedding provider, build the index yourself. The example script mirrors the provider's ingestion steps but keeps Chroma outside the bundle:

<CodeGroup>

```python title="Load bundle markdown"
from hermes_okf.bundle import OKFBundle
from langchain_community.document_loaders import DirectoryLoader, TextLoader

bundle = OKFBundle("./my_knowledge")

loader = DirectoryLoader(
    str(bundle.root),
    glob="**/*.md",
    loader_cls=TextLoader,
    loader_kwargs={"encoding": "utf-8"},
)
docs = loader.load()
```

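```python title="Split on headers"
# Non-deprecated home of MarkdownHeaderTextSplitter (package: langchain-text-splitters)
from langchain_text_splitters import MarkdownHeaderTextSplitter

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "Header 1"), ("##", "Header 2")]
)
splits = []
for doc in docs:
    for chunk in splitter.split_text(doc.page_content):
        # split_text() records only header metadata; keep the file path
        # so retrieved chunks can be traced back to their concept file.
        chunk.metadata["source"] = doc.metadata["source"]
        splits.append(chunk)
```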

```python title="Embed and query"
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=OpenAIEmbeddings(),  # swap for your preferred embedding model
    persist_directory="./chroma_okf_db",
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
results = retriever.invoke("What GPU decisions did we make?")
```

</CodeGroup>

OKF files are unchanged regardless of which embedding class you pass to Chroma.

## Provider-neutral embedding selection

The `[rag]` extra installs `langchain-openai` as the default adapter, but embedding choice is not tied to OKF format or Hermes runtime.

<Tabs>
<Tab title="OpenAI (default)">

```python
from langchain_openai import OpenAIEmbeddings

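embedding = OpenAIEmbeddings(model="text-embedding-3-small")
```

`HermesOKFProvider` passes `self.config.rag_model` (default `openai/text-embedding-3-small`) to `OpenAIEmbeddings`. That default uses the OpenRouter-style `openai/` prefix. The OpenAI API itself expects the bare model name `text-embedding-3-small`, so set `rag_model` to match the endpoint you call.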

</Tab>

<Tab title="OpenAI-compatible / OpenRouter">

```python
from langchain_openai import OpenAIEmbeddings

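embedding = OpenAIEmbeddings(
    model="openai/text-embedding-3-small",
    openai_api_base="https://openrouter.ai/api/v1",
    openai_api_key="your-key",  # placeholder; read it from the environment in real code
    # Send raw strings rather than tiktoken token IDs, which some
    # OpenAI-compatible servers do not accept.
    check_embedding_ctx_length=False,
)
```

The example script's comment mentions `OpenRouterEmbeddings` as an alternative. In custom pipelines, any LangChain `Embeddings` subclass works.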

</Tab>

<Tab title="Custom provider">

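```python
from langchain_chroma import Chroma
from langchain_core.embeddings import Embeddings

class MyEmbeddings(Embeddings):
    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_query(t) for t in texts]  # one vector per text, same order

    def embed_query(self, text: str) -> list[float]:
        raise NotImplementedError("call your embedding backend here")

# `splits` comes from the "Split on headers" step of the custom pipeline
vectorstore = Chroma.from_documents(
    documents=splits, embedding=MyEmbeddings(), persist_directory="./chroma_okf_db"
)
```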

</Tab>
</Tabs>

<Info>
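`HermesOKFProvider._build_rag_index()` and `rag_search()` hardcode `OpenAIEmbeddings`. To use a non-OpenAI embedding provider with the provider API today, you have two options. You can point `OpenAIEmbeddings` at a compatible API base URL; `langchain-openai` falls back to the `OPENAI_API_BASE` environment variable when no base URL is passed in code. Alternatively, use the custom pipeline in `examples/rag_integration.py`.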
</Info>

## Configuration

RAG settings live on `HermesOKFConfig` and resolve through the same order as other provider settings: environment variables → `~/.hermes/hermes-okf.yaml` → `plugins.hermes_okf` in `~/.hermes/config.yaml` → defaults.

<ParamField body="enable_rag" type="boolean" default="false">
Documents whether your deployment uses vector retrieval. Loaded from `HERMES_OKF_ENABLE_RAG` (truthy: `1`, `true`, `yes`) or YAML `enable_rag`. Does not gate `rag_search()` — call that method explicitly when you need semantic results.
</ParamField>

<ParamField body="rag_model" type="string" default="openai/text-embedding-3-small">
Embedding model passed to `OpenAIEmbeddings` in `_build_rag_index()` and `rag_search()`. Set it in `~/.hermes/hermes-okf.yaml` or under `plugins.hermes_okf` in Hermes `config.yaml`. There is no dedicated `HERMES_OKF_RAG_MODEL` environment variable.
</ParamField>

<ParamField body="HERMES_OKF_ENABLE_RAG" type="string">
Environment override for `enable_rag`. Accepted truthy values: `1`, `true`, `yes` (case-insensitive).
</ParamField>
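The precedence for `enable_rag` can be written as a small function. This sketches the documented order only; it is not the package's `HermesOKFConfig` code. `resolve_enable_rag` is hypothetical, and the YAML files are assumed to be parsed into dicts already. The sketch treats any set value of `HERMES_OKF_ENABLE_RAG` as an override, including an empty one; the real loader may handle that case differently.

```python
from typing import Any, Mapping, Optional

TRUTHY = {"1", "true", "yes"}  # documented truthy values, compared case-insensitively


def resolve_enable_rag(
    env: Mapping[str, str],
    okf_yaml: Optional[Mapping[str, Any]] = None,
    hermes_yaml: Optional[Mapping[str, Any]] = None,
    default: bool = False,
) -> bool:
    """Resolve enable_rag: env var, then hermes-okf.yaml, then
    plugins.hermes_okf in config.yaml, then the default.

    env: e.g. os.environ
    okf_yaml: parsed ~/.hermes/hermes-okf.yaml (assumed already loaded)
    hermes_yaml: parsed ~/.hermes/config.yaml (assumed already loaded)
    """
    raw = env.get("HERMES_OKF_ENABLE_RAG")
    if raw is not None:
        return raw.lower() in TRUTHY
    if okf_yaml and "enable_rag" in okf_yaml:
        return bool(okf_yaml["enable_rag"])
    plugin = ((hermes_yaml or {}).get("plugins") or {}).get("hermes_okf") or {}
    if "enable_rag" in plugin:
        return bool(plugin["enable_rag"])
    return default
```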

### Example YAML

```yaml
# ~/.hermes/hermes-okf.yaml
bundle_path: ~/.hermes/okf_memory
enable_rag: true
rag_model: openai/text-embedding-3-small
```

Or under Hermes main config:

```yaml
plugins:
  hermes_okf:
    enable_rag: true
    rag_model: openai/text-embedding-3-small
```

## Hybrid memory model

RAG adds a third, derived layer to the two-memory pattern of Hermes hot memory and the OKF cold archive:

| Layer | Role | Search type |
|-------|------|-------------|
| Hermes hot memory (`MEMORY.md`, `USER.md`) | Always-in-prompt facts | None (inline) |
| OKF cold archive | Typed concepts, graph links | Full-text (`SearchIndex`) |
| Chroma vector index | Derived semantic index | Vector (`rag_search` / custom retriever) |

Use hot memory for critical facts, OKF search for typed/linked knowledge, and RAG when the query is semantic rather than lexical.

## Operational notes

<AccordionGroup>
<Accordion title="No CLI subcommand for RAG">

Neither `hermes-okf` nor `hermes okf` exposes a RAG-specific subcommand. Semantic search runs through Python (`provider.rag_search()`) or a custom script based on `examples/rag_integration.py`. Keyword search remains available via `hermes-okf search` and `hermes okf search`.

</Accordion>

<Accordion title="Chroma lives beside OKF markdown">

The provider stores vectors at `{bundle_path}/.chroma`. Add `.chroma/` to `.gitignore` if you version-control the bundle — vectors are reproducible from markdown.

</Accordion>

<Accordion title="Header splitting limits chunk boundaries">

Both the provider and the example split only on `#` and `##` headers. Content without those headers becomes a single chunk per file. Adjust `headers_to_split_on` in custom pipelines for deeper splits.

</Accordion>
</AccordionGroup>
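You can preview where chunk boundaries will fall before building the index with a rough, standard-library-only version of `#`/`##` splitting. It approximates LangChain's `MarkdownHeaderTextSplitter` but does not reproduce it. In this sketch, headers are dropped from chunk bodies, lines inside code fences are never treated as headers, and `###` and deeper headers stay in the body. `preview_header_chunks` is a hypothetical helper.

```python
def preview_header_chunks(markdown: str) -> list[tuple[dict[str, str], str]]:
    """Approximate splitting on '#' and '##' headers.

    Returns (header_metadata, body) pairs. Lines inside ``` fences are
    never treated as headers; deeper headers (###...) stay in the body.
    An approximation for previewing chunks, not LangChain's implementation.
    """
    levels = {"#": "Header 1", "##": "Header 2"}
    chunks: list[tuple[dict[str, str], str]] = []
    meta: dict[str, str] = {}
    body: list[str] = []
    in_fence = False

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append((dict(meta), text))  # snapshot current headers
        body.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        marker, _, title = line.partition(" ")
        if not in_fence and marker in levels and title.strip():
            flush()
            if marker == "#":
                meta.clear()  # a new top-level section resets Header 2
            meta[levels[marker]] = title.strip()
        else:
            body.append(line)
    flush()
    return chunks
```

Running it over each `.md` file in the bundle shows which files collapse into a single chunk. Those files would benefit from more headers, or from a deeper `headers_to_split_on` in a custom pipeline.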

## Related pages

<CardGroup>
<Card title="Installation" href="/installation">
Install `hermes-okf[rag]` and optional extras (`[dev]`, `[all]`).
</Card>

<Card title="Configuration reference" href="/configuration-reference">
Full `HermesOKFConfig` fields, env vars, and resolution order.
</Card>

<Card title="RAG pipeline example" href="/example-rag-pipeline">
End-to-end vector workflow from `examples/rag_integration.py`.
</Card>

<Card title="Hermes provider integration" href="/hermes-provider-integration">
`HermesOKFProvider` lifecycle hooks and `search()` vs `rag_search()`.
</Card>

<Card title="Two-memory model" href="/two-memory-model">
Hot buffer vs cold OKF archive and flush triggers.
</Card>
</CardGroup>
