# Enable RAG

> Optional vector retrieval over OKF bundles: `pip install hermes-okf[rag]`, LangChain `DirectoryLoader` + `MarkdownHeaderTextSplitter`, Chroma persistence, `HERMES_OKF_ENABLE_RAG` config flag, and provider-neutral embedding model selection.

- Repository: EliaszDev/hermes-okf
- GitHub: https://github.com/EliaszDev/hermes-okf
- Human docs: https://www.grok-wiki.com/public/docs/eliaszdev-hermes-okf-b71befaafe02
- Complete Markdown: https://www.grok-wiki.com/public/docs/eliaszdev-hermes-okf-b71befaafe02/llms-full.txt

## Source Files

- `examples/rag_integration.py`
- `pyproject.toml`
- `README.md`
- `src/hermes_okf/hermes_integration.py`
- `docs/HERMES_USERS.md`

---

Vector retrieval in `hermes-okf` is an optional layer on top of the filesystem OKF bundle. The core package depends only on `pyyaml` and ships inverted-index full-text search via `SearchIndex`. Installing `hermes-okf[rag]` adds LangChain and ChromaDB so you can embed bundle markdown, persist vectors locally, and run semantic queries. OKF markdown remains the source of truth; Chroma is a derived index you rebuild when the bundle changes.

## When to use RAG vs full-text search

| Capability | Mechanism | Best for |
|------------|-----------|----------|
| Keyword / token match | `SearchIndex` (stdlib inverted index) | Exact terms, concept IDs, tags |
| `hermes-okf search` / `provider.search()` | Same `SearchIndex` | CLI and provider keyword recall |
| Semantic similarity | Chroma + embeddings via `[rag]` extra | Paraphrased queries, conceptually related content |

RAG does not replace OKF storage or the built-in search index. It adds a vector index derived from the same `.md` files.
For hybrid recall, combine `provider.search()` (keywords) with `provider.rag_search()` (semantics).

## Install the RAG extra

```bash
pip install hermes-okf[rag]
```

This pulls in:

| Package | Role |
|---------|------|
| `langchain` | Text splitting utilities |
| `langchain-community` | `DirectoryLoader`, `TextLoader` |
| `langchain-chroma` | Chroma vector store |
| `langchain-openai` | Default `OpenAIEmbeddings` adapter |

The built-in provider path uses `OpenAIEmbeddings`. Export the API key your embedding provider expects (for OpenAI-compatible endpoints, typically `OPENAI_API_KEY`). Custom pipelines can swap in any LangChain `Embeddings` implementation without changing OKF files.

To verify the install, import the two adapters:

```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
```

If this raises `ImportError`, the `[rag]` extra is not installed in the active environment.

## Architecture

```mermaid
flowchart LR
    subgraph OKF["OKF bundle (source of truth)"]
        MD["**/*.md concepts"]
    end
    subgraph LC["LangChain ingestion"]
        DL["DirectoryLoader + TextLoader"]
        SP["MarkdownHeaderTextSplitter"]
    end
    subgraph VS["Vector store"]
        CH["Chroma persist dir"]
    end
    subgraph EMB["Embedding provider (swappable)"]
        OE["OpenAIEmbeddings (default)"]
    end
    MD --> DL --> SP --> CH
    OE --> CH
    CH --> RET["retriever.invoke() / rag_search()"]
```

Chroma persistence locations differ by integration path:

| Path | Persist directory |
|------|-------------------|
| `HermesOKFProvider.rag_search()` | `{bundle_path}/.chroma` |
| `examples/rag_integration.py` | `./chroma_okf_db` (caller-defined) |

## Provider API: `HermesOKFProvider.rag_search()`

`HermesOKFProvider` in `hermes_okf.hermes_integration` exposes semantic search when the `[rag]` extra is installed.
```python
from hermes_okf import HermesOKFProvider

provider = HermesOKFProvider()
results = provider.rag_search("deployment strategies for Python services", top_k=5)
for r in results:
    print(f"{r['source']}: {r['content'][:100]}")
```

Each result is a dict with `source` and `content` keys:

```python
[
    {"source": "/path/to/bundle/hermes/decisions/3_replicas.md", "content": "## Decision\nUse 3 replicas for..."},
    # ...
]
```

### Index build behavior

On the first `rag_search()` call, if `{bundle_path}/.chroma` does not exist, the provider runs `_build_rag_index()`:

1. `DirectoryLoader` loads all `**/*.md` under `bundle_path` with UTF-8 `TextLoader`.
2. `MarkdownHeaderTextSplitter` splits on `#` (Header 1) and `##` (Header 2).
3. `Chroma.from_documents()` embeds splits and writes to `{bundle_path}/.chroma`.

Subsequent calls load the existing Chroma directory.

The index is **not** automatically rebuilt when bundle markdown changes. After adding or editing concepts, delete `{bundle_path}/.chroma` (or call your own rebuild logic) to refresh vectors. Stale indexes return outdated chunks.

### Missing extra error

If `[rag]` is not installed, `rag_search()` raises:

```
ImportError: RAG requires hermes-okf[rag]. Install: pip install hermes-okf[rag]
```

## Custom pipeline: `examples/rag_integration.py`

For full control over persist path, chunking, and embedding provider, build the index yourself.
The example script mirrors the provider's ingestion steps but keeps Chroma outside the bundle:

```python title="Load bundle markdown"
from hermes_okf.bundle import OKFBundle
from langchain_community.document_loaders import DirectoryLoader, TextLoader

bundle = OKFBundle("./my_knowledge")
loader = DirectoryLoader(
    str(bundle.root),
    glob="**/*.md",
    loader_cls=TextLoader,
    loader_kwargs={"encoding": "utf-8"},
)
docs = loader.load()
```

```python title="Split on headers"
from langchain.text_splitter import MarkdownHeaderTextSplitter

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "Header 1"), ("##", "Header 2")]
)
splits = []
for doc in docs:
    splits.extend(splitter.split_text(doc.page_content))
```

```python title="Embed and query"
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=OpenAIEmbeddings(),  # swap for your preferred embedding model
    persist_directory="./chroma_okf_db",
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
results = retriever.invoke("What GPU decisions did we make?")
```

OKF files are unchanged regardless of which embedding class you pass to Chroma.

## Provider-neutral embedding selection

The `[rag]` extra installs `langchain-openai` as the default adapter, but the embedding choice is not tied to the OKF format or the Hermes runtime.

The default adapter with the default model:

```python
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings(model="openai/text-embedding-3-small")
```

`HermesOKFProvider` passes this model through `self.config.rag_model` (default `openai/text-embedding-3-small`).

To target an OpenAI-compatible endpoint such as OpenRouter, set the API base and key on the same class:

```python
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings(
    model="openai/text-embedding-3-small",
    openai_api_base="https://openrouter.ai/api/v1",
    openai_api_key="your-key",
)
```

The example script comment references `OpenRouterEmbeddings` as an alternative. Any LangChain `Embeddings` subclass works in custom pipelines.
```python
from langchain_chroma import Chroma
from langchain_core.embeddings import Embeddings

class MyEmbeddings(Embeddings):
    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        ...  # return one vector per input text

    def embed_query(self, text: str) -> list[float]:
        ...  # return a single query vector

# `splits` comes from the header-splitting step of the custom pipeline
vectorstore = Chroma.from_documents(
    documents=splits, embedding=MyEmbeddings(), persist_directory="./chroma_okf_db"
)
```

`HermesOKFProvider._build_rag_index()` and `rag_search()` hardcode `OpenAIEmbeddings`. To use a non-OpenAI embedding provider with the provider API today, either point `OpenAIEmbeddings` at a compatible API base URL or use the custom pipeline in `examples/rag_integration.py`.

## Configuration

RAG settings live on `HermesOKFConfig` and resolve in the same order as other provider settings: environment variables → `~/.hermes/hermes-okf.yaml` → `plugins.hermes_okf` in `~/.hermes/config.yaml` → defaults.

- `enable_rag`: Documents whether your deployment uses vector retrieval. Loaded from `HERMES_OKF_ENABLE_RAG` (truthy: `1`, `true`, `yes`) or YAML `enable_rag`. It does not gate `rag_search()`; call that method explicitly when you need semantic results.
- `rag_model`: Embedding model passed to `OpenAIEmbeddings` in `_build_rag_index()` and `rag_search()`. Set it in `~/.hermes/hermes-okf.yaml` or under `plugins.hermes_okf` in Hermes `config.yaml`. There is no dedicated `HERMES_OKF_RAG_MODEL` environment variable.
- `HERMES_OKF_ENABLE_RAG`: Environment override for `enable_rag`. Accepted truthy values: `1`, `true`, `yes` (case-insensitive).
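The truthy-value rule and the resolution order described above can be sketched in a few lines. This is a minimal illustration, not the package's implementation: `parse_flag` and `resolve_enable_rag` are hypothetical helper names, the two mapping arguments stand in for already-parsed YAML from `~/.hermes/hermes-okf.yaml` and the `plugins.hermes_okf` block, and the fallback default of `False` is an assumption.

```python
from typing import Mapping, Optional

# Truthy strings named in the configuration notes; matching is case-insensitive.
TRUTHY = {"1", "true", "yes"}


def parse_flag(raw: Optional[str]) -> bool:
    """Return True only when the raw value is 1/true/yes in any letter case."""
    return raw is not None and raw.lower() in TRUTHY


def resolve_enable_rag(
    env: Mapping[str, str],
    okf_yaml: Mapping[str, object],
    hermes_plugin_cfg: Mapping[str, object],
    default: bool = False,  # assumed default, not confirmed by the source
) -> bool:
    """Hypothetical resolver: env var, then hermes-okf.yaml, then plugin config."""
    if "HERMES_OKF_ENABLE_RAG" in env:
        return parse_flag(env["HERMES_OKF_ENABLE_RAG"])
    for layer in (okf_yaml, hermes_plugin_cfg):
        if "enable_rag" in layer:
            return bool(layer["enable_rag"])
    return default
```

Passing `os.environ` as `env` would mirror a real process environment. Because the flag does not gate `rag_search()`, a resolver like this is mainly useful for deciding in your own code whether to call it.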
### Example YAML

```yaml
# ~/.hermes/hermes-okf.yaml
bundle_path: ~/.hermes/okf_memory
enable_rag: true
rag_model: openai/text-embedding-3-small
```

Or under Hermes main config:

```yaml
plugins:
  hermes_okf:
    enable_rag: true
    rag_model: openai/text-embedding-3-small
```

## Hybrid memory model

RAG fits the two-memory pattern alongside Hermes hot memory and the OKF cold archive:

| Layer | Role | Search type |
|-------|------|-------------|
| Hermes hot memory (`MEMORY.md`, `USER.md`) | Always-in-prompt facts | None (inline) |
| OKF cold archive | Typed concepts, graph links | Full-text (`SearchIndex`) |
| Chroma vector index | Derived semantic index | Vector (`rag_search` / custom retriever) |

Use hot memory for critical facts, OKF search for typed/linked knowledge, and RAG when the query is semantic rather than lexical.

## Operational notes

Neither `hermes-okf` nor `hermes okf` exposes a RAG-specific subcommand. Semantic search runs through Python (`provider.rag_search()`) or a custom script based on `examples/rag_integration.py`. Keyword search remains available via `hermes-okf search` and `hermes okf search`.

The provider stores vectors at `{bundle_path}/.chroma`. Add `.chroma/` to `.gitignore` if you version-control the bundle; vectors are reproducible from markdown.

Both the provider and the example split only on `#` and `##` headers. Content without those headers becomes a single chunk per file. Adjust `headers_to_split_on` in custom pipelines for deeper splits.

## Related pages

- Install `hermes-okf[rag]` and optional extras (`[dev]`, `[all]`).
- Full `HermesOKFConfig` fields, env vars, and resolution order.
- End-to-end vector workflow from `examples/rag_integration.py`.
- `HermesOKFProvider` lifecycle hooks and `search()` vs `rag_search()`.
- Hot buffer vs cold OKF archive and flush triggers.