# Use extraction methods

> Invoke algorithm templates via `he parse -m light_rag` or `Template.create("method/hyper_rag")`, construct method classes directly (`Light_RAG`, `Atom`, etc.), and pass method-specific kwargs such as `observation_time` for temporal extractors.

- Repository: yifanfeng97/Hyper-Extract
- GitHub: https://github.com/yifanfeng97/Hyper-Extract
- Human docs: https://www.grok-wiki.com/public/docs/yifanfeng97-hyper-extract-7891c7254cdf
- Complete Markdown: https://www.grok-wiki.com/public/docs/yifanfeng97-hyper-extract-7891c7254cdf/llms-full.txt

## Source Files

- `hyperextract/methods/registry.py`
- `hyperextract/utils/template_engine/factory.py`
- `hyperextract/methods/rag/light_rag.py`
- `hyperextract/methods/typical/atom.py`
- `hyperextract/cli/cli.py`
- `examples/en/methods/light_rag_demo.py`

---

---
title: "Use extraction methods"
description: "Invoke algorithm templates via `he parse -m light_rag` or `Template.create(\"method/hyper_rag\")`, construct method classes directly (`Light_RAG`, `Atom`, etc.), and pass method-specific kwargs such as `observation_time` for temporal extractors."
---

Hyper-Extract registers nine extraction methods (`graph_rag`, `light_rag`, `hyper_rag`, `hypergraph_rag`, `cog_rag`, `itext2kg`, `itext2kg_star`, `kg_gen`, `atom`), all resolved through `hyperextract/methods/registry.py`. Each method class is an `AutoGraph` or `AutoHypergraph` subclass with fixed English prompts. Invoke a method through the CLI (`he parse -m <name>`), through the `Template.create("method/<name>")` API, or by constructing its class directly (`Light_RAG`, `Atom`, etc.).

<Note>
Method templates always use English prompts. The `--lang` flag is ignored when parsing with `-m`, and `Template.create` hardcodes `metadata["lang"] = "en"`.
</Note>

## How methods resolve

Method names map to concrete classes through a central registry. `TemplateFactory.create_method` looks up the class, forwards constructor `**kwargs`, and stamps metadata before returning the instance.

```mermaid
classDiagram
    direction LR
    class Registry {
        +register_method(name, class, autotype)
        +get_method(name)
        +list_methods()
    }
    class Template {
        +create(source, **kwargs)
        +get(path)
        +list()
    }
    class TemplateFactory {
        +create_method(name, llm, embedder, **kwargs)
        +create(source, **kwargs)
    }
    class Light_RAG
    class Hyper_RAG
    class Atom
    class AutoGraph
    class AutoHypergraph

    Registry --> Light_RAG
    Registry --> Hyper_RAG
    Registry --> Atom
    Light_RAG --|> AutoGraph
    Atom --|> AutoGraph
    Hyper_RAG --|> AutoHypergraph
    Template --> TemplateFactory
    TemplateFactory --> Registry : get_method
```

| Invocation path | Entry point | Resolves to |
|---|---|---|
| CLI | `he parse <input> -m light_rag -o <dir>` | `template = "method/light_rag"` → `Template.create(...)` |
| Python API | `Template.create("method/hyper_rag")` | `TemplateFactory.create_method("hyper_rag", ...)` |
| Direct class | `Light_RAG(llm_client=llm, embedder=emb)` | Bypasses registry; same runtime behavior |
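
The lookup, kwargs forwarding, and metadata stamping described above can be sketched with the standard library alone. This is a minimal illustration, not the code in `hyperextract/methods/registry.py`: `MethodEntry`, `ToyLightRAG`, and the module-level `_REGISTRY` dict are hypothetical names, and the real `TemplateFactory.create_method` may differ in detail.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class MethodEntry:
    """Hypothetical registry record: the class plus its output autotype."""
    cls: Callable[..., Any]
    autotype: str  # "graph" or "hypergraph"


_REGISTRY: dict[str, MethodEntry] = {}


def register_method(name: str, cls: Callable[..., Any], autotype: str) -> None:
    _REGISTRY[name] = MethodEntry(cls=cls, autotype=autotype)


def get_method(name: str) -> MethodEntry:
    # Keys are exact registry names such as "light_rag", not class names.
    if name not in _REGISTRY:
        raise ValueError(f"Unknown method: {name}")
    return _REGISTRY[name]


class ToyLightRAG:
    """Toy stand-in for a method class; it only records what it was given."""

    def __init__(self, llm_client=None, embedder=None, **kwargs):
        self.llm_client = llm_client
        self.embedder = embedder
        self.options = kwargs
        self.metadata: dict[str, str] = {}


def create_method(name: str, llm_client=None, embedder=None, **kwargs):
    entry = get_method(name)
    # Forward constructor kwargs untouched, then stamp metadata on the instance.
    instance = entry.cls(llm_client=llm_client, embedder=embedder, **kwargs)
    instance.metadata.update(
        template=f"method/{name}", lang="en", type=entry.autotype
    )
    return instance


register_method("light_rag", ToyLightRAG, "graph")
ka = create_method("light_rag", chunk_size=4096)
print(ka.metadata, ka.options)
```

Keeping the stamping step in the factory rather than in each class is what lets a directly constructed instance skip it, which is why the direct-class path needs metadata set by hand before `dump`.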

## List available methods

<Steps>
<Step title="CLI">

```bash
he list method
he list method -q rag    # filter by name or description
```

Displays method ID (`method/<name>`), output autotype (`graph` or `hypergraph`), and description.

</Step>
<Step title="Python">

```python
from hyperextract import Template
from hyperextract.methods import list_methods, list_method_cfgs

# Registry view: class, autotype, description
for name, info in list_methods().items():
    print(name, info["type"], info["description"])

# TemplateCfg view (keys are "method/<name>")
for path, cfg in list_method_cfgs().items():
    print(path, cfg.type, cfg.description)

# Methods also appear in Template.list()
all_templates = Template.list(include_methods=True)
```

</Step>
</Steps>

## Registered methods

| Method ID | Class | Autotype | Category |
|---|---|---|---|
| `method/graph_rag` | `Graph_RAG` | `graph` | RAG — community detection |
| `method/light_rag` | `Light_RAG` | `graph` | RAG — lightweight binary edges |
| `method/hyper_rag` | `Hyper_RAG` | `hypergraph` | RAG — n-ary hyperedges |
| `method/hypergraph_rag` | `HyperGraph_RAG` | `hypergraph` | RAG — advanced hypergraph |
| `method/cog_rag` | `Cog_RAG` | `hypergraph` | RAG — cognitive retrieval |
| `method/itext2kg` | `iText2KG` | `graph` | Typical — triple extraction |
| `method/itext2kg_star` | `iText2KG_Star` | `graph` | Typical — enhanced triples |
| `method/kg_gen` | `KG_Gen` | `graph` | Typical — configurable KG generation |
| `method/atom` | `Atom` | `graph` | Typical — temporal KG with evidence |

RAG methods target larger documents with retrieval-augmented extraction. Typical methods run direct LLM extraction pipelines without a separate retrieval stage.

## CLI: parse with a method

<ParamField body="--method / -m" type="string">
Method name without the `method/` prefix (e.g., `light_rag`, `atom`). When set, overrides `--template` and resolves to `method/<name>`.
</ParamField>

<ParamField body="--lang / -l" type="string">
Ignored for method templates. CLI forces `lang = "en"` and prints a note if `--lang` is supplied.
</ParamField>

<ParamField body="--output / -o" type="string" required>
Output directory for the Knowledge Abstract (`data.json`, `metadata.json`, optional `index/`).
</ParamField>

<ParamField body="--no-index" type="boolean">
Skip vector index build after extraction. Rebuild later with `he build-index`.
</ParamField>

<RequestExample>

```bash
# Prerequisites: he config init (LLM + embedder configured)
he parse examples/en/tesla.md -m light_rag -o ./ka-light-rag/
```

</RequestExample>

<RequestExample>

```bash
# Hypergraph extraction
he parse examples/en/tesla.md -m hyper_rag -o ./ka-hyper-rag/

# Skip indexing during parse
he parse examples/en/tesla.md -m atom -o ./ka-atom/ --no-index
```

</RequestExample>

After a successful parse, the CLI suggests follow-on commands: `he show`, `he search`, `he talk`, and `he feed` for incremental updates. Method-created Knowledge Abstracts store `metadata.template` as `method/<name>` and `metadata.lang` as `en`.

<Warning>
The CLI does not expose method constructor kwargs (e.g., `observation_time`). Pass those through the Python API when temporal anchoring or tuning parameters matter.
</Warning>
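
The flag rules in this section can be summarized as a small resolution function. The sketch below uses `argparse` purely as a stand-in; the real CLI in `hyperextract/cli/cli.py` may be built on a different framework, and `build_parser`, `resolve_template`, and the exact message strings are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Stand-in parser that mirrors only the flags documented on this page.
    parser = argparse.ArgumentParser(prog="he parse")
    parser.add_argument("input")
    parser.add_argument("-t", "--template")
    parser.add_argument("-m", "--method")
    parser.add_argument("-l", "--lang")
    parser.add_argument("-o", "--output", required=True)
    parser.add_argument("--no-index", action="store_true")
    return parser


def resolve_template(args: argparse.Namespace) -> tuple[str, str, list[str]]:
    """Return (template path, language, notes) following the documented rules."""
    notes: list[str] = []
    if args.method:
        # -m overrides -t and always resolves to an English method template.
        if args.lang:
            notes.append("--lang is ignored for method templates; using 'en'")
        return f"method/{args.method}", "en", notes
    if not args.template:
        # Assumption: the sketch rejects a call with neither flag.
        raise ValueError("either --template or --method must be given")
    if not args.lang:
        raise ValueError("--lang is required for knowledge templates")
    return args.template, args.lang, notes


args = build_parser().parse_args(
    ["examples/en/tesla.md", "-m", "light_rag", "-l", "zh", "-o", "./ka-light-rag/"]
)
template, lang, notes = resolve_template(args)
print(template, lang, notes)
```

Note that nothing in this resolution step carries constructor kwargs, which matches the warning above: values such as `observation_time` have to go through the Python API.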

## Python: Template.create

`Template.create` is the unified entry point for both domain YAML templates and method templates. For methods, omit `language` — it is ignored and always set to `"en"`.

<CodeGroup>

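```python CLI-equivalent workflow
from pathlib import Path

from hyperextract import Template

ka = Template.create("method/light_rag")
# read_text opens, reads, and closes the source document in one call
ka.feed_text(Path("examples/en/tesla.md").read_text(encoding="utf-8"))
ka.build_index()  # needed before search and chat
ka.dump("./ka-light-rag/")
```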

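```python With explicit clients
from pathlib import Path

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from hyperextract import Template

llm = ChatOpenAI(model="gpt-4o-mini")
emb = OpenAIEmbeddings(model="text-embedding-3-small")

# Explicit clients take the place of the defaults read from the config file
ka = Template.create(
    "method/graph_rag",
    llm_client=llm,
    embedder=emb,
)
text = Path("examples/en/tesla.md").read_text(encoding="utf-8")
ka.feed_text(text)
```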

```python Non-destructive preview
from hyperextract import Template

ka = Template.create("method/light_rag")
# `text` is the document string to extract from
preview = ka.parse(text)   # returns a new instance; `ka` is unchanged
ka.feed_text(text)         # merges the extraction into `ka`
```

</CodeGroup>
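
The difference between `parse` and `feed_text` in the preview tab is a contract about mutation: one returns a fresh object, the other merges in place. The toy class below illustrates only that contract. `ToyKA` is hypothetical, and splitting on whitespace stands in for the LLM extraction the real classes perform.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKA:
    """Toy Knowledge Abstract holding a set of node names."""
    nodes: set[str] = field(default_factory=set)

    def _extract(self, text: str) -> set[str]:
        # Stand-in for LLM extraction: lowercase words without punctuation.
        words = (w.strip(".,?!").lower() for w in text.split())
        return {w for w in words if w}

    def parse(self, text: str) -> "ToyKA":
        # Non-destructive: build and return a new instance.
        return ToyKA(nodes=self._extract(text))

    def feed_text(self, text: str) -> None:
        # Destructive: merge the extraction into this instance.
        self.nodes |= self._extract(text)


ka = ToyKA()
ka.feed_text("Tesla founded the company.")
preview = ka.parse("Edison was a rival.")
print(sorted(ka.nodes))
print(sorted(preview.nodes))
```

A preview is therefore safe to discard: nothing it extracted reaches `ka` until the same text is passed to `feed_text`.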

`Template.get("method/light_rag")` returns a `MethodCfg` with `name`, `type`, and `description`. `Template.list(include_methods=True)` merges gallery templates with all registered methods.

## Python: direct method classes

Import method classes when you need full control over construction, post-processing hooks, or kwargs the CLI cannot pass.

<CodeGroup>

```python Light_RAG
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from hyperextract.methods.rag import Light_RAG

llm = ChatOpenAI(model="gpt-4o-mini")
emb = OpenAIEmbeddings(model="text-embedding-3-small")

rag = Light_RAG(llm_client=llm, embedder=emb)
rag.feed_text(text)
print(len(rag.nodes), len(rag.edges))
rag.chat("Who founded the company?")
rag.show()
```

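```python Hyper_RAG
from hyperextract.methods.rag import Hyper_RAG

# `llm` and `emb` are built as in the Light_RAG example; `text` is the document string
rag = Hyper_RAG(llm_client=llm, embedder=emb)
rag.feed_text(text)
print(len(rag.nodes), len(rag.hyper_edges))  # node and hyperedge counts
```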

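```python Atom with temporal anchor
from hyperextract.methods.typical import Atom

# `llm` and `emb` are built as in the Light_RAG example; `text` is the document string
atom = Atom(
    llm_client=llm,
    embedder=emb,
    observation_time="2024-06-15",  # anchor for relative dates such as "last week"
)
atom.feed_text(text)
# Deduplicate semantically similar nodes (alias merging)
atom.match_nodes_and_update_edges(threshold=0.85)
atom.dump("./ka-atom/")
```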

</CodeGroup>

| Import path | Exported classes |
|---|---|
| `hyperextract.methods.rag` | `Light_RAG`, `Graph_RAG`, `Hyper_RAG`, `HyperGraph_RAG`, `Cog_RAG` |
| `hyperextract.methods.typical` | `Atom`, `iText2KG`, `iText2KG_Star`, `KG_Gen` |

Direct instantiation skips registry metadata stamping. Set metadata manually before `dump` if you rely on `he show` / `he search` reloading via `metadata.template`:

```python
ka.metadata["template"] = "method/atom"
ka.metadata["lang"] = "en"
ka.metadata["type"] = "graph"
```
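
To avoid typing the three keys by hand for each method, the values can be derived from the method name. The helper below is a hypothetical convenience, not part of the library; its autotype table is copied from the "Registered methods" table on this page.

```python
# Autotypes copied from the "Registered methods" table.
METHOD_AUTOTYPES = {
    "graph_rag": "graph",
    "light_rag": "graph",
    "hyper_rag": "hypergraph",
    "hypergraph_rag": "hypergraph",
    "cog_rag": "hypergraph",
    "itext2kg": "graph",
    "itext2kg_star": "graph",
    "kg_gen": "graph",
    "atom": "graph",
}


def method_metadata(name: str) -> dict[str, str]:
    """Build the metadata a method-created Knowledge Abstract would carry."""
    if name not in METHOD_AUTOTYPES:
        raise ValueError(f"Unknown method: {name}")
    return {
        "template": f"method/{name}",
        "lang": "en",  # method templates are English-only
        "type": METHOD_AUTOTYPES[name],
    }


# With a directly constructed instance: ka.metadata.update(method_metadata("atom"))
print(method_metadata("hyper_rag"))
```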

## Constructor kwargs

`TemplateFactory.create_method` and `Template.create("method/...")` forward `**kwargs` to the method constructor.

### Shared parameters

Most methods accept:

<ParamField body="chunk_size" type="int" default="2048">
Characters per text chunk during extraction and indexing.
</ParamField>

<ParamField body="chunk_overlap" type="int" default="256">
Overlap between consecutive chunks.
</ParamField>

<ParamField body="max_workers" type="int" default="10">
Maximum concurrent LLM calls in batch extraction.
</ParamField>

<ParamField body="verbose" type="bool" default="False">
Enable detailed execution logging.
</ParamField>

```python
ka = Template.create(
    "method/light_rag",
    chunk_size=4096,
    chunk_overlap=512,
    max_workers=5,
    verbose=True,
)
```
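
To build intuition for how `chunk_size`, `chunk_overlap`, and `max_workers` interact, here is a standalone sketch of character-based sliding-window chunking followed by bounded-concurrency processing. It is an approximation under stated assumptions: the library's actual splitter may break on separators rather than at fixed offsets, and `chunk_text`, `fake_extract`, and `extract_all` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_text(text: str, chunk_size: int = 2048, chunk_overlap: int = 256) -> list[str]:
    """Split text into character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def fake_extract(chunk: str) -> int:
    """Placeholder for one LLM extraction call; returns the chunk length."""
    return len(chunk)


def extract_all(chunks: list[str], max_workers: int = 10) -> list[int]:
    # At most max_workers calls run at once; map() preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_extract, chunks))


text = "".join(chr(97 + i % 26) for i in range(5000))
default_chunks = chunk_text(text)
large_chunks = chunk_text(text, chunk_size=4096, chunk_overlap=512)
print(len(default_chunks), len(large_chunks))
print(extract_all(default_chunks, max_workers=5))
```

Larger chunks mean fewer LLM calls per document, while a lower `max_workers` trades throughput for gentler pressure on provider rate limits.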

### Method-specific parameters

| Method | Parameter | Type | Default | Purpose |
|---|---|---|---|---|
| `atom` | `observation_time` | `str \| None` | current date | Anchor for resolving relative temporal expressions (`today`, `last week`, etc.) in factoid and edge prompts |
| `atom` | `facts_per_chunk` | `int` | `10` | Max atomic facts batched per edge-extraction call |
| `itext2kg_star` | `observation_date` | `str \| None` | current datetime | Populates `edge.properties.observation_date` post-extraction |

<RequestExample>

```python
# Atom: anchor relative dates in news text
ka = Template.create(
    "method/atom",
    observation_time="2024-06-15",
    facts_per_chunk=15,
)
ka.feed_text(
    "John Doe is no longer the CEO of GreenIT since a few months ago."
)
```

</RequestExample>

`Atom` resolves relative temporal expressions against `observation_time`, writing absolute `t_start` / `t_end` values on edges and setting `t_obs` to the observation date. When `observation_time` is omitted, `Atom` defaults to `datetime.now().strftime("%Y-%m-%d")`.
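
The anchoring idea can be shown with plain `datetime` arithmetic. In `Atom` the resolution happens inside the factoid and edge prompts, so the lookup table below is only a stand-in for what the LLM is asked to do; `resolve_observation_time`, `anchor_relative`, and `RELATIVE_OFFSETS` are hypothetical names, and validating the date format is an assumption of this sketch.

```python
from datetime import date, datetime, timedelta
from typing import Optional

# Toy table of relative expressions and their offsets in days.
RELATIVE_OFFSETS = {"today": 0, "yesterday": 1, "last week": 7}


def resolve_observation_time(observation_time: Optional[str] = None) -> str:
    """Return the anchor date, falling back to the current date when omitted."""
    if observation_time is None:
        return datetime.now().strftime("%Y-%m-%d")
    # Raises ValueError if the string is not in YYYY-MM-DD form.
    datetime.strptime(observation_time, "%Y-%m-%d")
    return observation_time


def anchor_relative(expression: str, observation_time: Optional[str] = None) -> str:
    """Turn a relative expression into an absolute ISO date."""
    anchor = date.fromisoformat(resolve_observation_time(observation_time))
    return (anchor - timedelta(days=RELATIVE_OFFSETS[expression])).isoformat()


print(anchor_relative("last week", "2024-06-15"))
print(anchor_relative("yesterday", "2024-06-15"))
```

Without an explicit anchor, the same expression resolves against the day the extraction runs, which is why archived or historical text should always be parsed with `observation_time` set.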

`Atom` also exposes `match_nodes_and_update_edges(threshold=0.8)` for semantic node deduplication via `SemHash` embeddings — call this after `feed_text` when alias merging is needed.
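
What the `threshold` argument controls can be illustrated with a greedy merge over cosine similarity. This is not the `SemHash` algorithm: it swaps in hand-written toy vectors and a first-match rule so the effect of the threshold is easy to see. `merge_aliases` and `update_edges` are hypothetical names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_aliases(vectors: dict[str, list[float]], threshold: float = 0.8) -> dict[str, str]:
    """Map each node name to the first earlier name it is similar enough to."""
    canonical: dict[str, str] = {}
    representatives: list[str] = []
    for name, vec in vectors.items():
        match = next(
            (rep for rep in representatives if cosine(vec, vectors[rep]) >= threshold),
            None,
        )
        if match is None:
            representatives.append(name)  # first of its kind
            canonical[name] = name
        else:
            canonical[name] = match
    return canonical


def update_edges(edges: list[tuple[str, str]], canonical: dict[str, str]) -> list[tuple[str, str]]:
    # Point every edge endpoint at the canonical node name.
    return [(canonical[s], canonical[t]) for s, t in edges]


# Toy embeddings; real ones would come from the configured embedder.
vectors = {
    "Tesla Inc.": [1.0, 0.0, 0.1],
    "Tesla": [0.9, 0.1, 0.1],
    "SpaceX": [0.0, 1.0, 0.2],
}
mapping = merge_aliases(vectors, threshold=0.8)
print(mapping)
print(update_edges([("Tesla", "SpaceX")], mapping))
```

Raising the threshold makes merging more conservative: at `0.999` the two Tesla nodes in this toy data stay separate.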

## Knowledge Abstract lifecycle

Method instances inherit `BaseAutoType` lifecycle methods. A typical end-to-end Python workflow:

<Steps>
<Step title="Configure providers">

Run `he config init` or call `create_client()` so `Template.create` can read default LLM and embedder clients from `~/.he/config.toml`.

</Step>
<Step title="Extract">

```python
ka = Template.create("method/light_rag")
ka.feed_text(document_text)
```

`feed_text` merges extracted structure into the current instance. `parse(text)` returns a new instance without modifying the caller.

</Step>
<Step title="Persist">

```python
ka.build_index()
ka.dump("./my-ka/")
```

Produces `data.json`, `metadata.json`, and `index/` (when indexed).

</Step>
<Step title="Query and visualize">

```python
ka.search("wireless power", top_k=3)
ka.chat("What did Tesla invent?")
ka.show()  # OntoSight visualization
```

Equivalent CLI commands against the dumped directory: `he search`, `he talk`, `he show`.

</Step>
<Step title="Evolve">

```python
ka.feed_text(additional_text)
ka.build_index()
ka.dump("./my-ka/")
```

Or via CLI: `he feed ./my-ka/ new_doc.md` followed by `he build-index ./my-ka/`.

</Step>
</Steps>
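
Before pointing `he search` or `he talk` at a dumped directory, it can help to check that the expected files are present. The helper below is a hypothetical stdlib sketch. It assumes `metadata.json` is a JSON object with top-level `template` and `lang` keys, which this page implies but does not spell out, and the demo builds a fake directory instead of running a real extraction.

```python
import json
import tempfile
from pathlib import Path


def inspect_ka_dir(path) -> dict:
    """Summarize a dumped Knowledge Abstract directory."""
    root = Path(path)
    missing = [n for n in ("data.json", "metadata.json") if not (root / n).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {root}: {', '.join(missing)}")
    metadata = json.loads((root / "metadata.json").read_text(encoding="utf-8"))
    return {
        "template": metadata.get("template"),
        "lang": metadata.get("lang"),
        "indexed": (root / "index").is_dir(),  # absent after --no-index
    }


# Demo against a fake dump so the sketch runs without the library.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data.json").write_text("{}", encoding="utf-8")
    (root / "metadata.json").write_text(
        json.dumps({"template": "method/light_rag", "lang": "en"}), encoding="utf-8"
    )
    report = inspect_ka_dir(root)
print(report)
```

An `indexed` value of `False` is the cue to run `he build-index` before querying.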

## Choosing a method

| Goal | Start with | Output shape |
|---|---|---|
| General-purpose graph, fast | `light_rag` | `AutoGraph` — `nodes`, `edges` |
| Very large documents | `graph_rag` | `AutoGraph` with community-oriented extraction |
| Multi-entity relationships | `hyper_rag` | `AutoHypergraph` — `nodes`, `hyper_edges` |
| High-quality triples | `itext2kg` / `itext2kg_star` | `AutoGraph` |
| Temporal facts with evidence | `atom` | `AutoGraph` with `t_start`, `t_end`, `atomic_facts` on edges |
| Flexible prototyping | `kg_gen` | `AutoGraph` |

<Info>
Methods produce algorithm-driven graphs with fixed schemas baked into each class (e.g., `Light_RAG` node `name`/`type`/`description`, edge `source`/`target`/`keywords`/`strength`). Domain YAML templates under `general/`, `finance/`, etc. let you customize field schemas and multilingual prompts — see [Templates vs methods](/templates-vs-methods).
</Info>

## Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| `Unknown method: <name>` | Name not in registry | Run `he list method`; use exact registry key (e.g., `light_rag`, not `Light_RAG`) |
| `--lang is required` error | Used `-t` with a knowledge template but omitted `--lang` | Add `--lang en` or `--lang zh`, or switch to `-m <method>` |
| `--lang` ignored message | Expected multilingual prompts on a method | Methods are English-only; use a domain template with `--lang` instead |
| Relative dates resolve incorrectly | `observation_time` not set on `atom` | Pass `observation_time="YYYY-MM-DD"` via `Template.create` or `Atom(...)` |
| `he search` / `he talk` fails | Index not built | Omit `--no-index` during parse, or run `he build-index <ka_path>` |
| Output directory not empty error | Target dir exists and is non-empty | Pass `--force` to overwrite |

Enable debug logging with `HYPER_EXTRACT_LOG_LEVEL=DEBUG` when tracing extraction phases (Atom logs atomic-fact and edge-extraction stages).

## Related pages

<CardGroup>
<Card title="Templates vs methods" href="/templates-vs-methods">
When to pick a domain YAML template over an algorithm method, and language requirements.
</Card>
<Card title="Extraction methods reference" href="/extraction-methods-reference">
Per-method autotype output, registry API, and full constructor signatures.
</Card>
<Card title="Method demos" href="/method-demos">
Runnable scripts under `examples/en/methods/` for each engine.
</Card>
<Card title="Configure providers" href="/configure-providers">
Set up LLM and embedder clients before parsing or calling `Template.create`.
</Card>
<Card title="CLI reference" href="/cli-reference">
Complete `he parse`, `he list method`, and related flag documentation.
</Card>
</CardGroup>
