# Provider system

> BYOC/BYOK provider model: `openai`, `bailian`, and `vllm` presets; `provider:model@url` shorthand; `CompatibleEmbeddings` for non-OpenAI endpoints; and verified model compatibility requirements (`json_schema` / function calling).

- Repository: yifanfeng97/Hyper-Extract
- GitHub: https://github.com/yifanfeng97/Hyper-Extract
- Human docs: https://www.grok-wiki.com/public/docs/yifanfeng97-hyper-extract-7891c7254cdf
- Complete Markdown: https://www.grok-wiki.com/public/docs/yifanfeng97-hyper-extract-7891c7254cdf/llms-full.txt

## Source Files

- `hyperextract/utils/client.py`
- `hyperextract/cli/config.py`
- `hyperextract/cli/commands/config.py`
- `README.md`
- `.env.example`

---


Hyper-Extract is **bring-your-own-cloud (BYOC)** and **bring-your-own-key (BYOK)** by design. You choose where LLM and embedding traffic goes (OpenAI, Alibaba Bailian, a local vLLM stack, or any OpenAI-compatible endpoint), and Hyper-Extract wires the same extraction pipeline to that backend. The provider layer does not lock you to a single vendor; presets and shorthand strings are conveniences, not hard dependencies.

Every extraction path ultimately needs two LangChain clients: a **chat model** for structured extraction and an **embedder** for semantic search. Hyper-Extract centralizes both in `hyperextract.utils.client`.

## Architecture overview

```mermaid
flowchart LR
  subgraph consumers [Consumers]
    CLI["he parse / search / talk"]
    Template["Template.create()"]
    AutoType["AutoGraph, AutoList, ..."]
  end

  subgraph factory [Client factory]
    CC["create_client()"]
    GL["get_client()"]
    CL["create_llm()"]
    CE["create_embedder()"]
  end

  subgraph backends [OpenAI-compatible backends]
    OAI["openai preset"]
    BL["bailian preset"]
    VLLM["vllm + custom URL"]
  end

  CLI --> GL
  Template --> GL
  AutoType --> CC
  CC --> CL
  CC --> CE
  GL --> CL
  GL --> CE
  CL --> OAI
  CL --> BL
  CL --> VLLM
  CE --> OAI
  CE --> BL
  CE --> VLLM
```

| Layer | Role |
|-------|------|
| **Presets** | Named bundles of `base_url` and default models for `openai`, `bailian`, and `vllm` |
| **Shorthand parser** | Turns `provider:model@url` strings into resolved config dicts |
| **LLM client** | `ChatOpenAI` pointed at the resolved endpoint |
| **Embedder client** | `OpenAIEmbeddings` for official OpenAI, or `CompatibleEmbeddings` for everything else |
| **Config file** | `~/.he/config.toml` read by `get_client()` for CLI and `Template.create()` defaults |

## Provider presets

Three first-class presets ship in `PROVIDER_PRESETS`. The two cloud presets each define a base URL, a default LLM model, and a default embedder model; `vllm` is a name only.

| Preset | Base URL | Default LLM | Default embedder |
|--------|----------|-------------|------------------|
| `openai` | `https://api.openai.com/v1` | `gpt-4o-mini` | `text-embedding-3-small` |
| `bailian` | `https://dashscope.aliyuncs.com/compatible-mode/v1` | `qwen3.6-plus` | `text-embedding-v4` |
| `vllm` | *(none — you must supply)* | *(none)* | *(none)* |

The `vllm` preset intentionally has no defaults. Local deployments vary by host, port, and served model name, so you must always pass the full `provider:model@url` string or set both `model` and `base_url` in config.
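
A minimal sketch of how preset resolution might work, assuming presets are plain dicts keyed by provider name. `PRESETS_SKETCH` and `resolve_preset` are hypothetical names used for illustration, not the actual contents of `PROVIDER_PRESETS`; the URLs and model names are copied from the table above.

```python
from typing import Dict, Optional

# Hypothetical stand-in for PROVIDER_PRESETS; values copied from the preset table.
PRESETS_SKETCH: Dict[str, Dict[str, Optional[str]]] = {
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "llm": "gpt-4o-mini",
        "embedder": "text-embedding-3-small",
    },
    "bailian": {
        "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
        "llm": "qwen3.6-plus",
        "embedder": "text-embedding-v4",
    },
    # vllm ships no defaults: the caller must supply both model and URL.
    "vllm": {"base_url": None, "llm": None, "embedder": None},
}


def resolve_preset(
    provider: str,
    kind: str,  # "llm" or "embedder"
    model: Optional[str] = None,
    base_url: Optional[str] = None,
) -> Dict[str, str]:
    """Layer explicit overrides over preset defaults; fail if anything is missing."""
    preset = PRESETS_SKETCH[provider]
    resolved_model = model or preset[kind]
    resolved_url = base_url or preset["base_url"]
    if resolved_model is None or resolved_url is None:
        raise ValueError(f"{provider} needs an explicit model and base_url")
    return {"provider": provider, "model": resolved_model, "base_url": resolved_url}
```

With this shape, `resolve_preset("bailian", "llm")` falls back entirely to the preset, while `resolve_preset("vllm", "llm", model="Qwen3.5-9B")` raises because no URL was given.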

<AccordionGroup>
<Accordion title="Why presets instead of hard-coded providers">
Presets are URL and model shortcuts, not proprietary connectors. Any endpoint that speaks the OpenAI chat-completions and embeddings APIs can work when you pass a custom `base_url`. The `custom` option in `he config init` follows the same code path as `openai` or `bailian` — only the resolved URL and model names change.
</Accordion>
</AccordionGroup>

## String shorthand: `provider:model@url`

`create_client()`, `create_llm()`, and `create_embedder()` accept a compact string syntax parsed by `_parse_client_spec()`:

| Format | Example | Resolved behavior |
|--------|---------|-------------------|
| `provider` | `"bailian"` | Preset URL + default LLM/embedder models |
| `provider:model` | `"bailian:qwen-plus"` | Preset URL + overridden model |
| `provider:model@url` | `"vllm:Qwen3.5-9B@http://localhost:8000/v1"` | Full manual specification |
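
The three formats can be split with two `str.partition` calls. The sketch below is an illustration only: `parse_spec` is a hypothetical helper, not the library's `_parse_client_spec()`, and it assumes the URL never contains a literal `@`.

```python
from typing import Dict, Optional


def parse_spec(spec: str) -> Dict[str, Optional[str]]:
    """Split 'provider[:model][@url]' into its parts (illustrative only)."""
    # Peel off the URL first: URLs contain ':' so '@' must be handled before ':'.
    head, _, url = spec.partition("@")
    # partition() splits at the first ':' only, so the rest of head is the model.
    provider, _, model = head.partition(":")
    return {
        "provider": provider.strip(),
        "model": model.strip() or None,
        "base_url": url.strip() or None,
    }
```

Missing parts come back as `None`, which leaves room for preset defaults to fill them in afterwards.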

Dict specs are also supported for fine-grained control (temperature, extra kwargs):

```python
from hyperextract.utils.client import create_llm

create_llm({"provider": "bailian", "model": "qwen-plus", "temperature": 0.5}, api_key="sk-xxx")
```

## `create_client()` patterns

`create_client()` exposes three common deployment shapes:

<Tabs>
<Tab title="Pattern A — single cloud provider">

One preset string configures both LLM and embedder. Simplest path for OpenAI or Bailian.

```python
from hyperextract import create_client

llm, emb = create_client("bailian", api_key="sk-xxx")
# → qwen3.6-plus + text-embedding-v4 at Bailian compatible-mode URL
```

</Tab>
<Tab title="Pattern B — local vLLM (split services)">

LLM and embedder often run on different ports locally. Pass separate specs:

```python
from hyperextract import create_client

llm, emb = create_client(
    llm="vllm:Qwen3.5-9B@http://localhost:8000/v1",
    embedder="vllm:bge-m3@http://localhost:8001/v1",
    api_key="dummy",
)
```

</Tab>
<Tab title="Pattern C — mixed cloud + local">

Cloud LLM with on-prem embeddings (or the reverse):

```python
from hyperextract import create_client

llm, emb = create_client(
    llm="bailian:qwen-plus",
    embedder="vllm:bge-m3@http://localhost:8001/v1",
    api_key="sk-xxx",
)
```

</Tab>
</Tabs>

Lower-level factories are available when you only need one side:

- `create_llm(spec, api_key=..., **kwargs)` → `ChatOpenAI`
- `create_embedder(spec, api_key=..., **kwargs)` → `OpenAIEmbeddings` or `CompatibleEmbeddings`
- `get_client(config_path=None)` → reads `~/.he/config.toml` (used by CLI and `Template.create()`)

## `CompatibleEmbeddings` for non-OpenAI endpoints

LangChain's `OpenAIEmbeddings` can send **pre-tokenized integer lists** to the API. Official OpenAI accepts that format; most OpenAI-compatible providers (Bailian, Ollama, LiteLLM, local vLLM) do not.

Hyper-Extract routes embedders through `CompatibleEmbeddings` whenever `base_url` is set and is not exactly `https://api.openai.com/v1`:

| Condition | Embedder class |
|-----------|----------------|
| Official OpenAI URL (or no custom URL) | `langchain_openai.OpenAIEmbeddings` |
| Any other `base_url` | `CompatibleEmbeddings` |
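
The routing rule in the table can be expressed as a single comparison. This is a sketch under the assumption that the check is a literal string match, as the wording "not exactly" suggests; `embedder_class_name` is a hypothetical helper that returns a class name rather than constructing a client.

```python
from typing import Optional

OFFICIAL_OPENAI_URL = "https://api.openai.com/v1"


def embedder_class_name(base_url: Optional[str]) -> str:
    """Return the name of the embedder class the routing rule would pick."""
    # No custom URL, or the official one, keeps the stock LangChain class.
    if not base_url or base_url == OFFICIAL_OPENAI_URL:
        return "OpenAIEmbeddings"
    # Everything else goes through the string-only compatibility wrapper.
    return "CompatibleEmbeddings"
```

Under a literal match, a variant such as `https://api.openai.com/v1/` (trailing slash) would be routed to `CompatibleEmbeddings`. That should still work against OpenAI, since string inputs are accepted there as well.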

`CompatibleEmbeddings` always sends **string inputs**, uses tiktoken to count tokens when chunking (falling back to `cl100k_base` for unknown model names), and batches requests conservatively (`max_batch_size=10` by default) because providers such as Bailian cap the batch size. Texts longer than the token limit are split into chunks, and the chunk embeddings are averaged into a single vector.
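
The three mechanics described here (chunking at a token limit, conservative batching, and averaging chunk vectors) can be sketched with the standard library alone. The helper names are hypothetical, whitespace-separated words stand in for tiktoken tokens, and a plain unweighted mean is an assumption: the real class may weight chunks by length or renormalize the result.

```python
from typing import List, Sequence


def split_by_limit(tokens: Sequence[str], limit: int) -> List[List[str]]:
    """Cut a token sequence into consecutive chunks of at most `limit` tokens."""
    return [list(tokens[i : i + limit]) for i in range(0, len(tokens), limit)]


def batched(items: Sequence[str], size: int) -> List[List[str]]:
    """Group request inputs so no single API call carries more than `size` items."""
    return [list(items[i : i + size]) for i in range(0, len(items), size)]


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized chunk embeddings."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]
```

For example, 25 inputs with a batch size of 10 become three requests of 10, 10, and 5 items, and two chunk vectors `[1.0, 3.0]` and `[3.0, 5.0]` collapse to `[2.0, 4.0]`.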

<Warning>
Semantic search quality depends on the embedding model you choose. Hyper-Extract does not translate between embedding spaces: if you change the embedder model or provider after building an index, rebuild it with `he build-index`.
</Warning>

## Structured output requirement

Hyper-Extract extraction depends on the LLM returning **schema-constrained JSON**. AutoTypes chain prompts through LangChain's `with_structured_output()`:

```python
self.data_extractor = (
    self.prompt_template
    | self.llm_client.with_structured_output(self._data_schema)
)
```

That requires backend support for **`json_schema`** or **function calling**. Models that only support loose `json_object` mode will fail extraction or return unusable output.
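
To make the difference concrete, the sketch below builds the two `response_format` shapes defined by the OpenAI chat-completions API. The `person` schema is a made-up example, and what LangChain actually sends depends on its version and on the `method` passed to `with_structured_output()`, so treat this as an illustration of the two modes rather than a trace of Hyper-Extract's requests.

```python
# JSON Schema for a tiny extraction result (hypothetical example schema).
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "role": {"type": "string"},
    },
    "required": ["name", "role"],
    "additionalProperties": False,
}

# Schema-constrained mode: the backend is asked to decode against the schema.
json_schema_format = {
    "type": "json_schema",
    "json_schema": {"name": "person", "schema": person_schema, "strict": True},
}

# Loose mode: the backend only promises syntactically valid JSON,
# with no guarantee about which keys or types appear.
json_object_format = {"type": "json_object"}
```

A backend that accepts only the second shape has no way to receive the schema, which is why such models cannot satisfy the extraction pipeline.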

### Verified LLM compatibility

| Platform | Model | `json_schema` | Status | Notes |
|----------|-------|:-------------:|:------:|-------|
| **OpenAI** | gpt-4o / gpt-4o-mini / gpt-5 | ✅ | ✅ Verified | Recommended cloud default |
| **Alibaba Bailian** | qwen-plus / qwen-turbo / qwen3.6-plus / deepseek-r1 | ✅ | ✅ Verified | Works out of the box |
| **Alibaba Bailian** | qwen-max / deepseek-v3 | ❌ | ❌ Incompatible | Only `json_object`; switch to qwen-plus, qwen-turbo, or deepseek-r1 |
| **Local vLLM** | Qwen3.5-9B (GPTQ-Marlin 4bit) | ✅ | ✅ Verified | AutoList / AutoGraph tested |

<AccordionGroup>
<Accordion title="Bailian troubleshooting symptoms">
If you see `messages must contain the word 'json'` or non-JSON model output, the model likely lacks `json_schema` support. Switch to qwen-plus, qwen-turbo, or deepseek-r1.
</Accordion>
<Accordion title="Thinking models on local vLLM">
Thinking models (e.g. Qwen3.5 with thinking enabled) emit `<think>...</think>` blocks that conflict with constrained JSON decoding. Disable thinking when serving locally:

```bash
--default-chat-template-kwargs '{"enable_thinking": false}'
```

DeepSeek-R1 via Bailian is verified because Bailian strips thinking tags server-side.
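
The symptom is easy to reproduce on the client side. The snippet below uses a made-up model response to show that a leading thinking block makes the payload unparseable as JSON; the regex cleanup is shown only to illustrate what server-side stripping achieves and is not something Hyper-Extract is documented to do.

```python
import json
import re

# Hypothetical raw output from a model with thinking enabled.
raw = '<think>The text names one founder.</think>{"name": "Zhang San"}'

try:
    json.loads(raw)
    parsed_raw = True
except json.JSONDecodeError:
    parsed_raw = False  # the leading <think> block is not valid JSON

# Illustration only: remove the thinking block, then parse what remains.
cleaned = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()
result = json.loads(cleaned)
```

Disabling thinking at serve time remains the documented fix for local vLLM.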
</Accordion>
</AccordionGroup>

Some extraction methods explicitly request function calling — for example, GraphRAG community reports use `method="function_calling"`. Prefer models and vLLM builds with structured-output support enabled.

### Verified embedding compatibility

| Platform | Model | Dimensions | Status |
|----------|-------|------------|--------|
| **OpenAI** | text-embedding-3-small | 1536 | ✅ Verified |
| **Alibaba Bailian** | text-embedding-v4 | 1024 | ✅ Verified |
| **Local vLLM** | BAAI/bge-m3 | — | ✅ Verified |

Any OpenAI-compatible embeddings endpoint can work when reached through `CompatibleEmbeddings`.

## CLI and config file integration

The CLI stores provider settings in `~/.he/config.toml` under `[llm]` and `[embedder]`. Each section holds `provider`, `model`, `api_key`, and `base_url`.

<Steps>
<Step title="Initialize or set a preset">

<CodeGroup>
```bash CLI quick init (OpenAI)
he config init -p openai -k sk-xxx
```

```bash CLI quick init (Bailian)
he config init -p bailian -k sk-xxx
```

```bash Interactive (vLLM)
he config init
# Select local vLLM; enter model names and base URLs
```
</CodeGroup>

</Step>
<Step title="Configure services independently">

Mixed deployments use per-service commands:

```bash
he config llm -p bailian -k sk-xxx
he config embedder -p vllm -m bge-m3 -u http://localhost:8001/v1 -k dummy
```

</Step>
<Step title="Verify before extraction">

`he parse`, `he search`, and `he talk` call `validate_config()` first. Validation rules:

- **Cloud providers** (`openai`, `bailian`, custom): `api_key` required (from config or `OPENAI_API_KEY`)
- **vLLM**: `api_key` may be empty or `dummy`, but **`base_url` is mandatory** for both LLM and embedder

Environment variables act as fallbacks for config fields that are empty:

<ParamField body="OPENAI_API_KEY" type="string">
API key fallback when `api_key` is not set in `config.toml`.
</ParamField>

<ParamField body="OPENAI_BASE_URL" type="string">
Base URL fallback when `base_url` is not set in `config.toml`.
</ParamField>
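
The rules and fallbacks above can be combined into one check. This is a sketch, not the body of `validate_config()`: `validate_section` is a hypothetical helper, the error strings are borrowed from the messages listed under failure modes, and applying `OPENAI_BASE_URL` to the `vllm` provider as well is an assumption.

```python
from typing import Dict, Mapping, Optional


def validate_section(
    name: str,  # "LLM" or "Embedder", used only in error messages
    section: Mapping[str, Optional[str]],
    env: Mapping[str, str],
) -> Dict[str, Optional[str]]:
    """Apply env-var fallbacks, then enforce the per-provider rules."""
    resolved: Dict[str, Optional[str]] = dict(section)
    # Environment variables only fill fields the config left empty.
    resolved["api_key"] = resolved.get("api_key") or env.get("OPENAI_API_KEY")
    resolved["base_url"] = resolved.get("base_url") or env.get("OPENAI_BASE_URL")

    if resolved.get("provider") == "vllm":
        # Local servers may ignore the key, but the URL cannot be guessed.
        if not resolved["base_url"]:
            raise ValueError("vLLM provider requires base_url")
    elif not resolved["api_key"]:
        raise ValueError(f"{name} API key is not configured")
    return resolved
```

In real use you would pass `os.environ` as `env`; a plain dict keeps the sketch easy to test.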

</Step>
</Steps>

After configuration, CLI commands and `Template.create()` automatically call `get_client()` — no inline provider code required.

## Local vLLM deployment sketch

Typical verified layout: LLM on port 8000, embeddings on port 8001.

<CodeGroup>
```bash Start LLM service
vllm serve /path/to/qwen3.5-9b-gptq-marlin \
  --served-model-name Qwen3.5-9B \
  --trust-remote-code \
  --quantization gptq_marlin \
  --dtype bfloat16 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.90 \
  --default-chat-template-kwargs '{"enable_thinking": false}' \
  --port 8000 \
  --api-key dummy
```

```bash Start embedding service
vllm serve BAAI/bge-m3 \
  --served-model-name bge-m3 \
  --task embed \
  --dtype float16 \
  --max-model-len 8192 \
  --port 8001
```
</CodeGroup>

<RequestExample>
```python Python client for local vLLM
from hyperextract import create_client, AutoGraph

llm, emb = create_client(
    llm="vllm:Qwen3.5-9B@http://localhost:8000/v1",
    embedder="vllm:bge-m3@http://localhost:8001/v1",
    api_key="dummy",
)

graph = AutoGraph(
    instruction="Extract people and their relationships",
    llm_client=llm,
    embedder=emb,
    node_key_extractor=lambda n: n.name,
    edge_key_extractor=lambda e: (e.source, e.target, e.type),
    nodes_in_edge_extractor=lambda e: (e.source, e.target),
)
graph.parse("Zhang San founded ByteDance. Li Si serves as CEO.")
```
</RequestExample>

Prefer **GPTQ-Marlin** over AWQ for Qwen3.5-9B on vLLM 0.21.x due to known AWQ compatibility issues.

## Failure modes

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `vLLM provider requires base_url` | `vllm` preset without URL | Set `--base-url` or use `provider:model@url` shorthand |
| `LLM API key is not configured` | Missing key for cloud provider | `he config llm -k ...` or export `OPENAI_API_KEY` |
| Empty or partial extraction | Model lacks `json_schema` | Switch to a verified model (see table above) |
| Embedding batch errors on Bailian | Batch too large | `CompatibleEmbeddings` defaults to `max_batch_size=10`; lower it if the provider still rejects requests |
| Search returns garbage after provider change | Embedding space mismatch | Rebuild index with `he build-index` |

Enable debug logging with `HYPER_EXTRACT_LOG_LEVEL=DEBUG` when diagnosing client or schema failures.

## API surface summary

| Function | Input | Output |
|----------|-------|--------|
| `create_client(provider=...)` or `create_client("bailian", ...)` | Shorthand or split `llm`/`embedder` specs | `(ChatOpenAI, Embeddings)` tuple |
| `create_llm(spec)` | Shorthand or dict | `ChatOpenAI` |
| `create_embedder(spec)` | Shorthand or dict | `OpenAIEmbeddings` or `CompatibleEmbeddings` |
| `get_client(path?)` | Optional config path | Reads TOML, returns client tuple |

Runnable provider demos live under `examples/providers/` (`openai_demo.py`, `bailian_demo.py`, `vllm_demo.py`).

## Related pages

<CardGroup cols={2}>
<Card title="Configure providers" href="/configure-providers">
Step-by-step setup for `he config init`, per-service commands, environment variables, and programmatic `create_client()` for mixed deployments.
</Card>
<Card title="Configuration reference" href="/configuration-reference">
Full `~/.he/config.toml` schema, preset defaults, env var precedence, and validation rules.
</Card>
<Card title="Python API reference" href="/python-api-reference">
`create_client`, `create_llm`, `create_embedder`, `get_client`, and AutoType lifecycle methods.
</Card>
<Card title="Troubleshooting" href="/troubleshooting">
Missing API keys, vLLM `base_url` requirements, schema failures, and debug logging.
</Card>
</CardGroup>