fusionHarness

Open-source Mixture-of-Agents compound-model server — a self-hostable alternative to OpenRouter's Fusion API.

Fan a prompt out to a panel of LLMs in parallel, let a judge extract the structure of their answers (consensus, contradictions, partial coverage, unique insights), then a synthesizer writes one final answer grounded in that analysis. The result beats any single panelist — and a panel of budget models can rival a frontier model at a fraction of the cost.

It speaks the OpenAI API, so it drops into any existing OpenAI client: point base_url at fusionHarness and use the model slug fusion.

          ┌─────────────┐
prompt ─► │   fan-out   │ ─► model A ─┐
          │   (panel)   │ ─► model B ─┤  (parallel, each tool-enabled)
          └─────────────┘ ─► model C ─┘
                                  │
                                  ▼
                            ┌───────────┐     ┌──────────────┐
                            │   judge   │ ──► │ synthesizer  │ ─► final answer
                            │ structure │     │  grounded    │    + cost / latency
                            └───────────┘     └──────────────┘

Why it works (OpenRouter's own ablation): ~¾ of the lift comes from synthesis, ~¼ from diversity.
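The flow above can be sketched in a few lines of asyncio. This is a minimal, hypothetical sketch, not fusionHarness's engine. `call_model` stands in for any OpenAI-compatible chat call. The prompt wording, the `fuse` signature, and how `layers` and `max_concurrency` are wired in are all assumptions for illustration.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical signature: (model slug, prompt) -> completion text.
CallModel = Callable[[str, str], Awaitable[str]]


async def fuse(
    prompt: str,
    panel: list[str],
    judge: str,
    synthesizer: str,
    call_model: CallModel,
    layers: int = 1,
    max_concurrency: int = 0,
) -> str:
    # 0 means unlimited; otherwise cap the number of in-flight panelist calls.
    sem = asyncio.Semaphore(max_concurrency) if max_concurrency > 0 else None

    async def ask(model: str, text: str) -> str:
        if sem is None:
            return await call_model(model, text)
        async with sem:
            return await call_model(model, text)

    drafts: list[str] = []
    for _ in range(layers):
        # From the second layer on, every proposer sees the previous layer's drafts.
        layer_prompt = prompt if not drafts else (
            prompt + "\n\nPrevious drafts:\n" + "\n---\n".join(drafts)
        )
        # Fan out: all panelists answer in parallel.
        drafts = list(await asyncio.gather(*(ask(m, layer_prompt) for m in panel)))

    numbered = "\n\n".join(f"[{i}] {d}" for i, d in enumerate(drafts, 1))
    # Judge: extract the structure of the answers.
    analysis = await call_model(
        judge,
        "Identify consensus, contradictions, partial coverage and unique insights "
        f"in these answers to: {prompt}\n\n{numbered}",
    )
    # Synthesizer: one final answer grounded in the judge's analysis.
    return await call_model(
        synthesizer,
        f"Question: {prompt}\n\nAnswers:\n{numbered}\n\nAnalysis:\n{analysis}\n\n"
        "Write one final answer.",
    )
```

With a stub such as `async def fake(model, text): return f"{model}-out"`, `asyncio.run(fuse("q", ["a", "b"], "j", "s", fake))` returns the synthesizer's output, `"s-out"`.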

Quickstart

# 1. Install
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

# 2. Configure — one OpenRouter key reaches every model in the catalog
cp .env.example .env       # then put your key in FUSION_API_KEY

# 3. Run the OpenAI-compatible server (omit --config to use the built-in budget panel)
fusion serve --config configs/budget.yaml

# 4. Call it like any OpenAI endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"fusion","messages":[{"role":"user","content":"Compare CRDTs vs OT for collaborative editing."}]}'

From the OpenAI Python SDK (pip install openai):

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
resp = client.chat.completions.create(
    model="fusion",
    messages=[{"role": "user", "content": "..."}],
)
print(resp.choices[0].message.content)

Or straight from the terminal, no server:

export FUSION_API_KEY=sk-or-...
fusion ask "What are the trade-offs between gRPC and REST?" --config configs/budget.yaml

Use it from a harness

Because fusion speaks the OpenAI API, it drops into any agent harness that accepts an OpenAI-compatible endpoint. You can also use our own.

# Our own TUI harness — streaming, multi-turn chat (/reset /stats /help /exit)
fusion chat --config configs/budget.yaml

# Pi (pi.dev): install the package, register the provider, point Pi at it
pi install ./integrations/pi
bash integrations/pi/install.sh
pi --model fusion

Adapters for Pi, Claude Code, aider, Continue, LangChain, and the OpenAI SDK are in integrations/. Verify the whole stack end-to-end with no API key:

scripts/smoke.sh --fake     # boots a key-free fake backend + the real server

Agentic coding (`fusion code`)

fusionHarness is also its own agentic coding harness — like Claude Code, but the brain can convene the fusion panel. The agent reads, writes, and edits files, searches, and runs bash in a tool-use loop confined to a project root, and can call council to escalate a hard sub-question to the full panel.

fusion code "add a /version endpoint and a test for it" --root .
fusion code                       # interactive agent session
fusion code "refactor X" --plan   # write a plan first, then act
fusion code "delete dead code" --approve   # confirm each file/bash action

Each step is printed as it happens, and the agent calls finish once the task is done and verified. Tools are confined to --root (default: the current directory). The council tool returns the panel's synthesized answer to the agent. --approve asks for confirmation before every mutating tool call (write/edit/bash); --plan makes the agent write a numbered plan before acting.

⚠️ Security: the agent runs bash and edits files. Confinement blocks path escapes, not arbitrary command effects — run it on projects you trust, or in a container.
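Path confinement of this kind usually means resolving every requested path against the root and rejecting anything that lands outside it. Below is a minimal sketch under that assumption. `confine` is a hypothetical helper, not the code in agent_tools.py. As the warning says, a check like this does nothing to stop a bash command from touching files elsewhere.

```python
from __future__ import annotations

from pathlib import Path


def confine(root: str | Path, requested: str) -> Path:
    """Resolve `requested` relative to `root`; raise if it escapes the root."""
    root_path = Path(root).resolve()
    # resolve() collapses ".." and follows symlinks, so links pointing outside
    # the root are caught too. An absolute `requested` replaces the root
    # entirely and is then rejected by the check below.
    target = (root_path / requested).resolve()
    if not target.is_relative_to(root_path):  # Python 3.9+
        raise PermissionError(f"path escapes project root: {requested}")
    return target
```

With this check, `confine(".", "src/app.py")` succeeds, while `confine(".", "../secrets.txt")` and `confine(".", "/etc/passwd")` raise `PermissionError`.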

Configuration

A config picks the panel, judge, and synthesizer. Two presets ship in configs/:

Preset	Panel	Use it for
`configs/budget.yaml`	Gemini 3 Flash · Kimi K2.6 · DeepSeek V4 Pro	frontier-ish quality at ~half the price
`configs/frontier.yaml`	Opus 4.8 · GPT-5.5 · Gemini 3.1 Pro	beyond-frontier quality

Custom panel:

# my-panel.yaml
name: fusion
panel:
  - anthropic/claude-opus-4.8
  - openai/gpt-5.5
  - model: deepseek/deepseek-v4-pro    # long form allows per-model overrides
    temperature: 0.3
    tools: [web_search]
judge: openai/gpt-5.5
synthesizer: anthropic/claude-opus-4.8
temperature: 0.7
max_tokens: 4096
tools_enabled: false

Model slugs follow OpenRouter conventions (vendor/model). Point at a different backend with FUSION_BASE_URL (OpenAI, a local vLLM/Ollama server, Groq, Together — anything OpenAI-compatible). API keys come from the environment only (FUSION_API_KEY, OPENROUTER_API_KEY, or OPENAI_API_KEY), never from YAML.
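Panel entries come in two forms: a bare slug, or a mapping with `model` plus per-model overrides. The sketch below shows one hypothetical way to normalize them. It assumes per-model keys override the top-level defaults, and that per-model `tools` take effect only when `tools_enabled` is true (which the Tools section implies). The 0.7 fallback temperature is an assumption. The YAML is shown already parsed into a dict, so no YAML library is needed. `PanelMember` and `normalize_panel` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PanelMember:
    model: str
    temperature: float
    tools: list[str] = field(default_factory=list)


def normalize_panel(cfg: dict[str, Any]) -> list[PanelMember]:
    default_temp = float(cfg.get("temperature", 0.7))  # assumed fallback
    tools_enabled = bool(cfg.get("tools_enabled", False))
    members = []
    for entry in cfg["panel"]:
        if isinstance(entry, str):  # short form: just the slug
            entry = {"model": entry}
        members.append(
            PanelMember(
                model=entry["model"],
                temperature=float(entry.get("temperature", default_temp)),
                # Per-model tools are ignored unless tools are enabled globally.
                tools=list(entry.get("tools", [])) if tools_enabled else [],
            )
        )
    return members
```

Under this reading, the my-panel.yaml example above sets tools_enabled: false, so the deepseek entry's `tools: [web_search]` stays inert until the global flag is flipped.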

Quality & reliability knobs

All optional config fields (with defaults):

Key	Default	What it does
`refine`	`false`	Run one extra self-critique pass over the synthesized answer (quality ↑, cost ↑).
`layers`	`1`	Multi-layer MoA — with `layers>1`, proposers see the previous layer's drafts and improve before the final synthesis.
`samples`	`1`	Self-consistency — sample each proposer K times so the judge/synthesizer see more drafts.
`diversity`	`true`	Spread panelist temperatures so drafts differ (≈¼ of the lift).
`diversity_jitter`	`0.3`	How wide to spread temperatures (the MoA diversity↔quality trade-off — keep it modest).
`max_retries`	`2`	Retry transient upstream failures (429/5xx/timeout) so a flaky panelist doesn't shrink the panel.
`retry_backoff`	`0.5`	Base seconds for exponential retry backoff.
`max_concurrency`	`0`	Cap concurrent panelist calls (0 = unlimited).
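The retry and diversity knobs are easier to picture as code. Below is a minimal sketch. It assumes the backoff doubles on each attempt (0.5 s, then 1.0 s with the defaults) and that `diversity_jitter` spreads temperatures evenly around the configured one. `with_retries`, `TransientError`, and `spread_temperatures` are hypothetical names. fusionHarness may classify errors or schedule temperatures differently.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for 429 / 5xx / timeout failures."""


def with_retries(fn: Callable[[], T], max_retries: int = 2,
                 retry_backoff: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_retries:
                raise  # out of retries: the caller drops this panelist
            sleep(retry_backoff * 2 ** attempt)  # 0.5 s, 1.0 s, 2.0 s, ...
    raise ValueError("max_retries must be >= 0")


def spread_temperatures(base: float, n: int, jitter: float = 0.3) -> list[float]:
    """Spread n temperatures evenly over [base - jitter, base + jitter]."""
    if n == 1:
        return [base]
    step = 2 * jitter / (n - 1)
    # Clamp at 0 because sampling temperatures cannot be negative.
    return [round(max(0.0, base - jitter + i * step), 3) for i in range(n)]
```

For a three-model panel at temperature 0.7, `spread_temperatures(0.7, 3)` yields `[0.4, 0.7, 1.0]`. A smaller `diversity_jitter` narrows that range, trading diversity for steadier individual drafts.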

If the judge fails, synthesis still runs from the raw responses; if the synthesizer fails, the best panelist's answer is returned. Anything that degraded is reported in the response's fusion.degraded list — never silently.
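That fallback order can be sketched as follows. The README does not say how the "best panelist" is chosen, so this sketch takes a caller-supplied `score` function (an assumption). The `"judge"` and `"synthesizer"` strings are illustrative, not necessarily what fusion.degraded contains. `synthesize_with_fallbacks` is a hypothetical name.

```python
from typing import Callable, Optional


def synthesize_with_fallbacks(
    drafts: list[str],
    judge: Callable[[list[str]], str],
    synthesize: Callable[[list[str], Optional[str]], str],
    score: Callable[[str], float],
) -> tuple[str, list[str]]:
    """Return (answer, degraded), recording every stage that fell back."""
    degraded: list[str] = []
    try:
        analysis: Optional[str] = judge(drafts)
    except Exception:
        analysis = None  # synthesize from the raw responses instead
        degraded.append("judge")
    try:
        return synthesize(drafts, analysis), degraded
    except Exception:
        degraded.append("synthesizer")
        # Fall back to the highest-scoring panelist draft.
        return max(drafts, key=score), degraded
```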

Tools (web search + bash)

Panelists can call tools while drafting — useful for deep-research tasks. Tools are off by default. Enable globally and per-model:

tools_enabled: true
panel:
  - model: deepseek/deepseek-v4-pro
    tools: [web_search, bash]

web_search — keyless DuckDuckGo Instant Answer by default; swap in a Tavily/Brave/SerpAPI backend via default_registry(search_fn=...).
bash — runs in a sandboxed shell (timeout, stripped env, output truncation).

⚠️ Security: bash executes commands the model writes. The sandbox is not a container. Run the server in a disposable VM/container before enabling bash with untrusted input. It is opt-in because it is dangerous.
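The three safeguards listed for bash (timeout, stripped environment, output truncation) map onto standard `subprocess` features. Below is a minimal sketch of that idea, not fusionHarness's implementation. The `run_sandboxed` name, the minimal PATH, and the limits are assumptions. Like the real sandbox, this is not a security boundary: the command still runs as your user, with your filesystem.

```python
import subprocess


def run_sandboxed(command: str, timeout: float = 10.0, max_output: int = 4000) -> str:
    # Stripped env: no inherited API keys or tokens, only a minimal PATH.
    env = {"PATH": "/usr/bin:/bin"}
    try:
        proc = subprocess.run(
            ["bash", "-c", command],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed when this expires
            env=env,
        )
    except subprocess.TimeoutExpired:
        return f"[timed out after {timeout}s]"
    out = proc.stdout + proc.stderr
    if len(out) > max_output:
        # Truncate so a chatty command cannot flood the model's context.
        out = out[:max_output] + f"\n[truncated {len(out) - max_output} chars]"
    return f"{out}[exit {proc.returncode}]"
```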

Cost & latency tracking

Every response reports its measured cost and latency. Non-streaming responses include a fusion block in the body, plus headers:

{
  "choices": [ ... ],
  "usage": { "prompt_tokens": 1234, "completion_tokens": 567, "total_tokens": 1801 },
  "fusion": {
    "config": "fusion",
    "panel_models": ["google/gemini-3-flash", "moonshotai/kimi-k2.6", "deepseek/deepseek-v4-pro"],
    "panel_succeeded": 3,
    "cost_usd": 0.0123,
    "cost_breakdown": [ { "model": "...", "role": "panel", "cost_usd": 0.004 }, ... ],
    "timing_s": { "panel": 2.1, "judge": 0.8, "synth": 3.4, "total": 6.3 }
  }
}

Headers: x-fusion-cost-usd, x-fusion-latency-s. When the backend reports an authoritative per-call cost, that value is used instead of the local price table (fusion/pricing.py).
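Below is a minimal sketch of both halves: pricing a single call locally unless the backend reported a cost, and reading the numbers back on the client. The price-table entry is a placeholder, not a real rate and not the contents of fusion/pricing.py. It also assumes the header value is a plain decimal string. Real HTTP header lookups are case-insensitive, but a plain dict stands in here. `call_cost` and `summarize` are hypothetical names.

```python
from typing import Optional

# Placeholder prices in USD per million tokens: (input, output). NOT real rates.
PRICE_TABLE = {"example/model": (0.50, 2.00)}


def call_cost(model: str, prompt_tokens: int, completion_tokens: int,
              reported_cost: Optional[float] = None) -> float:
    if reported_cost is not None:  # an authoritative backend cost wins
        return reported_cost
    in_price, out_price = PRICE_TABLE.get(model, (0.0, 0.0))
    return (prompt_tokens * in_price + completion_tokens * out_price) / 1_000_000


def summarize(response: dict, headers: dict) -> str:
    """One-line summary from the fusion block and the cost header."""
    f = response["fusion"]
    t = f["timing_s"]
    return (f"{f['panel_succeeded']}/{len(f['panel_models'])} panelists, "
            f"${float(headers['x-fusion-cost-usd']):.4f}, {t['total']}s total "
            f"(panel {t['panel']}s, judge {t['judge']}s, synth {t['synth']}s)")
```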

Evaluation harness

Reproduce the panel-vs-solo comparison on DRACO-style weighted tasks (negative criteria penalize wrong claims, so you can't bluff a high score):

# deterministic stubs — no API key, proves the pipeline
fusion eval --dry-run

# A/B: solo vs panel vs panel+refine, with deltas vs the best solo
fusion eval --ab --dry-run

# live: grade with an LLM judge (needs an API key)
fusion eval --config configs/budget.yaml --tasks eval/tasks.sample.yaml

Panel vs solo — scored on 3 task(s)

panel+refine                     100.0%  ████████████████████  ★ (+28.6 vs best solo)
panel                             76.2%  ███████████████  ★ (+4.8 vs best solo)
google/gemini-3-flash (solo)      71.4%  ██████████████

--runs N repeats each task N times (self-consistency). Add your own tasks to eval/tasks.sample.yaml (id, prompt, weighted criteria). The dry-run numbers come from deterministic stubs and say nothing about real lift; to measure that, run the live command above with an API key.
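Weighted criteria with negative weights can be scored in a few lines. This sketch assumes the score is the earned weight (including penalties) divided by the total positive weight, clamped at zero. That is one plausible reading of "DRACO-style", not necessarily what eval/'s scorer does. In live mode an LLM judge decides which criteria are met; here that decision is passed in as `met`. `Criterion` and `weighted_score` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    description: str
    weight: float  # a negative weight penalizes a wrong claim


def weighted_score(criteria: list[Criterion], met: list[bool]) -> float:
    """Percentage score: earned weight over the maximum achievable weight."""
    max_points = sum(c.weight for c in criteria if c.weight > 0)
    earned = sum(c.weight for c, hit in zip(criteria, met) if hit)
    # Penalties can push the total below zero; clamp so scores stay in [0, 100].
    return 100.0 * max(0.0, earned) / max_points if max_points else 0.0
```

Because a wrong claim subtracts its weight, an answer that covers every positive criterion but also asserts something false scores below 100%. Padding an answer with confident guesses therefore costs points instead of earning them.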

API

Method	Path	Description
`POST`	`/v1/chat/completions`	OpenAI-compatible; `stream:true` supported. Model slug `fusion`.
`GET`	`/v1/models`	Lists `fusion` plus the configured panel models.
`GET`	`/health`	Liveness + active config + panel.
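With `stream:true`, an OpenAI-compatible server sends server-sent events: `data: {json chunk}` lines, ending with `data: [DONE]`. The stdlib-only sketch below pulls the text deltas out of such a stream. It assumes fusion's chunks follow the standard OpenAI `choices[].delta.content` shape. `iter_content` is a hypothetical helper; the official SDK handles this for you with `stream=True`.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style server-sent events."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, ": keep-alive" comments, other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Against a live server, you could feed it the decoded lines of a `urllib.request.urlopen(...)` response, for example `(b.decode() for b in resp)`.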

Per-request overrides. As with OpenRouter Fusion's "pass your own participant models and synthesizer", you can customize the panel for a single request by adding a fusion block to the body. Only safe model-selection and flag keys are honored; the backend URL and API keys can never be set from the request:

{
  "model": "fusion",
  "messages": [{"role": "user", "content": "..."}],
  "fusion": {
    "panel": ["anthropic/claude-opus-4.8", "openai/gpt-5.5"],
    "synthesizer": "anthropic/claude-opus-4.8",
    "refine": true,
    "layers": 2
  }
}
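Server-side, honoring "only safe keys" amounts to an allow-list merge. Below is a minimal sketch. The `SAFE_OVERRIDE_KEYS` set is a guess built from the config fields documented above, not the server's actual list. This sketch silently drops anything else; the real server might reject such requests instead. `apply_overrides` is a hypothetical name.

```python
from typing import Any

# Hypothetical allow-list; the real set of honored keys lives in the server code.
SAFE_OVERRIDE_KEYS = {"panel", "judge", "synthesizer", "refine", "layers",
                      "samples", "diversity"}


def apply_overrides(base: dict[str, Any], body: dict[str, Any]) -> dict[str, Any]:
    """Merge the request's fusion block into a copy of the config, safe keys only."""
    requested = body.get("fusion") or {}
    merged = dict(base)  # never mutate the server's shared config
    for key, value in requested.items():
        if key in SAFE_OVERRIDE_KEYS:
            merged[key] = value
        # Anything else (e.g. base_url, api_key) is dropped in this sketch.
    return merged
```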

Project layout

fusion/        engine + server + harnesses
               ├─ providers · panel · judge · synthesize · fusion   (MoA engine)
               ├─ tools · server · streaming · schemas · pricing · config
               ├─ tui.py        (fusion chat — TUI harness)
               └─ agent.py · agent_tools.py · cli.py   (fusion code — agent harness)
eval/          DRACO-style evaluation harness (scorer, harness, tasks.sample.yaml)
configs/       panel presets (budget.yaml, frontier.yaml)
integrations/  harness adapters — Pi package, OpenAI-SDK example, adapter guide
scripts/       smoke.sh, verify_install.sh, verify_all.sh
docs/          architecture.md, parity.md
tests/         pytest suite — unit (providers mocked) + real-HTTP e2e (live servers), no API key

Development

pip install -e ".[dev]"
pytest -q          # full suite, no network required

See docs/architecture.md for the design and docs/parity.md for the parity matrix & roadmap.

License

MIT — see LICENSE.
