Open-source Mixture-of-Agents compound-model server — a self-hostable alternative to OpenRouter's Fusion API.
Fan a prompt out to a panel of LLMs in parallel, let a judge extract the structure of their answers (consensus, contradictions, partial coverage, unique insights), then a synthesizer writes one final answer grounded in that analysis. The result beats any single panelist — and a panel of budget models can rival a frontier model at a fraction of the cost.
It speaks the OpenAI API, so it drops into any existing OpenAI client: point `base_url` at fusionHarness and use the model slug `fusion`.

              ┌─────────────┐
    prompt ─► │   fan-out   │ ─► model A ─┐
              │   (panel)   │ ─► model B ─┤  (parallel, each tool-enabled)
              └─────────────┘ ─► model C ─┘
                                          │
                                          ▼
                                    ┌───────────┐     ┌──────────────┐
                                    │   judge   │ ──► │ synthesizer  │ ─► final answer
                                    │ structure │     │   grounded   │    + cost / latency
                                    └───────────┘     └──────────────┘

Why it works (OpenRouter's own ablation): ~¾ of the lift comes from synthesis, ~¼ from diversity.
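
The fan-out, judge, synthesize flow in the diagram can be sketched in a few lines of asyncio. This is an illustrative sketch under stated assumptions, not fusionHarness's actual engine: `call_model`, `fan_out`, `judge`, `synthesize`, and `fuse` are hypothetical names, and the model call is stubbed so the block runs without a key.

```python
import asyncio


async def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns a canned draft."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"[{model}] draft answer to: {prompt}"


async def fan_out(models: list[str], prompt: str) -> list[str]:
    """Query every panelist concurrently and drop the ones that raised."""
    results = await asyncio.gather(
        *(call_model(m, prompt) for m in models), return_exceptions=True
    )
    return [r for r in results if isinstance(r, str)]


def judge(drafts: list[str]) -> dict:
    """Toy judge: a real one would ask an LLM for consensus, contradictions, etc."""
    return {"n_drafts": len(drafts), "consensus": [], "contradictions": []}


def synthesize(prompt: str, drafts: list[str], analysis: dict) -> str:
    """Toy synthesizer: a real one would prompt an LLM with drafts plus analysis."""
    return f"Final answer from {analysis['n_drafts']} drafts for: {prompt}"


async def fuse(models: list[str], prompt: str) -> str:
    drafts = await fan_out(models, prompt)      # parallel panel
    analysis = judge(drafts)                    # structure of the drafts
    return synthesize(prompt, drafts, analysis) # one grounded answer


answer = asyncio.run(fuse(["model-a", "model-b", "model-c"], "CRDTs vs OT?"))
print(answer)
```

The panel runs concurrently while the judge and synthesizer run in sequence, so end-to-end latency is roughly the slowest panelist plus the two later stages.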

    # 1. Install
    python -m venv .venv && source .venv/bin/activate
    pip install -e ".[dev]"

    # 2. Configure: one OpenRouter key reaches every model in the catalog
    cp .env.example .env          # then put your key in FUSION_API_KEY

    # 3. Run the OpenAI-compatible server (omit --config to use the built-in budget panel)
    fusion serve --config configs/budget.yaml

    # 4. Call it like any OpenAI endpoint
    curl http://localhost:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model":"fusion","messages":[{"role":"user","content":"Compare CRDTs vs OT for collaborative editing."}]}'

From the OpenAI Python SDK (`pip install openai`):

    from openai import OpenAI

    # The client-side key is a placeholder; the server reads its upstream key from the environment.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
    resp = client.chat.completions.create(
        model="fusion",
        messages=[{"role": "user", "content": "..."}],
    )
    print(resp.choices[0].message.content)

Or straight from the terminal, no server:

    export FUSION_API_KEY=sk-or-...
    fusion ask "What are the trade-offs between gRPC and REST?" --config configs/budget.yaml

Because fusion speaks the OpenAI API, it drops into any agent harness, or you can use our own.

    # Our own TUI harness: streaming, multi-turn chat (/reset /stats /help /exit)
    fusion chat --config configs/budget.yaml

    # Pi (pi.dev): install the package, register the provider, point Pi at it
    pi install ./integrations/pi
    bash integrations/pi/install.sh
    pi --model fusion

Adapters for Pi, Claude Code, aider, Continue, LangChain, and the OpenAI SDK are in `integrations/`. Verify the whole stack end-to-end with no API key:

    scripts/smoke.sh --fake   # boots a key-free fake backend + the real server

fusionHarness is also its own agentic coding harness: like Claude Code, but the brain can convene the fusion panel. The agent reads, writes, and edits files, searches, and runs bash in a tool-use loop confined to a project root, and can call `council` to escalate a hard sub-question to the full panel.

    fusion code "add a /version endpoint and a test for it" --root .
    fusion code                                  # interactive agent session
    fusion code "refactor X" --plan              # write a plan first, then act
    fusion code "delete dead code" --approve     # confirm each file/bash action

Each step is printed as it happens; the agent calls `finish` when the task is done and verified. Tools are confined to `--root` (default: cwd). For a hard sub-problem the agent can call the `council` tool, which convenes the full fusion panel and returns a synthesized answer. `--approve` gates every mutating tool (write/edit/bash); `--plan` makes it write a numbered plan before acting.
⚠️ Security: the agent runs bash and edits files. Confinement blocks path escapes, not arbitrary command effects — run it on projects you trust, or in a container.
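
To make "confinement blocks path escapes" concrete, here is a minimal sketch of a root check for file tools. The helper name `resolve_inside` is hypothetical and this is not necessarily how fusionHarness implements it; it only illustrates the kind of guard that file-path confinement implies, and why it says nothing about what a bash command does once it runs.

```python
from pathlib import Path


def resolve_inside(root: str, candidate: str) -> Path:
    """Resolve `candidate` against `root`; raise if the result leaves the root."""
    root_path = Path(root).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    # An absolute `candidate` replaces the root entirely and then fails the check.
    target = (root_path / candidate).resolve()
    if target != root_path and root_path not in target.parents:
        raise PermissionError(f"{candidate!r} escapes project root {root_path}")
    return target


if __name__ == "__main__":
    print(resolve_inside(".", "src/app.py"))      # allowed: stays under the root
    try:
        resolve_inside(".", "../outside.txt")     # rejected: climbs out of the root
    except PermissionError as exc:
        print("blocked:", exc)
```

A guard like this only vets paths handed to the file tools; a shell command such as `curl ... | sh` never passes through it, which is why the container advice above still applies.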
A config picks the panel, judge, and synthesizer. Two presets ship in configs/:
| Preset | Panel | Use it for |
|---|---|---|
| `configs/budget.yaml` | Gemini 3 Flash · Kimi K2.6 · DeepSeek V4 Pro | frontier-ish quality at ~half the price |
| `configs/frontier.yaml` | Opus 4.8 · GPT-5.5 · Gemini 3.1 Pro | beyond-frontier quality |
Custom panel:

    # my-panel.yaml
    name: fusion
    panel:
      - anthropic/claude-opus-4.8
      - openai/gpt-5.5
      - model: deepseek/deepseek-v4-pro   # long form allows per-model overrides
        temperature: 0.3
        tools: [web_search]
    judge: openai/gpt-5.5
    synthesizer: anthropic/claude-opus-4.8
    temperature: 0.7
    max_tokens: 4096
    tools_enabled: false

Model slugs follow OpenRouter conventions (`vendor/model`). Point at a different backend with `FUSION_BASE_URL` (OpenAI, a local vLLM/Ollama server, Groq, Together, or anything else OpenAI-compatible). API keys come from the environment only (`FUSION_API_KEY`, `OPENROUTER_API_KEY`, or `OPENAI_API_KEY`), never from YAML.
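
As a rough illustration of environment-only credentials, the sketch below resolves a backend URL and key from environment variables. Two things here are assumptions rather than documented behavior: the lookup order (taken from the order the variables are listed above) and the OpenRouter URL as the default when `FUSION_BASE_URL` is unset. The function name `resolve_backend` is hypothetical.

```python
import os

# Assumed precedence: the order in which the README lists the variables.
KEY_VARS = ("FUSION_API_KEY", "OPENROUTER_API_KEY", "OPENAI_API_KEY")
# Assumed default backend; OpenRouter's public OpenAI-compatible endpoint.
DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"


def resolve_backend(env=None):
    """Return (base_url, api_key, source_var) using environment variables only."""
    env = os.environ if env is None else env
    base_url = env.get("FUSION_BASE_URL") or DEFAULT_BASE_URL
    for var in KEY_VARS:
        if env.get(var):
            return base_url, env[var], var
    raise RuntimeError(f"no API key found; set one of {', '.join(KEY_VARS)}")


if __name__ == "__main__":
    demo_env = {"OPENAI_API_KEY": "sk-demo", "FUSION_BASE_URL": "http://localhost:11434/v1"}
    print(resolve_backend(demo_env))
```

Passing the environment in as a mapping keeps the lookup easy to test and makes it obvious that nothing is read from a config file.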
All optional config fields (with defaults):
| Key | Default | What it does |
|---|---|---|
| `refine` | `false` | Run one extra self-critique pass over the synthesized answer (quality ↑, cost ↑). |
| `layers` | `1` | Multi-layer MoA: with `layers > 1`, proposers see the previous layer's drafts and improve on them before the final synthesis. |
| `samples` | `1` | Self-consistency: sample each proposer K times so the judge and synthesizer see more drafts. |
| `diversity` | `true` | Spread panelist temperatures so drafts differ (≈¼ of the lift). |
| `diversity_jitter` | `0.3` | How wide to spread temperatures (the MoA diversity↔quality trade-off; keep it modest). |
| `max_retries` | `2` | Retry transient upstream failures (429/5xx/timeout) so a flaky panelist doesn't shrink the panel. |
| `retry_backoff` | `0.5` | Base seconds for exponential retry backoff. |
| `max_concurrency` | `0` | Cap concurrent panelist calls (0 = unlimited). |
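
To give a feel for the numeric knobs, the sketch below shows one plausible reading of `diversity_jitter` and `retry_backoff`. Both formulas are assumptions: the table does not say how temperatures are spread (an even spread around the base is assumed here) or whether backoff doubles per attempt (the usual meaning of "exponential", also assumed). The function names are hypothetical.

```python
def spread_temperatures(base: float, n: int, jitter: float = 0.3) -> list[float]:
    """Spread n temperatures evenly over [base - jitter, base + jitter], floored at 0."""
    if n <= 1:
        return [base] * n
    step = 2 * jitter / (n - 1)
    return [round(max(0.0, base - jitter + i * step), 3) for i in range(n)]


def backoff_delays(max_retries: int = 2, retry_backoff: float = 0.5) -> list[float]:
    """Seconds to wait before each retry, doubling from the base delay."""
    return [retry_backoff * 2 ** attempt for attempt in range(max_retries)]


if __name__ == "__main__":
    print(spread_temperatures(0.7, 3))  # three panelists around a base of 0.7
    print(backoff_delays())             # defaults from the table above
```

With the defaults, a three-model panel at base temperature 0.7 would draft at 0.4, 0.7, and 1.0 under this reading, and a failing call would wait 0.5 s and then 1.0 s before giving up.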
If the judge fails, synthesis still runs from the raw responses; if the synthesizer fails, the best panelist's answer is returned. Any stage that degraded is listed in the response's `fusion.degraded` field, so failures are never silent.
Panelists can call tools while drafting — useful for deep-research tasks. Tools are off by default. Enable globally and per-model:

    tools_enabled: true
    panel:
      - model: deepseek/deepseek-v4-pro
        tools: [web_search, bash]

- `web_search`: keyless DuckDuckGo Instant Answer by default; swap in a Tavily/Brave/SerpAPI backend via `default_registry(search_fn=...)`.
- `bash`: runs in a sandboxed shell (timeout, stripped env, output truncation).

⚠️ Security: `bash` executes commands the model writes. The sandbox is not a container. Run the server in a disposable VM/container before enabling `bash` with untrusted input. It is opt-in because it is dangerous.
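
The three sandbox measures named above (timeout, stripped env, output truncation) can be illustrated with the standard library. This is a sketch of the general technique on a POSIX system, not the project's `bash` tool: `run_sandboxed`, its defaults, and the returned dict shape are all hypothetical.

```python
import subprocess


def run_sandboxed(command: str, timeout_s: float = 10.0, max_output: int = 4000) -> dict:
    """Run a shell command with a timeout, a minimal environment, and capped output."""
    minimal_env = {"PATH": "/usr/bin:/bin", "LANG": "C"}  # no API keys leak through
    try:
        proc = subprocess.run(
            ["/bin/sh", "-c", command],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            env=minimal_env,
        )
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "output": "", "timed_out": True}

    output = proc.stdout + proc.stderr
    if len(output) > max_output:
        output = output[:max_output] + "\n[truncated]"
    return {"exit_code": proc.returncode, "output": output, "timed_out": False}


if __name__ == "__main__":
    print(run_sandboxed("echo hello"))
    print(run_sandboxed("sleep 2", timeout_s=0.2))
```

None of these measures restricts what the command can touch on disk or on the network, which is the gap the warning above is pointing at.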
Every response carries the real numbers. Non-streaming responses include a
fusion block plus headers:
Headers: x-fusion-cost-usd, x-fusion-latency-s. When the backend reports an
authoritative per-call cost, that value is used instead of the local price table
(fusion/pricing.py).
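
The cost rule above (prefer the backend's reported cost, otherwise fall back to a local price table) might look like the sketch below. The table contents, its per-million-token unit, and the names `PRICE_TABLE`, `call_cost`, and `total_cost` are illustrative assumptions; the real table lives in `fusion/pricing.py` and may be structured differently.

```python
# Hypothetical prices in USD per 1M tokens: (input, output). Not real pricing data.
PRICE_TABLE = {
    "example/model-a": (0.50, 1.50),
    "example/model-b": (2.00, 8.00),
}


def call_cost(model, prompt_tokens, completion_tokens, reported_cost=None):
    """Cost of one call: the backend's own figure if present, else a table estimate."""
    if reported_cost is not None:
        return reported_cost
    in_price, out_price = PRICE_TABLE[model]
    return (prompt_tokens * in_price + completion_tokens * out_price) / 1_000_000


def total_cost(calls):
    """Sum per-call costs, e.g. panel + judge + synthesizer calls for one request."""
    return sum(call_cost(**call) for call in calls)


if __name__ == "__main__":
    calls = [
        {"model": "example/model-a", "prompt_tokens": 1200, "completion_tokens": 400},
        {"model": "example/model-b", "prompt_tokens": 3000, "completion_tokens": 600,
         "reported_cost": 0.0111},  # authoritative figure wins over the table
    ]
    print(round(total_cost(calls), 6))
```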
Reproduce the panel-vs-solo comparison on DRACO-style weighted tasks (negative criteria penalize wrong claims, so you can't bluff a high score):

    # deterministic stubs: no API key, proves the pipeline
    fusion eval --dry-run

    # A/B: solo vs panel vs panel+refine, with deltas vs the best solo
    fusion eval --ab --dry-run

    # live: grade with an LLM judge (needs an API key)
    fusion eval --config configs/budget.yaml --tasks eval/tasks.sample.yaml

Example output:

    Panel vs solo — scored on 3 task(s)
      panel+refine                  100.0%  ████████████████████  ★ (+28.6 vs best solo)
      panel                          76.2%  ███████████████       ★ (+4.8 vs best solo)
      google/gemini-3-flash (solo)   71.4%  ██████████████

`--runs N` repeats each task N times (self-consistency). Add your own tasks in `eval/tasks.sample.yaml` (id, prompt, weighted criteria). The dry-run numbers come from deterministic stubs; measuring real lift needs a key, so run the live command above.
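
To show why negative criteria make bluffing costly, here is a small weighted scorer. The formula is an assumption (earned weight divided by the total positive weight, floored at zero); the actual scorer in `eval/` may normalize differently, and `score_task` and the criterion ids are hypothetical.

```python
def score_task(criteria, met):
    """Score one answer.

    criteria: list of (criterion_id, weight). A negative weight marks a wrong
              claim that costs points when it appears in the answer.
    met:      set of criterion ids the grader found in the answer.
    """
    earned = sum(weight for cid, weight in criteria if cid in met)
    possible = sum(weight for _, weight in criteria if weight > 0)
    if possible == 0:
        return 0.0
    return max(0.0, earned / possible)


if __name__ == "__main__":
    criteria = [
        ("explains_convergence", 3),
        ("explains_transforms", 2),
        ("claims_ot_needs_no_server", -2),  # penalized wrong claim
    ]
    print(score_task(criteria, {"explains_convergence", "explains_transforms"}))
    print(score_task(criteria, {"explains_convergence", "explains_transforms",
                                "claims_ot_needs_no_server"}))
```

Under this scheme an answer that covers every positive point but also asserts the wrong claim drops from 1.0 to 0.6, so padding an answer with confident guesses lowers the score instead of raising it.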
| Method | Path | Description |
|---|---|---|
| `POST` | `/v1/chat/completions` | OpenAI-compatible; `stream: true` supported. Model slug `fusion`. |
| `GET` | `/v1/models` | Lists `fusion` plus the configured panel models. |
| `GET` | `/health` | Liveness + active config + panel. |
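
For clients that set `stream: true` without an SDK, the sketch below reassembles text from OpenAI-style server-sent events. It assumes the server emits the standard OpenAI chunk shape (`data: {...}` lines carrying `choices[].delta.content`, ended by `data: [DONE]`), which is what "OpenAI-compatible" suggests but is not spelled out above. `collect_stream` is a hypothetical helper.

```python
import json


def collect_stream(lines):
    """Join the content deltas from an iterable of SSE lines into one string."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue                      # skip blank keep-alives and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break                         # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


if __name__ == "__main__":
    sample = [
        'data: {"choices":[{"delta":{"role":"assistant"}}]}',
        "",
        'data: {"choices":[{"delta":{"content":"CRDTs "}}]}',
        'data: {"choices":[{"delta":{"content":"merge."}}]}',
        "data: [DONE]",
    ]
    print(collect_stream(sample))
```

In practice the `lines` argument would be the decoded lines of the HTTP response body; the official OpenAI SDK does this parsing for you when you pass `stream=True`.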
Per-request overrides. Customize the panel for a single request (like OpenRouter Fusion's "pass your own participant models and synthesizer") via a `fusion` block in the body. Only safe model-selection and flag keys are honored; the backend URL and API keys can never be set from the request:

    {
      "model": "fusion",
      "messages": [{"role": "user", "content": "..."}],
      "fusion": {
        "panel": ["anthropic/claude-opus-4.8", "openai/gpt-5.5"],
        "synthesizer": "anthropic/claude-opus-4.8",
        "refine": true,
        "layers": 2
      }
    }

Repository layout:

    fusion/         engine + server + harnesses
      ├─ providers · panel · judge · synthesize · fusion   (MoA engine)
      ├─ tools · server · streaming · schemas · pricing · config
      ├─ tui.py                                            (fusion chat: TUI harness)
      └─ agent.py · agent_tools.py · cli.py                (fusion code: agent harness)
    eval/           DRACO-style evaluation harness (scorer, harness, tasks.sample.yaml)
    configs/        panel presets (budget.yaml, frontier.yaml)
    integrations/   harness adapters: Pi package, OpenAI-SDK example, adapter guide
    scripts/        smoke.sh, verify_install.sh, verify_all.sh
    docs/           architecture.md, parity.md
    tests/          pytest suite: unit (providers mocked) + real-HTTP e2e (live servers), no API key


    pip install -e ".[dev]"
    pytest -q          # full suite, no network required

See `docs/architecture.md` for the design and `docs/parity.md` for the parity matrix & roadmap.
MIT — see LICENSE.
Example non-streaming response body, including the `fusion` block:

    {
      "choices": [ ... ],
      "usage": { "prompt_tokens": 1234, "completion_tokens": 567, "total_tokens": 1801 },
      "fusion": {
        "config": "fusion",
        "panel_models": ["google/gemini-3-flash", "moonshotai/kimi-k2.6", "deepseek/deepseek-v4-pro"],
        "panel_succeeded": 3,
        "cost_usd": 0.0123,
        "cost_breakdown": [
          { "model": "...", "role": "panel", "cost_usd": 0.004 },
          ...
        ],
        "timing_s": { "panel": 2.1, "judge": 0.8, "synth": 3.4, "total": 6.3 }
      }
    }