jackulau/fusionHarness

fusionHarness

Open-source Mixture-of-Agents compound-model server — a self-hostable alternative to OpenRouter's Fusion API.

Fan a prompt out to a panel of LLMs in parallel, have a judge extract the structure of their answers (consensus, contradictions, partial coverage, unique insights), then have a synthesizer write one final answer grounded in that analysis. The result can beat any single panelist, and a panel of budget models can rival a frontier model at a fraction of the cost.

It speaks the OpenAI API, so it drops into any existing OpenAI client: point base_url at fusionHarness and use the model slug fusion.

          ┌─────────────┐
prompt ─► │   fan-out   │ ─► model A ─┐
          │   (panel)   │ ─► model B ─┤  (parallel, each tool-enabled)
          └─────────────┘ ─► model C ─┘
                                  │
                                  ▼
                            ┌───────────┐     ┌──────────────┐
                            │   judge   │ ──► │ synthesizer  │ ─► final answer
                            │ structure │     │  grounded    │    + cost / latency
                            └───────────┘     └──────────────┘

Why it works (OpenRouter's own ablation): ~¾ of the lift comes from synthesis, ~¼ from diversity.
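The flow in the diagram can be sketched in a few lines of asyncio. This is an illustration under stated assumptions, not the engine's code: `ask_model` and `fuse` are hypothetical names, and the judge and synthesizer are stubbed out where the real pipeline would make further LLM calls.

```python
import asyncio


async def ask_model(name: str, prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns a canned draft."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"[{name}] draft for: {prompt}"


async def fuse(prompt: str, panel: list[str]) -> dict:
    # Fan-out: query every panelist concurrently; gather keeps panel order.
    drafts = await asyncio.gather(*(ask_model(m, prompt) for m in panel))
    # Judge (stub): a real judge would extract consensus, contradictions,
    # partial coverage, and unique insights from the drafts.
    analysis = {"n_drafts": len(drafts), "panel": list(panel)}
    # Synthesizer (stub): a real synthesizer writes one answer grounded in
    # the judge's analysis.
    final = f"synthesis of {analysis['n_drafts']} drafts"
    return {"drafts": list(drafts), "analysis": analysis, "final": final}


result = asyncio.run(fuse("CRDTs vs OT?", ["model-a", "model-b", "model-c"]))
print(result["final"])
```

Because the panelists run concurrently, panel latency is roughly that of the slowest panelist rather than the sum of all of them.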

Quickstart

# 1. Install
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

# 2. Configure — one OpenRouter key reaches every model in the catalog
cp .env.example .env       # then put your key in FUSION_API_KEY

# 3. Run the OpenAI-compatible server (omit --config to use the built-in budget panel)
fusion serve --config configs/budget.yaml

# 4. Call it like any OpenAI endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"fusion","messages":[{"role":"user","content":"Compare CRDTs vs OT for collaborative editing."}]}'

From the OpenAI Python SDK (pip install openai):

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
resp = client.chat.completions.create(
    model="fusion",
    messages=[{"role": "user", "content": "..."}],
)
print(resp.choices[0].message.content)

Or straight from the terminal, no server:

export FUSION_API_KEY=sk-or-...
fusion ask "What are the trade-offs between gRPC and REST?" --config configs/budget.yaml

Use it from a harness

Because fusion speaks OpenAI, it drops into any agent harness — or use our own.

# Our own TUI harness — streaming, multi-turn chat (/reset /stats /help /exit)
fusion chat --config configs/budget.yaml

# Pi (pi.dev): install the package, register the provider, point Pi at it
pi install ./integrations/pi
bash integrations/pi/install.sh
pi --model fusion

Adapters for Pi, Claude Code, aider, Continue, LangChain, and the OpenAI SDK are in integrations/. Verify the whole stack end-to-end with no API key:

scripts/smoke.sh --fake     # boots a key-free fake backend + the real server

Agentic coding (fusion code)

fusionHarness is also its own agentic coding harness — like Claude Code, but the brain can convene the fusion panel. The agent reads, writes, and edits files, searches, and runs bash in a tool-use loop confined to a project root, and can call council to escalate a hard sub-question to the full panel.

fusion code "add a /version endpoint and a test for it" --root .
fusion code                       # interactive agent session
fusion code "refactor X" --plan   # write a plan first, then act
fusion code "delete dead code" --approve   # confirm each file/bash action

Each step is printed as it happens; the agent calls finish when the task is done and verified. Tools are confined to --root (default: cwd). For a hard sub-problem the agent can call the council tool, which convenes the full fusion panel and returns a synthesized answer. --approve gates every mutating tool (write/edit/bash); --plan makes it write a numbered plan before acting.

⚠️ Security: the agent runs bash and edits files. Confinement blocks path escapes, not arbitrary command effects — run it on projects you trust, or in a container.
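To make "confinement blocks path escapes" concrete, here is a minimal sketch of a path check of that kind. `resolve_inside` and `is_allowed` are hypothetical helpers written for this example; the project's actual check may differ.

```python
import tempfile
from pathlib import Path


def resolve_inside(root: str, candidate: str) -> Path:
    """Resolve candidate against root and refuse anything outside root."""
    root_path = Path(root).resolve()
    # resolve() collapses ".." segments and follows existing symlinks.
    target = (root_path / candidate).resolve()
    if target != root_path and root_path not in target.parents:
        raise PermissionError(f"{candidate!r} escapes the project root")
    return target


def is_allowed(root: str, candidate: str) -> bool:
    try:
        resolve_inside(root, candidate)
        return True
    except PermissionError:
        return False


with tempfile.TemporaryDirectory() as root:
    checks = {
        path: is_allowed(root, path)
        for path in ("src/app.py", "../outside.txt", "/etc/passwd")
    }
print(checks)
```

A check like this only constrains which paths the file tools touch. A bash command can still read or write anywhere the process has permission, which is why the warning above recommends a container.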

Configuration

A config picks the panel, judge, and synthesizer. Two presets ship in configs/:

| Preset | Panel | Use it for |
| --- | --- | --- |
| configs/budget.yaml | Gemini 3 Flash · Kimi K2.6 · DeepSeek V4 Pro | frontier-ish quality at ~half the price |
| configs/frontier.yaml | Opus 4.8 · GPT-5.5 · Gemini 3.1 Pro | beyond-frontier quality |

Custom panel:

# my-panel.yaml
name: fusion
panel:
  - anthropic/claude-opus-4.8
  - openai/gpt-5.5
  - model: deepseek/deepseek-v4-pro    # long form allows per-model overrides
    temperature: 0.3
    tools: [web_search]
judge: openai/gpt-5.5
synthesizer: anthropic/claude-opus-4.8
temperature: 0.7
max_tokens: 4096
tools_enabled: false

Model slugs follow OpenRouter conventions (vendor/model). Point at a different backend with FUSION_BASE_URL (OpenAI, a local vLLM/Ollama server, Groq, Together — anything OpenAI-compatible). API keys come from the environment only (FUSION_API_KEY, OPENROUTER_API_KEY, or OPENAI_API_KEY), never from YAML.
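As an illustration of the two rules above (short-form vs long-form panel entries, and keys read only from the environment), here is a minimal sketch. The helper names are hypothetical, and the lookup order of the three variables is an assumption: the text lists the names but not their precedence.

```python
import os
from typing import Any, Mapping, Optional

# Assumption: checked in the order listed above.
KEY_VARS = ("FUSION_API_KEY", "OPENROUTER_API_KEY", "OPENAI_API_KEY")


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty key found in the environment, else None."""
    env = os.environ if env is None else env
    for name in KEY_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    return None


def normalize_panel_entry(entry: Any, defaults: Mapping[str, Any]) -> dict:
    """Expand a short-form slug or a long-form mapping into one flat dict."""
    if isinstance(entry, str):
        entry = {"model": entry}
    if "model" not in entry:
        raise ValueError("panel entry needs a 'model' slug")
    # Per-model values override the config-wide defaults.
    return {**defaults, **entry}


defaults = {"temperature": 0.7, "tools": []}
panel = [
    normalize_panel_entry("openai/gpt-5.5", defaults),
    normalize_panel_entry(
        {"model": "deepseek/deepseek-v4-pro", "temperature": 0.3, "tools": ["web_search"]},
        defaults,
    ),
]
print(panel)
```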

Quality & reliability knobs

All optional config fields (with defaults):

| Key | Default | What it does |
| --- | --- | --- |
| refine | false | Run one extra self-critique pass over the synthesized answer (quality ↑, cost ↑). |
| layers | 1 | Multi-layer MoA: with layers>1, proposers see the previous layer's drafts and improve before the final synthesis. |
| samples | 1 | Self-consistency: sample each proposer K times so the judge/synthesizer see more drafts. |
| diversity | true | Spread panelist temperatures so drafts differ (≈¼ of the lift). |
| diversity_jitter | 0.3 | How wide to spread temperatures (the MoA diversity↔quality trade-off; keep it modest). |
| max_retries | 2 | Retry transient upstream failures (429/5xx/timeout) so a flaky panelist doesn't shrink the panel. |
| retry_backoff | 0.5 | Base seconds for exponential retry backoff. |
| max_concurrency | 0 | Cap concurrent panelist calls (0 = unlimited). |
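Two of these knobs are easier to picture with numbers. The sketch below shows one plausible reading of diversity_jitter (an even spread of temperatures around the base value) and of retry_backoff (the delay doubles on each retry). Both formulas and both function names are assumptions made for illustration, not the project's implementation.

```python
def spread_temperatures(base: float, n: int, jitter: float = 0.3) -> list[float]:
    """Evenly spread n temperatures over [base - jitter, base + jitter]."""
    if n <= 1:
        return [base]
    step = 2 * jitter / (n - 1)
    # Clamp at 0.0, since a negative temperature is not meaningful.
    return [round(max(0.0, base - jitter + i * step), 3) for i in range(n)]


def backoff_delays(max_retries: int = 2, base: float = 0.5) -> list[float]:
    """Seconds to wait before each retry: base, 2*base, 4*base, ..."""
    return [base * 2 ** attempt for attempt in range(max_retries)]


temps = spread_temperatures(base=0.7, n=3, jitter=0.3)
delays = backoff_delays(max_retries=2, base=0.5)
print(temps, delays)
```

With the defaults shown in the table, a three-model panel at temperature 0.7 would draft at 0.4, 0.7, and 1.0 under this reading, and a failing call would wait 0.5 s and then 1.0 s before giving up.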

If the judge fails, synthesis still runs from the raw responses; if the synthesizer fails, the best panelist's answer is returned. Any stage that degraded is reported in the response's fusion.degraded list, so failures are never silent.
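The fallback chain can be sketched as follows. The `judge` and `synthesize` callables and their signatures are hypothetical, and picking the longest draft is only a placeholder for however the "best" panelist is actually chosen.

```python
from typing import Callable, Optional


def finish(
    drafts: list[str],
    judge: Callable[[list[str]], dict],
    synthesize: Callable[[list[str], Optional[dict]], str],
) -> tuple[str, list[str]]:
    """Run judge then synthesizer, recording every stage that degraded."""
    degraded: list[str] = []
    try:
        analysis: Optional[dict] = judge(drafts)
    except Exception:
        analysis = None  # synthesis proceeds from the raw drafts
        degraded.append("judge")
    try:
        answer = synthesize(drafts, analysis)
    except Exception:
        degraded.append("synthesizer")
        # Placeholder heuristic (assumption): treat the longest draft as best.
        answer = max(drafts, key=len)
    return answer, degraded


def broken(*_args):
    raise RuntimeError("upstream failure")


drafts = ["short", "a longer draft"]
ok_answer, ok_flags = finish(drafts, lambda d: {"n": len(d)}, lambda d, a: "fused")
no_judge, judge_flags = finish(drafts, broken, lambda d, a: "fused from raw")
fallback, both_flags = finish(drafts, broken, broken)
print(ok_flags, judge_flags, both_flags)
```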

Tools (web search + bash)

Panelists can call tools while drafting — useful for deep-research tasks. Tools are off by default. Enable globally and per-model:

tools_enabled: true
panel:
  - model: deepseek/deepseek-v4-pro
    tools: [web_search, bash]

  • web_search: keyless DuckDuckGo Instant Answer by default; swap in a Tavily/Brave/SerpAPI backend via default_registry(search_fn=...).
  • bash: runs in a sandboxed shell (timeout, stripped env, output truncation).

⚠️ Security: bash executes commands the model writes. The sandbox is not a container. Run the server in a disposable VM/container before enabling bash with untrusted input. It is opt-in because it is dangerous.
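The three protections named for the bash tool (timeout, stripped env, output truncation) can be sketched with the standard library alone. `run_bash` is a hypothetical function written for this example, its limits are arbitrary, and it assumes a POSIX system with bash in /usr/bin or /bin. As the warning above says, this kind of sandbox does not isolate the filesystem or the network.

```python
import os
import subprocess


def run_bash(command: str, timeout_s: float = 10.0, max_chars: int = 4000) -> str:
    """Run a command with a timeout, a minimal environment, and capped output."""
    try:
        proc = subprocess.run(
            ["bash", "-c", command],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            # Stripped env: the child sees only PATH, so API keys in the
            # server's environment are not inherited.
            env={"PATH": "/usr/bin:/bin"},
        )
    except subprocess.TimeoutExpired:
        return f"[timed out after {timeout_s}s]"
    output = proc.stdout + proc.stderr
    if len(output) > max_chars:
        output = output[:max_chars] + "\n[output truncated]"
    return output


os.environ["FUSION_API_KEY"] = "sk-example"  # demo value only
leak_check = run_bash('echo "${FUSION_API_KEY:-unset}"').strip()
truncated = run_bash("printf 'aaaaaaaaaa'", max_chars=4)
timed_out = run_bash("sleep 2", timeout_s=0.2)
print(leak_check)
```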

Cost & latency tracking

Every response carries the real numbers. Non-streaming responses include a fusion block plus headers:

{
  "choices": [ ... ],
  "usage": { "prompt_tokens": 1234, "completion_tokens": 567, "total_tokens": 1801 },
  "fusion": {
    "config": "fusion",
    "panel_models": ["google/gemini-3-flash", "moonshotai/kimi-k2.6", "deepseek/deepseek-v4-pro"],
    "panel_succeeded": 3,
    "cost_usd": 0.0123,
    "cost_breakdown": [ { "model": "...", "role": "panel", "cost_usd": 0.004 }, ... ],
    "timing_s": { "panel": 2.1, "judge": 0.8, "synth": 3.4, "total": 6.3 }
  }
}

Headers: x-fusion-cost-usd, x-fusion-latency-s. When the backend reports an authoritative per-call cost, that value is used instead of the local price table (fusion/pricing.py).
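A client can fold the fusion block into a quick summary. The sketch below assumes only the field names shown in the sample response above; `summarize_fusion` and the sample numbers are made up for the example.

```python
def summarize_fusion(response: dict) -> dict:
    """Total cost per role and find the slowest stage in a fusion block."""
    fusion = response.get("fusion", {})
    cost_by_role: dict[str, float] = {}
    for item in fusion.get("cost_breakdown", []):
        role = item["role"]
        cost_by_role[role] = round(cost_by_role.get(role, 0.0) + item["cost_usd"], 6)
    # Exclude the aggregate "total" entry when looking for the slowest stage.
    stages = {k: v for k, v in fusion.get("timing_s", {}).items() if k != "total"}
    slowest = max(stages, key=stages.get) if stages else None
    return {"cost_by_role": cost_by_role, "slowest_stage": slowest}


sample = {
    "fusion": {
        "cost_usd": 0.0123,
        "cost_breakdown": [
            {"model": "model-a", "role": "panel", "cost_usd": 0.004},
            {"model": "model-b", "role": "panel", "cost_usd": 0.003},
            {"model": "model-c", "role": "judge", "cost_usd": 0.002},
            {"model": "model-d", "role": "synth", "cost_usd": 0.0033},
        ],
        "timing_s": {"panel": 2.1, "judge": 0.8, "synth": 3.4, "total": 6.3},
    }
}
summary = summarize_fusion(sample)
print(summary)
```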

Evaluation harness

Reproduce the panel-vs-solo comparison on DRACO-style weighted tasks (negative criteria penalize wrong claims, so you can't bluff a high score):

# deterministic stubs — no API key, proves the pipeline
fusion eval --dry-run

# A/B: solo vs panel vs panel+refine, with deltas vs the best solo
fusion eval --ab --dry-run

# live: grade with an LLM judge (needs an API key)
fusion eval --config configs/budget.yaml --tasks eval/tasks.sample.yaml

Example output (dry run):

Panel vs solo — scored on 3 task(s)

panel+refine                     100.0%  ████████████████████  ★ (+28.6 vs best solo)
panel                             76.2%  ███████████████  ★ (+4.8 vs best solo)
google/gemini-3-flash (solo)      71.4%  ██████████████

--runs N repeats each task N times (self-consistency). Add your own tasks in eval/tasks.sample.yaml (id, prompt, weighted criteria). The dry-run numbers come from deterministic stubs; measuring real lift needs an API key, so run the live command above.
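To show why negative criteria make bluffing costly, here is a minimal sketch of a weighted scorer. The formula (earned weight over total positive weight, floored at zero) and the `id`/`weight` field names are assumptions for illustration; the scorer in eval/ may work differently.

```python
def weighted_score(criteria: list[dict], met: set[str]) -> float:
    """Score in [0, 1]: positive weights reward, negative weights penalize."""
    earned = sum(c["weight"] for c in criteria if c["id"] in met)
    possible = sum(c["weight"] for c in criteria if c["weight"] > 0)
    if possible == 0:
        return 0.0
    return max(0.0, earned / possible)


criteria = [
    {"id": "explains_convergence", "weight": 3},
    {"id": "mentions_tradeoffs", "weight": 2},
    {"id": "claims_ot_needs_no_server", "weight": -2},  # a wrong claim
]
careful = weighted_score(criteria, {"explains_convergence", "mentions_tradeoffs"})
bluffing = weighted_score(criteria, {"explains_convergence", "claims_ot_needs_no_server"})
print(careful, bluffing)
```

Under this reading, an answer that hits one criterion but also asserts the wrong claim scores (3 - 2) / 5, well below an answer that simply omits the claim.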

API

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/chat/completions | OpenAI-compatible; stream:true supported. Model slug fusion. |
| GET | /v1/models | Lists fusion plus the configured panel models. |
| GET | /health | Liveness + active config + panel. |
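With stream:true, an OpenAI-compatible endpoint sends server-sent events whose `data:` lines carry JSON chunks and end with `data: [DONE]`. The sketch below assembles the text from such lines. It assumes fusionHarness follows that standard chunk shape, and `collect_stream` is a hypothetical helper; with the OpenAI SDK you would normally pass stream=True and iterate the chunks instead.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Concatenate delta.content from OpenAI-style SSE lines."""
    parts: list[str] = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


sample_lines = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"delta":{"content":", world"}}]}',
    "data: [DONE]",
]
text = collect_stream(sample_lines)
print(text)
```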

Per-request overrides. Customize the panel for a single request (like OpenRouter Fusion's "pass your own participant models and synthesizer") via a fusion block in the request body. Only safe model-selection and flag keys are honored; the backend URL and API keys can never be set from the request:

{
  "model": "fusion",
  "messages": [{"role": "user", "content": "..."}],
  "fusion": {
    "panel": ["anthropic/claude-opus-4.8", "openai/gpt-5.5"],
    "synthesizer": "anthropic/claude-opus-4.8",
    "refine": true,
    "layers": 2
  }
}
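The "only safe keys are honored" rule amounts to an allowlist filter on the server side. The sketch below is one way to express it; the exact set of allowed keys is an assumption based on the example above, and `apply_overrides` is a hypothetical name.

```python
from typing import Any

# Assumption: the honored keys; anything else in the fusion block is dropped.
ALLOWED_OVERRIDES = {"panel", "judge", "synthesizer", "refine", "layers"}


def apply_overrides(config: dict[str, Any], body: dict[str, Any]) -> dict[str, Any]:
    """Merge allowlisted per-request overrides onto the server config."""
    requested = body.get("fusion") or {}
    safe = {k: v for k, v in requested.items() if k in ALLOWED_OVERRIDES}
    return {**config, **safe}


server_config = {
    "panel": ["google/gemini-3-flash"],
    "refine": False,
    "base_url": "https://backend.example/v1",
}
request_body = {
    "model": "fusion",
    "messages": [{"role": "user", "content": "..."}],
    "fusion": {
        "panel": ["anthropic/claude-opus-4.8", "openai/gpt-5.5"],
        "refine": True,
        "base_url": "https://attacker.example/v1",  # must be ignored
    },
}
effective = apply_overrides(server_config, request_body)
print(effective)
```

From the OpenAI Python SDK, extra fields such as this fusion block can be sent with the `extra_body` argument of `client.chat.completions.create`.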

Project layout

fusion/        engine + server + harnesses
               ├─ providers · panel · judge · synthesize · fusion   (MoA engine)
               ├─ tools · server · streaming · schemas · pricing · config
               ├─ tui.py        (fusion chat — TUI harness)
               └─ agent.py · agent_tools.py · cli.py   (fusion code — agent harness)
eval/          DRACO-style evaluation harness (scorer, harness, tasks.sample.yaml)
configs/       panel presets (budget.yaml, frontier.yaml)
integrations/  harness adapters — Pi package, OpenAI-SDK example, adapter guide
scripts/       smoke.sh, verify_install.sh, verify_all.sh
docs/          architecture.md, parity.md
tests/         pytest suite — unit (providers mocked) + real-HTTP e2e (live servers), no API key

Development

pip install -e ".[dev]"
pytest -q          # full suite, no network required

See docs/architecture.md for the design and docs/parity.md for the parity matrix & roadmap.

License

MIT — see LICENSE.
