The local context-funnel for GitHub Copilot CLI. Trims, dedups, prunes, ranks, and compresses noisy tool output before the model sees it — so the metered input tokens you pay for are the ones that carry signal. Plugs into Copilot CLI's hook system for automatic, universal interception (Bash, file reads, MCP tool blobs) with zero code changes to the model side.
Also usable as a CLI or library for processing arbitrary noisy payloads outside Copilot CLI sessions.
Two commands. End state: every tool output above ~2 000 tokens is silently
funneled through coagula before Copilot CLI's model sees (and bills) it.
macOS / Linux:
pip install https://github.com/pat-nel87/coagula/releases/download/v0.7.3/coagula-0.7.3-py3-none-any.whl
curl -fsSL https://raw.githubusercontent.com/pat-nel87/coagula/main/integrations/copilot-cli/install.sh | bash
Windows (PowerShell):
pip install https://github.com/pat-nel87/coagula/releases/download/v0.7.3/coagula-0.7.3-py3-none-any.whl
iwr -useb https://raw.githubusercontent.com/pat-nel87/coagula/main/integrations/copilot-cli/install.ps1 | iex
Verify in a copilot session:
copilot -p "Run 'cat tests/fixtures/crashloop.log' and tell me the dominant error pattern" \
--allow-all-tools --allow-all-paths --no-color
You should see the bash output prefixed with
[coagula: 134715 → 37 tok | tool=bash profile=passthrough] and the model
should still answer correctly. If you instead see the raw 4 000 lines, jump
to the Copilot CLI walkthrough — it covers
prerequisites (gh, jq), tuning env vars, and troubleshooting.
Kill switch: export COAGULA_DISABLE=1 (PowerShell: $env:COAGULA_DISABLE=1).
GitHub flipped Copilot to usage-based billing on June 1, 2026. Every Copilot Chat / Copilot CLI premium request now draws from a monthly AI Credit budget at the underlying model's API rate — input + output + cached tokens, all metered. Heavy diagnostic-tool sessions ("investigate this incident", "explain this cluster's state") burn proportionally more.
coagula trims the input side of those sessions before the tokens are billed. What gets trimmed, and by how much, depends on payload shape:
| Payload shape (measured on test fixtures) | Reduction | Why |
|---|---|---|
| Crashloop / journalctl-style logs with repeated templates | ~99% | Dedup collapses N identical-modulo-timestamp lines to 1 + (xN) |
| Bloated kubectl JSON (`-o json` of a healthy pod) | ~70% | Prune strips managedFields, resourceVersion, annotations, etc. |
| Structured Postgres / Azure ARM JSON | ~50-80% | Profile-specific denylist + array truncation |
| Mixed prose / code output (e.g. `cargo build`) | ~10-30% | Limited dedup; Relevance + Summarize need a query set |
| Pure code, well-structured text | ~5-15% | coagula barely helps — there's not much to compress |
Reductions are real and reproducible from tests/fixtures/. Percentages
do not directly equal dollar savings — that depends on which model your
Copilot session uses, how often you run noisy-tool investigations, and your
plan tier. Real-session dollar benchmarks are TBD.
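To make the dedup row of the table concrete, here is a minimal sketch of collapsing lines that are identical modulo timestamp. It is not coagula's actual implementation: the timestamp regex, the `template_of` and `collapse_repeats` helpers, and the sample log lines are all assumptions made for illustration.

```python
import re
from itertools import groupby

# Assumed timestamp shape (ISO-8601-like prefix); real logs vary widely.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?Z?\s*")


def template_of(line: str) -> str:
    """Strip the leading timestamp so repeated messages compare equal."""
    return TIMESTAMP.sub("", line)


def collapse_repeats(lines):
    """Collapse consecutive lines sharing a template into one line + (xN)."""
    out = []
    for _, group in groupby(lines, key=template_of):
        group = list(group)
        first = group[0]
        out.append(first if len(group) == 1 else f"{first} (x{len(group)})")
    return out


# Hypothetical sample input, not taken from tests/fixtures/.
log = [
    "2026-06-01T10:00:00Z ERROR connection refused: upstream postgres unreachable",
    "2026-06-01T10:00:01Z ERROR connection refused: upstream postgres unreachable",
    "2026-06-01T10:00:02Z ERROR connection refused: upstream postgres unreachable",
    "2026-06-01T10:00:03Z INFO restarting container",
]
collapsed = collapse_repeats(log)
print("\n".join(collapsed))
```

The sketch only merges consecutive repeats, which is why crashloop-style logs compress so well while mixed output barely moves.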
For flat-fee plans (Claude Pro, Cursor, Windsurf) the dollar impact is zero. You still get faster responses, less context-window pressure, and cleaner inputs to the model.
The compression engine works; the question is whether the product of firing rate × savings per firing is positive on your workload. Below are results from counterbalanced 4-trial experiments (baseline, funneled, baseline, funneled; cache held constant at ~26-87k cached tokens across all trials so any delta is attributable to coagula, not prompt-cache priming).
| Workload | Model | Baseline | Funneled | Δ |
|---|---|---|---|---|
| bash dump (force `cat large_file`, no pipes) | GPT-5 | 9.29 cr | 7.93 cr | −14.6% |
| Paginated single-line views (20× `view file:N-N`) | GPT-5 | 8.11 cr | 8.20 cr | +1.0% (noise) |
| Paginated single-line views (20× `view file:N-N`) | Sonnet 4.6 | 8.14 cr | 8.18 cr | +0.5% (noise) |
Two real-world conclusions from those numbers:
- coagula provides ~15% credit savings on bash-dump diagnostic workflows. This is the original design target: `kubectl get pods -o yaml`, `journalctl -u svc`, `psql ... select * from big_table`, etc. The model dispatches one big command, the postToolUse hook intercepts the multi-KB output, and the funnel's lossless stages (Dedup + Prune) compress it 70-99%. Even when Copilot CLI spills the output to a temp file and the model views the spill, coagula's view-summary path catches it.
- coagula is essentially breakeven on paginated single-line view workloads regardless of model. Both GPT-5 and Claude Sonnet 4.6 ignored coagula's nudge + file-summary injection text and made all 20 view calls anyway. v0.7.3 made the summary one-shot per file so the overhead is ~zero, but the model behavior doesn't change: there's no compression opportunity in a 1-line view output. If your sessions are 100% surgical line-range reads with no bash dumps, this tool will neither help nor hurt; that's a structural limit of the postToolUse hook interface, not a bug.
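The Δ column in the results table is a plain relative change of funneled credits against baseline credits. The helper below is a sketch (the name `percent_delta` is made up here); because the table prints credits rounded to two decimals, recomputing from those rounded figures can differ from a printed Δ in the last digit.

```python
def percent_delta(baseline: float, funneled: float) -> float:
    """Relative change of funneled vs baseline in percent (negative = savings)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (funneled - baseline) / baseline * 100.0


# Bash-dump row from the table: 9.29 cr baseline, 7.93 cr funneled.
bash_dump = percent_delta(9.29, 7.93)
print(f"bash dump: {bash_dump:+.1f}%")
```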
Workloads coagula provably cannot help:
| Workload | Why |
|---|---|
| `web_fetch` tool results | Doesn't dispatch postToolUse (upstream issue) |
| User pastes raw content into chat | Prompts don't go through postToolUse — only tool calls do |
| Sessions where the model `grep`/`uniq`-pre-filters everything | Hook fires but on already-summarized output (model did the work) |
The compression-percentages-on-fixtures table above is real, but it measures only the second multiplier (savings per firing). Real session deltas depend on what fraction of your tool calls are bash dumps vs surgical reads.
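One way to reason about the two multipliers is a back-of-the-envelope model: for each class of tool call, multiply its share of session input tokens by how often the hook fires on it and by the reduction achieved when it fires. The sketch below uses invented numbers and hypothetical names (`ToolCallClass`, `expected_input_reduction`); it estimates input-token reduction only, which is not the same thing as credit savings.

```python
from dataclasses import dataclass


@dataclass
class ToolCallClass:
    name: str
    share_of_input_tokens: float  # fraction of session input tokens from this class
    firing_rate: float            # fraction of those tokens in calls where the hook fires
    reduction: float              # fraction of tokens removed when the hook fires


def expected_input_reduction(classes) -> float:
    """Sum of share x firing rate x reduction over all tool-call classes."""
    return sum(c.share_of_input_tokens * c.firing_rate * c.reduction for c in classes)


# Hypothetical session mix, not measured data.
session = [
    ToolCallClass("bash dumps", 0.30, 0.90, 0.90),
    ToolCallClass("surgical views", 0.50, 0.00, 0.00),
    ToolCallClass("prompt + other", 0.20, 0.00, 0.00),
]
estimate = expected_input_reduction(session)
print(f"expected input-token reduction: {estimate:.1%}")
```

With this mix, a 90% fixture-level reduction shrinks to roughly a quarter of session input, which is why the fixture table and the credit table tell different stories.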
References:
- GitHub Copilot is moving to usage-based billing — GitHub Blog
- Models and pricing for GitHub Copilot — GitHub Docs
- "What a joke": GitHub Copilot's new token-based billing — TechCrunch
The honest version of "30-99% reduction" is paired with a regression check. coagula ships an in-repo eval harness that measures the answer-accuracy delta — does the model still answer correctly when fed the compressed context vs the original?
python -m coagula.evals
Output:
CASE COMPRESSION RAW FUNNELED DELTA
------------------------------------------------------------
crashloop-fatal 99.9% OK OK =
k8s-crashloop 93.8% OK OK =
raw accuracy: 2/2 funneled accuracy: 2/2 regressions: 0
By default the suite uses a deterministic substring-overlap "judge" — good
enough to catch regressions where compression destroyed the signal-bearing
line, not a substitute for real-model evaluation. Set RUN_EVALS=1 (with
Azure or Ollama configured) to route through a real LLM. Add your own cases
via EvalRunner.add_case.
CI gate: python -m coagula.evals --fail-on-regression exits non-zero if
any case had a funneled-vs-raw accuracy regression.
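As a rough illustration of what a deterministic substring-overlap check can look like, the sketch below passes a context only if every signal-bearing phrase survives in it, and flags a regression when the raw context passes but the funneled one does not. This is one plausible reading, not the harness's code: `judge`, `compare`, and the sample strings are hypothetical, and the real `EvalRunner` may score differently.

```python
def judge(context: str, required_phrases) -> bool:
    """Deterministic check: does every signal-bearing phrase survive in the context?"""
    lowered = context.lower()
    return all(phrase.lower() in lowered for phrase in required_phrases)


def compare(raw: str, funneled: str, required_phrases) -> str:
    """Return '=' when both verdicts match, 'REGRESSION' when only raw passes."""
    raw_ok = judge(raw, required_phrases)
    funneled_ok = judge(funneled, required_phrases)
    if raw_ok and not funneled_ok:
        return "REGRESSION"
    if funneled_ok and not raw_ok:
        return "IMPROVED"
    return "="


# Hypothetical case data.
raw = "\n".join(["ERROR connection refused: upstream postgres unreachable"] * 3)
good = "ERROR connection refused: upstream postgres unreachable (x3)"
bad = "ERROR (x3)"  # compression that destroyed the signal-bearing text
phrases = ["connection refused", "postgres unreachable"]

print(compare(raw, good, phrases), compare(raw, bad, phrases))
```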
Step-by-step for setting up automatic context funneling in copilot on a
fresh machine. Skip the steps you've already done.
macOS / Linux:
brew install gh
gh auth login # GitHub auth + Copilot subscription
gh extension install github/copilot-cli # or: npm i -g @github/copilot-cli
copilot --version                         # confirm ≥ 1.0
Windows (PowerShell):
winget install GitHub.cli # or: scoop install gh
gh auth login
gh extension install github/copilot-cli
copilot --version
If copilot isn't on your PATH after npm install, add $(npm prefix -g)/bin
to PATH. If you're inside an org that disabled hooks, see Troubleshooting.
pip install https://github.com/pat-nel87/coagula/releases/download/v0.7.3/coagula-0.7.3-py3-none-any.whl
coagula --help   # verify on PATH
jq is needed for the bash hook to parse the JSON Copilot CLI streams in;
PowerShell native users don't need it:
brew install jq # macOS
sudo apt install jq # Debian/Ubuntu
winget install jqlang.jq    # Windows (only if using Git Bash)
macOS / Linux / Git Bash:
./integrations/copilot-cli/install.sh
Windows PowerShell:
.\integrations\copilot-cli\install.ps1
Either installer copies the hook scripts to ~/.copilot/hooks-bin/ and
writes ~/.copilot/hooks/coagula.json with both bash and powershell
command fields — Copilot CLI auto-picks per platform, and the PowerShell
hooks themselves further auto-defer to Git Bash if it's on PATH. So a
single config works on macOS, Linux, Windows native, and Windows + Git
Bash. Re-run with --force (or -Force) to overwrite an existing config.
copilot -p "Run 'cat tests/fixtures/crashloop.log' and tell me the dominant error pattern" \
--allow-all-tools --allow-all-paths --no-color
What you should see:
- Bash call output prefixed with `[coagula: 134715 → 37 tok | tool=bash profile=passthrough]` instead of 130 KB of log lines.
- Total `in` tokens in the session footer ~60 k, not ~150 k.
- The answer ("connection refused: upstream postgres unreachable") still correct.
If you instead see the raw 4 000 lines, jump to Troubleshooting below.
Set in ~/.zshrc / ~/.bashrc (macOS/Linux) or $PROFILE (Windows):
export COAGULA_QUERY="default query" # overrides per-session inference
export COAGULA_BUDGET=2000 # funneled output token cap
export COAGULA_KEEP=5 # top-K chunks kept by relevance
export COAGULA_NOISY_PATTERNS="helm|terraform" # extra preToolUse commands to intercept
export COAGULA_SKIP_TOOLS="my_internal_tool" # extra postToolUse tools to bypass
export COAGULA_DISABLE=1                          # kill switch
Two layers control when the postToolUse hook actually fires.
Per-tool defaults (don't usually need to change):
| Tool type | Default token threshold |
|---|---|
| `bash` / `shell` / `powershell` | 2000 |
| `view` / `read` / `read_file` | 500 |
| MCP tools (`mcp:*` / `*__*`) | 1000 |
| Other tools | 1500 |
Why per-tool: a 1KB bash output is usually a real answer (version probes,
ls), but a 1KB view output is often paginated noise that compresses 60-80%.
The lower view threshold was empirically validated on real Windows Copilot
CLI sessions before being promoted.
Override globally with COAGULA_THRESHOLD=<n> (replaces every per-tool
default with that single number — useful for A/B testing).
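The lookup described above can be sketched as follows. The threshold numbers and the `COAGULA_THRESHOLD` override come from this section; the function names, the case-insensitive matching, and the way MCP tool names are recognised (`mcp:` prefix or a double underscore) are assumptions, and the real hook scripts may classify tools differently.

```python
import os

# Per-tool defaults from the table above.
DEFAULT_THRESHOLDS = {"shell": 2000, "view": 500, "mcp": 1000, "other": 1500}
SHELL_TOOLS = {"bash", "shell", "powershell"}
VIEW_TOOLS = {"view", "read", "read_file"}


def tool_class(tool_name: str) -> str:
    """Map a tool name onto one of the four threshold classes (assumed matching rules)."""
    name = tool_name.lower()
    if name in SHELL_TOOLS:
        return "shell"
    if name in VIEW_TOOLS:
        return "view"
    if name.startswith("mcp:") or "__" in name:
        return "mcp"
    return "other"


def threshold_for(tool_name: str, env=None) -> int:
    """Return the token threshold; COAGULA_THRESHOLD replaces every per-tool default."""
    env = os.environ if env is None else env
    override = env.get("COAGULA_THRESHOLD")
    if override:
        return int(override)
    return DEFAULT_THRESHOLDS[tool_class(tool_name)]


print(threshold_for("bash", env={}), threshold_for("view", env={}))
print(threshold_for("view", env={"COAGULA_THRESHOLD": "10000"}))
```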
Cumulative session tracking catches the death-by-a-thousand-cuts case
where a model makes many sub-threshold view_range or grep calls that
individually slip through:
export COAGULA_CUMULATIVE_THRESHOLD=8000   # default; 0 disables
State lives in ~/.copilot/coagula-session-state/<ppid>.json (per copilot
process). Once the session total crosses the threshold, subsequent
sub-threshold calls get funneled too. Entries older than 1h are pruned.
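The cumulative trigger can be pictured with a small in-memory tracker. This is a sketch under stated assumptions: the real state lives in the JSON file mentioned above, the class name `SessionTracker` is invented, and whether the comparison uses the total before or after the current call is a guess (the sketch compares the total accumulated before the call).

```python
class SessionTracker:
    """Tracks total tool-output tokens seen in one session (in memory only)."""

    def __init__(self, cumulative_threshold: int = 8000):
        self.cumulative_threshold = cumulative_threshold  # 0 disables the trigger
        self.total_tokens = 0

    def should_funnel(self, tokens: int, per_tool_threshold: int) -> bool:
        """Funnel when the call is over its own threshold or the session is over budget."""
        over_per_call = tokens > per_tool_threshold
        over_session = (
            self.cumulative_threshold > 0
            and self.total_tokens > self.cumulative_threshold
        )
        self.total_tokens += tokens
        return over_per_call or over_session


# Four 400-token view calls against a 500-token per-tool threshold,
# with a deliberately low cumulative threshold to show the switch.
tracker = SessionTracker(cumulative_threshold=1000)
decisions = [tracker.should_funnel(400, per_tool_threshold=500) for _ in range(4)]
print(decisions)
```

Each call slips under the per-tool threshold, but once the running total passes the cumulative threshold the remaining calls are funneled anyway.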
Lite mode (default — no COAGULA_QUERY / COAGULA_TASK set): only the
lossless stages run (Normalize → Dedup → Prune → Budget → Assemble). Still
gets 95%+ reduction on log-shaped output via dedup alone. Skips Relevance +
Summarize entirely so no Azure / Ollama call happens by default.
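For a feel of what a lossless Prune pass does to kubectl JSON, here is a minimal sketch that drops denylisted keys recursively. The three keys are the ones named in the payload table; the `prune` function and the sample pod are hypothetical, the real denylist is profile-specific, and unlike this sketch coagula demotes pruned content to a deferred store rather than discarding it.

```python
import json

# Keys named in the payload table; the real denylist is profile-specific.
DENYLIST = {"managedFields", "resourceVersion", "annotations"}


def prune(node, denylist=DENYLIST):
    """Recursively drop denylisted keys from parsed JSON."""
    if isinstance(node, dict):
        return {k: prune(v, denylist) for k, v in node.items() if k not in denylist}
    if isinstance(node, list):
        return [prune(item, denylist) for item in node]
    return node


# Hypothetical, heavily shortened pod object.
pod = {
    "metadata": {
        "name": "payments-7c9d",
        "resourceVersion": "123456",
        "annotations": {"example.com/checksum": "abc"},
        "managedFields": [{"manager": "kubectl", "operation": "Update"}],
    },
    "status": {"phase": "Running"},
}
slim = prune(pod)
before, after = len(json.dumps(pod)), len(json.dumps(slim))
print(f"{before} -> {after} chars")
print(json.dumps(slim))
```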
v0.6.0+ logs every postToolUse decision (not just successful funnelings). Tail it to see your actual firing rate:
tail -f ~/.copilot/coagula-debug.log
Each line categorizes the verdict:
| Tag | Meaning |
|---|---|
| `fired` | Compression happened; line shows in=X out=Y reason=... |
| `under-threshold` | Call was below the threshold (per-call AND cumulative); passed through |
| `skipped` | Tool was on the skip list (internal bookkeeping or COAGULA_SKIP_TOOLS) |
| `funnel-noop` | Funnel ran but couldn't shrink; original passed through |
| `coagula-missing` | coagula not on PATH; hooks are no-ops. Install coagula globally to fix. |
| `disabled` | COAGULA_DISABLE=1 set |
If you see lots of under-threshold and few fired, your workload is
mostly small-tool-output and per-call thresholds don't catch it. The
cumulative-trigger should kick in eventually; if not, lower
COAGULA_CUMULATIVE_THRESHOLD. If you see coagula-missing, the hooks
are running but doing nothing — install coagula in a Python that's on the
shell's PATH (not just in a venv).
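To turn the log into a firing rate, a small tally script is enough. The exact line format is not documented here, so the sketch assumes only that each line contains its verdict tag as a whitespace-delimited token; the sample lines are invented, and the helper names `tally` and `firing_rate` are hypothetical.

```python
from collections import Counter

VERDICTS = ("fired", "under-threshold", "skipped", "funnel-noop",
            "coagula-missing", "disabled")


def tally(lines) -> Counter:
    """Count verdict tags, assuming the tag appears as its own token on each line."""
    counts = Counter()
    for line in lines:
        tokens = line.split()
        for tag in VERDICTS:
            if tag in tokens:
                counts[tag] += 1
                break
    return counts


def firing_rate(counts: Counter) -> float:
    """Share of logged decisions where compression actually happened."""
    total = sum(counts.values())
    return counts["fired"] / total if total else 0.0


# Invented sample lines; the real log layout may differ.
sample = [
    "10:00:00 fired tool=bash in=134715 out=37 reason=per-call",
    "10:00:05 under-threshold tool=view in=120",
    "10:00:06 under-threshold tool=view in=118",
    "10:00:09 skipped tool=report_intent",
]
counts = tally(sample)
print(dict(counts), f"firing rate: {firing_rate(counts):.0%}")
```

To run it against a real log, pass the lines of `~/.copilot/coagula-debug.log` to `tally` after checking that the tag really is a standalone token in your version.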
Debug log: since v0.6.0, every postToolUse decision appends one line to
~/.copilot/coagula-debug.log (Windows: $env:USERPROFILE\.copilot\coagula-debug.log),
including passthroughs such as under-threshold and skipped. Tail with tail -f
(or Get-Content -Wait on Windows). Disable with COAGULA_DEBUG_LOG=off.
First step on any problem (Windows): run the smoke-test diagnostic. It walks 10 checks and tells you exactly what's missing:
.\integrations\windows-smoke.ps1
.\integrations\windows-smoke.ps1 -SkipLiveSession   # don't burn a premium request
No Linux/macOS equivalent script yet; walk the same checks manually:
coagula --help, jq --version, cat ~/.copilot/hooks/coagula.json,
tail -f ~/.copilot/logs/*.log during a session.
Common specifics:
- Hook doesn't fire. Check `~/.copilot/logs/` (Windows: `%USERPROFILE%\.copilot\logs\`) for `preToolUse` / `postToolUse` lines. Usually `jq` missing (bash path) or `coagula` not on PATH for the shell `copilot` launched (use absolute paths in `coagula.json` if your shell rc isn't sourced for non-interactive bash).
- "Permission denied" on the hook script. macOS/Linux: `chmod +x ~/.copilot/hooks-bin/*.sh`. Windows: usually an ExecutionPolicy issue; the installer writes `powershell -ExecutionPolicy Bypass` which should sidestep it, but corporate AppLocker can override. Run `Set-ExecutionPolicy -Scope CurrentUser RemoteSigned` or ask IT to allowlist the script path.
- Funneled output too aggressive / signal lost. Raise `COAGULA_BUDGET` and `COAGULA_KEEP`, or set `COAGULA_THRESHOLD=10000` so only enormous outputs get intercepted.
- Org disabled hooks. Some GitHub orgs disable Copilot CLI hooks via policy. There's no workaround inside Copilot CLI itself; either pre-funnel payloads manually with the `coagula` CLI before feeding them in (`kubectl … | coagula --query "…" | pbcopy`), or `git checkout v0.4.0` for the LLM-invoked `coagula-mcp` server, which isn't hook-policy gated.
- Tool repeatedly re-reads the spill file. Copilot CLI persists original output at `/tmp/copilot-tool-output-*.txt` (Windows: `%TEMP%\copilot-tool-output-*.txt`); the model may go fetch the raw blob if it doesn't trust the funneled version. Tune `COAGULA_QUERY` to be specific so the funneled result actually contains the signal the model is after.
- macOS: Library, CLI, Ollama, bash hooks in live Copilot CLI session — all verified locally. Counterbalanced n=4 bash-dump test shows −14.6% credits (see "When coagula actually helps"). Counterbalanced paginated-view tests on both GPT-5 and Sonnet 4.6 show ~0% delta (within noise) — the nudge + summary features fire correctly but current models ignore the injected hint text.
- Windows + GitHub Copilot CLI: Verified end-to-end on Windows 11 Enterprise + Copilot CLI 1.0.59 + Python 3.13. Hooks fire in live sessions; Azure OpenAI route succeeds when env vars are set; PowerShell native fallback engages without Git Bash present. CI matrix on `windows-latest` covers the same surface continuously.
- Azure OpenAI: Verified end-to-end against a real tenant (gpt-5.4 + gpt-5.4-nano, API version `2024-10-21`). Mocked unit tests run on every push; opt-in integration tests gate on `RUN_AZURE_TESTS=1`.
- Ollama: Verified end-to-end on macOS with `nomic-embed-text` + `llama3.2:3b`. Not yet exercised on Windows.
| Variable | Purpose | Default |
|---|---|---|
| `COAGULA_QUERY` | Per-session query; unlocks Relevance + Summarize stages | (unset → lite mode) |
| `COAGULA_BUDGET` | Funneled output token cap | 2000 |
| `COAGULA_KEEP` | Top-K chunks kept by Relevance | 5 |
| `COAGULA_THRESHOLD` | Override per-tool default thresholds with one global number. Unset: per-tool defaults (bash 2000, view 500, MCP 1000, other 1500). | (per-tool table) |
| `COAGULA_CUMULATIVE_THRESHOLD` | Session-level token total above which sub-threshold calls also get funneled. Catches paginated view_range/grep patterns. 0 disables. | 8000 |
| `COAGULA_VIEW_NUDGE_AFTER` | After N view-tool calls per session, inject a one-time hint encouraging bash alternatives that compress (v0.7.0+). 0 disables. Currently dormant: tested models (GPT-5, Sonnet 4.6) ignore the hint. Costs nothing; kept for future model compatibility. | 4 |
| `COAGULA_SUMMARY_INJECT` | On first view of a file, if file content compresses well, prepend a coagula summary to that one response (v0.7.3+: one-shot, not every call). Set to off to disable. Currently dormant for the same reason as the nudge. | on |
| `COAGULA_NOISY_PATTERNS` | Extra preToolUse Bash commands to intercept (regex) | (built-in list) |
| `COAGULA_SKIP_TOOLS` | Extra postToolUse tools to bypass | (built-in skiplist) |
| `COAGULA_DISABLE` | Kill switch; hooks no-op | (off) |
| `COAGULA_BACKEND` | Force backend: azure, ollama, fallback | (auto) |
| `COAGULA_CACHE_STABLE` | Force temperature=0 on LLM hooks so output is byte-stable across identical inputs; required for upstream provider prompt caches (Anthropic ~90% / OpenAI ~50% discount on cached input tokens) to hit. Truthy: on, 1, yes, true, enabled. | off |
| `COAGULA_WORKSPACE_KEY` | DeferredStore scoping key for embedded library users running coagula across multiple project dirs in one process | process CWD |
| `COAGULA_DEBUG_LOG` | Override path, or off to disable | ~/.copilot/coagula-debug.log |
| `OLLAMA_HOST` / `COAGULA_EMBED_MODEL` / `COAGULA_LLM_MODEL` | Ollama overrides | see below |
| `AZURE_OPENAI_*` | Azure deployment selection | see below |
By default Relevance uses TF-IDF and Summarize is extractive — both
stdlib, no model call. For better quality on noisy diagnostic payloads,
route those two stages through a cheaper-than-frontier LLM. The coagula
CLI auto-detects at startup with this priority: COAGULA_BACKEND
override → Azure OpenAI → Ollama → stdlib fallback.
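The priority order reads naturally as a chain of early returns. The sketch below is an illustration, not the CLI's detection code: `pick_backend` is a hypothetical name, treating Azure as configured when endpoint, API key, and LLM deployment are all set is an assumption, and Ollama availability is passed in as a boolean because how the CLI probes for it is not described here.

```python
def pick_backend(env: dict, ollama_reachable: bool) -> str:
    """Resolve the backend: explicit override, then Azure, then Ollama, then stdlib."""
    forced = env.get("COAGULA_BACKEND", "").strip().lower()
    if forced in {"azure", "ollama", "fallback"}:
        return forced
    # Assumption: these three variables together mean "Azure is configured".
    azure_vars = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY",
                  "AZURE_OPENAI_LLM_DEPLOYMENT")
    if all(env.get(name) for name in azure_vars):
        return "azure"
    if ollama_reachable:
        return "ollama"
    return "fallback"


azure_env = {
    "AZURE_OPENAI_ENDPOINT": "https://my-resource.openai.azure.com",
    "AZURE_OPENAI_API_KEY": "placeholder",
    "AZURE_OPENAI_LLM_DEPLOYMENT": "gpt-5.4-nano",
}
print(pick_backend({}, ollama_reachable=False))
print(pick_backend(azure_env, ollama_reachable=True))
print(pick_backend({**azure_env, "COAGULA_BACKEND": "ollama"}, ollama_reachable=True))
```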
Azure OpenAI:
$env:AZURE_OPENAI_ENDPOINT = "https://my-resource.openai.azure.com"
$env:AZURE_OPENAI_API_KEY = "..."
$env:AZURE_OPENAI_LLM_DEPLOYMENT = "gpt-5.4-nano" # or gpt-4o-mini
$env:AZURE_OPENAI_EMBED_DEPLOYMENT = "text-embedding-3-small" # optional
$env:AZURE_OPENAI_API_VERSION      = "2024-10-21"              # optional
LLM_DEPLOYMENT is required to wire Azure; EMBED_DEPLOYMENT is optional
(Relevance keeps TF-IDF without it). Both factories accept a fallback
callable that fires on Azure errors so transient outages degrade to TF-IDF
rather than crashing.
Ollama (local + offline):
brew install --cask ollama-app && open -a Ollama
ollama pull nomic-embed-text   # ollama pull takes one model per invocation
ollama pull llama3.2:3b
# Override defaults via OLLAMA_HOST, COAGULA_EMBED_MODEL, COAGULA_LLM_MODEL.
python -m coagula.cli \
--query "why is the payments pod crashlooping" \
--budget 800 --keep 4 --report \
tests/fixtures/noisy_mixed.txt
# Pipe real noisy context:
kubectl get pod <name> -o json | coagula --query "why is this pod failing" --report
from coagula import coagula_payload, ChunkSpec
result = coagula_payload(
[
ChunkSpec(text=kubectl_json, kind="json", source="kubectl/pod"),
ChunkSpec(text=crashloop, kind="log", source="logs/payments"),
ChunkSpec(text=fatal_line, source="alerts", severity="FATAL"),
],
query="why is this pod crashlooping",
profile="k8s",
max_tokens=2000,
extra_critical_patterns=[r"payment_id=\d+"],
)
print(result.prompt) # the cleaned context
print(result.deferred_manifest) # what got demoted, retrievable by id
print(result.report)             # per-stage savings table
Or wire the backends directly:
import os
from coagula import default_funnel
from coagula.models.azure_openai import make_embedder, make_llm
embed = make_embedder("text-embedding-3-small",
endpoint="https://my-resource.openai.azure.com",
api_key=os.environ["AZURE_OPENAI_API_KEY"])
llm = make_llm("gpt-5.4-nano",
endpoint="https://my-resource.openai.azure.com",
api_key=os.environ["AZURE_OPENAI_API_KEY"])
funnel = default_funnel(embedder=embed, llm=llm, max_tokens=2000, keep=5)
Stages run in fixed order, cheap-before-expensive:
normalize → dedup → prune → relevance → summarize → budget → assemble
Everything is demote, not delete: pruned chunks become DEFERRED and
are retrievable on demand via DeferredStore. CRITICAL chunks are
sacrosanct — never demoted, never dropped. Severity pinning at ingestion
(severity="FATAL"|"ERROR" → CRITICAL) is the correctness guarantee
that compensates for an imperfect relevance ranker.
See SPEC.md for the full contract.
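The demote-not-delete rule and severity pinning can be pictured with a toy model. Everything below is a sketch, not the library's data model: `Chunk`, `pin_severity`, `apply_budget`, the priority labels other than CRITICAL and DEFERRED, and budgeting by chunk count instead of tokens are all simplifications made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    text: str
    severity: Optional[str] = None
    priority: str = "NORMAL"  # toy labels: CRITICAL / NORMAL / DEFERRED


def pin_severity(chunk: Chunk) -> Chunk:
    """Pin FATAL/ERROR chunks to CRITICAL at ingestion time."""
    if chunk.severity in {"FATAL", "ERROR"}:
        chunk.priority = "CRITICAL"
    return chunk


def apply_budget(chunks, max_chunks: int):
    """Keep every CRITICAL chunk; demote overflow to a deferred store keyed by id."""
    n_critical = sum(1 for c in chunks if c.priority == "CRITICAL")
    room = max(0, max_chunks - n_critical)
    kept, deferred = [], {}
    for index, chunk in enumerate(chunks):
        if chunk.priority == "CRITICAL":
            kept.append(chunk)
        elif room > 0:
            kept.append(chunk)
            room -= 1
        else:
            chunk.priority = "DEFERRED"
            deferred[f"chunk-{index}"] = chunk  # still retrievable by id
    return kept, deferred


chunks = [pin_severity(c) for c in [
    Chunk("noise 1"), Chunk("db down", severity="FATAL"), Chunk("noise 2"),
    Chunk("timeout", severity="ERROR"), Chunk("noise 3"),
]]
kept, deferred = apply_budget(chunks, max_chunks=3)
print([c.text for c in kept], sorted(deferred))
```

Nothing is lost in the toy model either: the two noise chunks that did not fit are still reachable through the deferred dictionary.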
- Not a coding-agent universal tool. v0.5.0 is Copilot-CLI-specific.
Claude Code and VSCode Copilot Chat use flat-fee plans with large
context windows, so the dollar/window-pressure motivation for compression
doesn't apply. Earlier versions shipped Claude Code / VSCode hooks and
an MCP server; both were removed in v0.5.0 to focus on the one host that
actually benefits; see `v0.4.0` if you need those.
- Not a vector DB / RAG store. The funnel is stateless per request except for the per-request deferred store.
- No telemetry, no network egress on the default path. Ollama is local;
hooks invoke the local `coagula` binary over stdio.
v0.7.3 (current) — Hardens the v0.7 view-mitigation features:
- One-shot summary injection. v0.7.0/v0.7.2 injected the cached file summary on every view response of the same file, adding ~180 tokens × N calls. n=4 counterbalanced testing showed this cost +15% credits without changing model behavior. v0.7.3 injects the summary only on the first view of each file. Re-tested counterbalanced: now ~0% delta (within noise).
- Defensive hook robustness. Dropped `set -e` in favor of an EXIT trap that converts any unexpected exit to a clean `{}` passthrough + decision-log entry. Fixed an empirically-discovered jq-on-real-payload bug that caused 19/20 hook invocations to silently fail in real Copilot CLI sessions on v0.7.0/v0.7.1.
- Schema-correct file-path extraction. Copilot CLI passes the view tool's `toolArgs` as a JSON string (not an object); v0.7.2 fixed the extractor to handle both shapes so the summary path actually fires in production sessions.
Honest read on the v0.7 mitigation story: the nudge + summary
features ship and fire correctly, but neither GPT-5 nor Claude
Sonnet 4.6 changes behavior in response to the injected hint text.
They're dormant — costless after v0.7.3, but also savingless. They
remain in the codebase as kindling for when a model that heeds
additionalContext-style hints comes along, but provide no real value
today. The actual measured value lives entirely in the v0.3-era
bash-dump compression path (~15% credit savings on counterbalanced
real-session tests; see "When coagula actually helps" above).
Native PowerShell users get v0.6.0 features (per-tool thresholds + cumulative tracking + decision logging); v0.7 injection features require Git Bash (auto-detected by the PS hook when present).
21 hook integration tests validate the bash hook end-to-end with synthetic Copilot CLI payloads, including the v0.7.2 string-shaped toolArgs regression check and the v0.7.3 one-shot summary injection.
v0.6.0 — Per-tool thresholds (bash 2000, view 500, MCP 1000) replace
the single global default — increases firing rate on paginated view_range
and small-tool-output patterns without harming response quality. Cumulative
session tracking catches death-by-a-thousand-cuts: once total tool-output
tokens cross COAGULA_CUMULATIVE_THRESHOLD (default 8000), subsequent
sub-threshold calls also funnel. Every postToolUse decision now writes a
debug-log line categorizing the verdict (fired / under-threshold / skipped /
etc.), so actual firing rate is finally observable; see "Reading the debug
log" above. 12 new integration tests validate the bash hook end-to-end
with synthetic Copilot CLI payloads.
v0.5.x scoped coagula exclusively to GitHub Copilot CLI via host hooks
(PowerShell on Windows, Bash on macOS/Linux/Git Bash). The seven-stage
funnel runs on stdlib alone; Azure OpenAI and Ollama are optional drop-in
backends. Claude Code / VSCode Copilot Chat hook integrations and the
coagula-mcp MCP server were removed to reduce maintenance surface for a
target audience that didn't benefit from them. The accuracy-preservation
eval harness and cache-stable mode shipped in v0.4.0 carry over.
coagula is licensed under the Apache License, Version 2.0.
See NOTICE for attribution requirements.
- Free for any use — personal, commercial, hosted, embedded.
- Modify and redistribute freely; keep the copyright + NOTICE.
- Patent grant from contributors; patent retaliation if you sue.
- Provided as-is, no warranty.
For vulnerability reports see SECURITY.md. To contribute,
see CONTRIBUTING.md — every commit must be DCO-signed
(git commit -s).
