Real-time failure detection for production AI agents - Slack alert within 15 seconds.
If Dunetrace helps you, consider giving it a ⭐ (top right). It helps others find the project.
AI agents fail silently:
- ✓ API returns 200 ✓ Logs are clean
- ✗ Agent called the same tool 12 times, burned $10, and gave the user a wrong answer
Langfuse and LangSmith answer "what happened?" after you already know it broke. Dunetrace answers "is something breaking right now?" and fires an alert within 15 seconds of run completion.
| | Dunetrace | Langfuse / LangSmith |
|---|---|---|
| When it fires | Within 15s of run completion | You query it after you notice a problem |
| What it watches | Structural failure patterns | Raw trace data |
| Alert channel | Slack / webhook / Dashboard | Dashboard only |
| Fix path | One-click prompt apply or GitHub PR | Manual |
1. Start the backend
git clone https://github.com/dunetrace/dunetrace
cd dunetrace && cp .env.example .env
docker compose -f docker-compose.ghcr.yml up -d
2. Install the SDK
pip install dunetrace # Python
npm install dunetrace # Node.js / TypeScript
3. Instrument your agent
Python
from dunetrace import Dunetrace
dt = Dunetrace()
@dt.tool
def web_search(query: str) -> list: ...
@dt.trace
def my_agent(question: str) -> str:
    return web_search(question)[0]
TypeScript / Node.js
import { Dunetrace } from "dunetrace";
import OpenAI from "openai";
const dt = new Dunetrace();
const openai = dt.wrapOpenAI(new OpenAI());
await dt.run("my-agent", { model: "gpt-4o" }, async (run) => {
  await openai.chat.completions.create({ model: "gpt-4o", messages });
  run.finalAnswer();
});
Try the built-in failure scenarios
cd packages/sdk-py
python examples/basic_agent.py # No LLM calls
SCENARIO=tool_loop python examples/langchain_agent.py # TOOL_LOOP via LangChain
SCENARIO=failures python examples/decorator_agent.py # TOOL_LOOP, RETRY_STORM, RAG_EMPTY_RETRIEVAL
SCENARIO=tool_loop python examples/langfuse_agent.py # TOOL_LOOP + Langfuse explain
Open the dashboard: http://localhost:3000
17 detectors run on every completed run — no configuration, no LLM.
| Signal | What it catches |
|---|---|
| TOOL_LOOP | Same tool called repeatedly with identical args |
| TOOL_THRASHING | Oscillating between two tools, unable to commit |
| RETRY_STORM | Tool failing, agent retrying it repeatedly |
| CASCADING_TOOL_FAILURE | Multiple different tools failing in sequence |
| CONTEXT_BLOAT | Prompt tokens growing unsustainably across LLM calls |
| LLM_TRUNCATION_LOOP | Model output truncated repeatedly |
| GOAL_ABANDONMENT | Agent stopped using tools before finishing |
| REASONING_STALL | Too many LLM calls per tool call (agent deliberating in circles) |
| TOOL_AVOIDANCE | Agent answered without using any tools |
| RAG_EMPTY_RETRIEVAL | Retrieval returned nothing, agent answered anyway |
| EMPTY_LLM_RESPONSE | Model returned an empty response |
| FIRST_STEP_FAILURE | Failed on the first step (config or setup issue) |
| SLOW_STEP | Single step latency well above threshold |
| STEP_COUNT_INFLATION | Far more steps than the agent's baseline |
| SESSION_LATENCY | Wall-clock run time anomalously long vs per-agent baseline |
| COST_SPIKE | Total token consumption unusually high vs per-agent baseline |
| PROMPT_INJECTION_SIGNAL | Input matched adversarial injection patterns |
Each alert includes: what fired, why it matters, a concrete fix, and a rate context line (first occurrence / recurring / systemic).
Custom detectors — write a detector in plain English. Dunetrace translates it to a structured condition set, runs it in shadow mode against real traffic, and lets you review the fire rate before any alert fires. In the dashboard: Config → Custom detectors → Add detector.
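The shadow-mode step can be pictured as evaluating a condition over past runs and reporting how often it would have fired, without sending anything. The sketch below assumes a hypothetical flat per-run metrics dict and a hand-written predicate; how Dunetrace actually represents the translated condition set is not specified here.

```python
from typing import Callable

# Hypothetical: one run summarised as a flat dict of metrics.
Run = dict


def shadow_fire_rate(runs: list[Run], condition: Callable[[Run], bool]) -> float:
    """Fraction of runs on which the condition would have fired.
    Nothing is alerted; the caller only reviews the rate."""
    if not runs:
        return 0.0
    fired = sum(1 for run in runs if condition(run))
    return fired / len(runs)


runs = [
    {"tool_calls": 2, "llm_calls": 3},
    {"tool_calls": 0, "llm_calls": 9},
    {"tool_calls": 1, "llm_calls": 2},
    {"tool_calls": 0, "llm_calls": 7},
]

# Plain-English detector: "fire when the agent makes more than 5 LLM calls
# and no tool calls", written by hand as a predicate for this sketch.
rate = shadow_fire_rate(runs, lambda r: r["llm_calls"] > 5 and r["tool_calls"] == 0)
print(f"would fire on {rate:.0%} of runs")  # prints: would fire on 50% of runs
```

Reviewing a number like this before enabling alerts is the point of shadow mode: a detector that would fire on half of all runs is probably too broad.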
The dashboard is live at http://localhost:3000 and auto-refreshes every 15s.
Alerts go to Slack and to generic webhooks (PagerDuty, Linear, custom).
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/...
SLACK_MIN_SEVERITY=LOW # LOW | MEDIUM | HIGH | CRITICAL
A weekly digest (Monday 9am UTC) summarises top failure types and systemic patterns. Enable with DIGEST_ENABLED=true.
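The `SLACK_MIN_SEVERITY` setting can be read as an ordered threshold: a signal is sent only if its severity is at or above the configured minimum. A minimal sketch, assuming the four levels are ordered as listed; the function names and the message format are hypothetical, though the `{"text": ...}` body is the standard shape for Slack incoming webhooks.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def should_alert(signal_severity: str, min_severity: str = "LOW") -> bool:
    """True if the signal meets or exceeds the configured minimum."""
    return Severity[signal_severity] >= Severity[min_severity]


def slack_payload(signal: str, severity: str, agent: str) -> dict:
    """Build the JSON body for a Slack incoming webhook.
    The message wording is an assumption for this sketch."""
    return {"text": f"[{severity}] {signal} on agent {agent}"}


if should_alert("HIGH", min_severity="MEDIUM"):
    print(slack_payload("TOOL_LOOP", "HIGH", "my-agent"))
```

In a real deployment the minimum would come from the `SLACK_MIN_SEVERITY` environment variable, and the payload would be POSTed to `SLACK_WEBHOOK_URL`.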
Connect Langfuse to get LLM-powered root-cause analysis on any signal. Click Explain + on any alert — Dunetrace fetches the full trace, extracts the system prompt, and returns a specific root cause and fix.
- Prompt fixes → Apply via Langfuse creates a new prompt version in one click
- Code/infra fixes → Open PR on GitHub creates a draft PR with a unified diff
Fix effectiveness is tracked automatically.
Runtime guardrails that fire mid-run — before a failure propagates.
dt.add_policy(
    name="cap tool calls",
    condition={"trigger": "tool_call_count", "operator": "gt", "value": 5},
    action={"type": "stop"},
)
dt.add_policy(
    name="cost cap",
    condition={"trigger": "cost_usd", "operator": "gt", "value": 0.50},
    action={"type": "switch_model", "params": {"model": "gpt-4o-mini"}},
)
Policies can also be created in the dashboard and fetched automatically by the SDK (60s TTL).
Query agent signals directly from Claude Code, Cursor, or Codex — without leaving your editor.
pip install dunetrace-mcp
10 tools: ask your editor things like "what failed in the last 24 hours?"
| Tool | What you can ask |
|---|---|
| list_agents | "Which agents are monitored and how healthy are they?" |
| get_agent_signals | "What failures did my agent have today?" |
| get_agent_health | "Show me the health score breakdown for my agent." |
| get_signal_detail | "Show me signal #42 with full evidence and fix code." |
| get_agent_patterns | "Is this failure systemic or a one-off?" |
| get_run_detail | "Walk me through run abc123 step by step." |
| get_agent_runs | "List recent runs for my agent with their status." |
| search_signals | "Show me all CRITICAL signals in the last 24 hours." |
| summarize_agent | "Give me a one-shot diagnosis of my agent." |
| get_agent_token_stats | "How much is my agent wasting on failed runs?" |
Claude Code: registered automatically in ~/.claude.json after pip install dunetrace-mcp. Restart Claude Code to load.
Cursor: add .cursor/mcp.json to your project root:
{
  "mcpServers": {
    "dunetrace": {
      "command": "dunetrace-mcp",
      "env": {
        "DUNETRACE_API_URL": "http://localhost:8002",
        "DUNETRACE_API_KEY": "dt_dev_test"
      }
    }
  }
}
No raw content ever leaves your agent process. Every prompt, tool argument, and model output is SHA-256 hashed before transmission.
Agent Code
  └─► Dunetrace SDK (hashes content → ingest events)
        └─► Ingest API (POST /v1/ingest → Postgres)
              ├─► Detector (poll → 17 detectors → signals)
              ├─► Alerts (poll → explain → Slack / webhook)
              └─► Customer API (runs, signals, explanations → dashboard)
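The "hashes content" step at the SDK stage can be sketched with the standard library. The event field names below are hypothetical and the actual ingest schema may differ; sending the tool name unhashed is also an assumption, based on detectors needing to tell tools apart.

```python
import hashlib
import json


def hash_content(text: str) -> str:
    """SHA-256 hex digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_event(tool: str, args: dict, output: str) -> dict:
    """Build an ingest event that carries only hashes and sizes, never raw
    content. Field names are assumptions for this sketch."""
    return {
        "type": "tool_call",
        "tool": tool,
        # Canonical JSON so identical args always produce the same hash.
        "args_hash": hash_content(json.dumps(args, sort_keys=True)),
        "output_hash": hash_content(output),
        "output_len": len(output),
    }


event = build_event("web_search", {"query": "refund policy"}, "Refunds take 5 days.")
print(json.dumps(event, indent=2))
```

Because identical inputs give identical digests, a detector can still notice "same tool, same args" repetition from the hashes alone, without ever seeing the arguments themselves.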
Works with LangChain, CrewAI, AutoGen, Haystack, LlamaIndex, TypeScript, and more.
To contribute: fork, branch, make your change, run make test, and open a PR. For larger changes (new detectors, architecture changes), open an issue first.
Requires Python 3.11+, Node.js 18+, Docker + Docker Compose.





