Dunetrace

Real-time failure detection for production AI agents - Slack alert within 15 seconds.

If Dunetrace helps you, consider giving it a ⭐ at the top right. It helps others find the project.

The problem

AI agents fail silently:

✓ API returns 200 ✓ Logs are clean
✗ Agent called the same tool 12 times, burned $10, and gave the user a wrong answer

Langfuse and LangSmith answer "what happened?" after you already know something broke. Dunetrace answers "is something breaking right now?" and fires an alert within 15 seconds of run completion.

Why it's different

	Dunetrace	Langfuse / LangSmith
When it fires	Within 15s of run completion	You query it after you notice a problem
What it watches	Structural failure patterns	Raw trace data
Alert channel	Slack / webhook / Dashboard	Dashboard only
Fix path	One-click prompt apply or GitHub PR	Manual

Quick Start

1. Start the backend

git clone https://github.com/dunetrace/dunetrace
cd dunetrace && cp .env.example .env
docker compose -f docker-compose.ghcr.yml up -d

2. Install the SDK

pip install dunetrace                       # Python
npm install dunetrace                       # Node.js / TypeScript

3. Instrument your agent

Python

from dunetrace import Dunetrace

dt = Dunetrace()

@dt.tool
def web_search(query: str) -> list[str]:
    # Placeholder body: replace with a real search call.
    return [f"result for {query}"]

@dt.trace
def my_agent(question: str) -> str:
    return web_search(question)[0]

TypeScript / Node.js

import { Dunetrace } from "dunetrace";
import OpenAI from "openai";

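const dt     = new Dunetrace();
const openai = dt.wrapOpenAI(new OpenAI());

// Example conversation; replace with your agent's own messages.
const messages = [{ role: "user" as const, content: "What is the capital of France?" }];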

await dt.run("my-agent", { model: "gpt-4o" }, async (run) => {
  await openai.chat.completions.create({ model: "gpt-4o", messages });
  run.finalAnswer();
});

Try the built-in failure scenarios

cd packages/sdk-py

python examples/basic_agent.py                          # No LLM calls
SCENARIO=tool_loop python examples/langchain_agent.py   # TOOL_LOOP via LangChain
SCENARIO=failures python examples/decorator_agent.py    # TOOL_LOOP, RETRY_STORM, RAG_EMPTY_RETRIEVAL
SCENARIO=tool_loop python examples/langfuse_agent.py    # TOOL_LOOP + Langfuse explain

Open the dashboard: http://localhost:3000

Detectors

17 detectors run on every completed run — no configuration, no LLM.

Signal	What it catches
`TOOL_LOOP`	Same tool called repeatedly with identical args
`TOOL_THRASHING`	Oscillating between two tools, unable to commit
`RETRY_STORM`	Tool failing, agent retrying it repeatedly
`CASCADING_TOOL_FAILURE`	Multiple different tools failing in sequence
`CONTEXT_BLOAT`	Prompt tokens growing unsustainably across LLM calls
`LLM_TRUNCATION_LOOP`	Model output truncated repeatedly
`GOAL_ABANDONMENT`	Agent stopped using tools before finishing
`REASONING_STALL`	Too many LLM calls per tool call — agent deliberating in circles
`TOOL_AVOIDANCE`	Agent answered without using any tools
`RAG_EMPTY_RETRIEVAL`	Retrieval returned nothing, agent answered anyway
`EMPTY_LLM_RESPONSE`	Model returned an empty response
`FIRST_STEP_FAILURE`	Failed on the first step — config or setup issue
`SLOW_STEP`	Single step latency well above threshold
`STEP_COUNT_INFLATION`	Far more steps than the agent's baseline
`SESSION_LATENCY`	Wall-clock run time anomalously long vs per-agent baseline
`COST_SPIKE`	Total token consumption unusually high vs per-agent baseline
`PROMPT_INJECTION_SIGNAL`	Input matched adversarial injection patterns

Each alert includes: what fired, why it matters, a concrete fix, and a rate context line (first occurrence / recurring / systemic).

→ docs/detectors.md

Custom detectors — write a detector in plain English. Dunetrace translates it to a structured condition set, runs it in shadow mode against real traffic, and lets you review the fire rate before any alert fires. In the dashboard: Config → Custom detectors → Add detector.
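
As a rough picture of what shadow mode measures, the sketch below evaluates a condition set against past runs and reports how often it would have fired, without sending anything. The condition format, the field names, and the `fire_rate` helper are assumptions made for illustration; the structured form Dunetrace actually generates may differ.

```python
def fire_rate(runs: list[dict], conditions: list[dict]) -> float:
    """Fraction of runs on which every condition holds; no alert is sent."""
    if not runs:
        return 0.0
    fired = sum(
        all(run.get(c["field"], 0) > c["value"] for c in conditions)
        for run in runs
    )
    return fired / len(runs)


# Hypothetical condition set for "flag runs with more than 20 steps and over 10,000 tokens".
conditions = [
    {"field": "step_count", "value": 20},
    {"field": "total_tokens", "value": 10_000},
]

# Hypothetical per-run summaries collected during the shadow period.
runs = [
    {"step_count": 25, "total_tokens": 15_000},
    {"step_count": 8, "total_tokens": 3_000},
    {"step_count": 30, "total_tokens": 9_000},
    {"step_count": 22, "total_tokens": 12_000},
]

print(f"{fire_rate(runs, conditions):.0%}")  # 50%
```

A fire rate this high would usually suggest tightening the thresholds before the detector is allowed to alert.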

Dashboard

Live at http://localhost:3000. Auto-refreshes every 15s.

→ docs/dashboard.md

Alerts

Alerts go to Slack and to any generic webhook (PagerDuty, Linear, or a custom endpoint).

SLACK_WEBHOOK_URL=https://hooks.slack.com/services/...
SLACK_MIN_SEVERITY=LOW   # LOW | MEDIUM | HIGH | CRITICAL
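
The sketch below shows how a minimum-severity threshold of this kind typically behaves. It assumes the four levels are ordered as listed in the comment above, from LOW to CRITICAL; `should_alert` is a hypothetical helper, not part of Dunetrace.

```python
# Assumed ordering, lowest to highest, taken from the comment in the config above.
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def should_alert(signal_severity: str, min_severity: str) -> bool:
    """True when the signal's severity is at or above the configured minimum."""
    return SEVERITY_ORDER.index(signal_severity) >= SEVERITY_ORDER.index(min_severity)


min_severity = "MEDIUM"  # stands in for the SLACK_MIN_SEVERITY value
for severity in SEVERITY_ORDER:
    print(severity, should_alert(severity, min_severity))
```

Under this assumption, setting the value to LOW forwards every signal and CRITICAL forwards only the most severe ones.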

A weekly digest (Monday 9am UTC) summarises top failure types and systemic patterns. Enable with DIGEST_ENABLED=true.

→ docs/alerts.md

Diagnose with Langfuse

Connect Langfuse to get LLM-powered root-cause analysis on any signal. Click Explain + on any alert — Dunetrace fetches the full trace, extracts the system prompt, and returns a specific root cause and fix.

Prompt fixes → "Apply via Langfuse" creates a new prompt version in one click.
Code/infra fixes → "Open PR on GitHub" creates a draft PR with a unified diff.

Fix effectiveness is tracked automatically.

→ docs/integrate-langfuse.md

Policies

Runtime guardrails that fire mid-run — before a failure propagates.

dt.add_policy(
    name="cap tool calls",
    condition={"trigger": "tool_call_count", "operator": "gt", "value": 5},
    action={"type": "stop"},
)
dt.add_policy(
    name="cost cap",
    condition={"trigger": "cost_usd", "operator": "gt", "value": 0.50},
    action={"type": "switch_model", "params": {"model": "gpt-4o-mini"}},
)

Policies can also be created in the dashboard and fetched automatically by the SDK (60s TTL).
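
A condition of the form `{"trigger", "operator", "value"}` can be read as "compare a live metric against a fixed value". The sketch below shows one way such a condition could be evaluated. It is not the SDK's code: `condition_met` and the `metrics` dictionary are hypothetical, and only the `gt` operator appears in the examples above, so the other operator names are assumptions.

```python
import operator

# Only "gt" appears in the examples above; the other operator names are assumptions.
OPERATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "eq": operator.eq,
}


def condition_met(condition: dict, metrics: dict) -> bool:
    """Compare the live metric named by `trigger` against the policy's value."""
    compare = OPERATORS[condition["operator"]]
    current = metrics.get(condition["trigger"], 0)
    return compare(current, condition["value"])


cap_tool_calls = {"trigger": "tool_call_count", "operator": "gt", "value": 5}
cost_cap = {"trigger": "cost_usd", "operator": "gt", "value": 0.50}

# Hypothetical mid-run counters the SDK might keep.
metrics = {"tool_call_count": 6, "cost_usd": 0.12}

print(condition_met(cap_tool_calls, metrics))  # True
print(condition_met(cost_cap, metrics))        # False
```

With these counters, the first policy would trigger its `stop` action on the sixth tool call, while the cost cap would not yet switch models.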

→ docs/policies.md

MCP server

Query agent signals directly from Claude Code, Cursor, or Codex — without leaving your editor.

pip install dunetrace-mcp

10 tools — ask your editor things like "what failed in the last 24 hours?"

Tool	What you can ask
`list_agents`	"Which agents are monitored and how healthy are they?"
`get_agent_signals`	"What failures did my agent have today?"
`get_agent_health`	"Show me the health score breakdown for my agent."
`get_signal_detail`	"Show me signal #42 with full evidence and fix code."
`get_agent_patterns`	"Is this failure systemic or a one-off?"
`get_run_detail`	"Walk me through run abc123 step by step."
`get_agent_runs`	"List recent runs for my agent with their status."
`search_signals`	"Show me all CRITICAL signals in the last 24 hours."
`summarize_agent`	"Give me a one-shot diagnosis of my agent."
`get_agent_token_stats`	"How much is my agent wasting on failed runs?"

Claude Code: registered automatically in ~/.claude.json after pip install dunetrace-mcp. Restart Claude Code to load.

Cursor: add .cursor/mcp.json to your project root:

{
  "mcpServers": {
    "dunetrace": {
      "command": "dunetrace-mcp",
      "env": {
        "DUNETRACE_API_URL": "http://localhost:8002",
        "DUNETRACE_API_KEY": "dt_dev_test"
      }
    }
  }
}

→ docs/mcp-server.md

Privacy

No raw content ever leaves your agent process. Every prompt, tool argument, and model output is SHA-256 hashed before transmission.
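
For a sense of what hashing before transmission looks like, here is a minimal sketch using Python's standard `hashlib`. The helper names and the JSON canonicalisation of tool arguments are assumptions for illustration; the SDK's exact encoding is not described here.

```python
import hashlib
import json


def hash_content(value: str) -> str:
    """Return the hex SHA-256 digest of a string, encoded as UTF-8."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def hash_tool_args(args: dict) -> str:
    """Hash tool arguments in a key-order-independent way (an assumed canonical form)."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hash_content(canonical)


print(hash_content("What is the capital of France?"))
print(hash_tool_args({"query": "capital of France", "limit": 5}))
```

Identical inputs always produce identical digests, so repeated calls with the same arguments can still be recognised as repeats even though the arguments themselves are never sent.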

→ docs/architecture.md

Architecture

Agent Code
  └─► Dunetrace SDK        (hashes content → ingest events)
        └─► Ingest API      (POST /v1/ingest → Postgres)
                ├─► Detector       (poll → 17 detectors → signals)
                ├─► Alerts         (poll → explain → Slack / webhook)
                └─► Customer API   (runs, signals, explanations → dashboard)

Integrations

LangChain, CrewAI, AutoGen, Haystack, LlamaIndex, TypeScript, and more

Contributing

Fork, branch, make your change, run `make test`, and open a PR. For larger changes (new detectors, architecture changes), open an issue first.

Requires Python 3.11+, Node.js 18+, Docker + Docker Compose.

⭐ Star this if it saves you debugging time

Contact

dunetrace@gmail.com

License

Apache 2.0
