  • https://dunetrace.com/
  • Berlin

dunetrace/README.md

Dunetrace

Real-time failure detection for production AI agents - Slack alert within 15 seconds.


If Dunetrace helps you, consider giving it a ⭐ (top right of the repository page). It helps others find the project.



The problem

AI agents fail silently:

  • ✓ API returns 200   ✓ Logs are clean
  • ✗ Agent called the same tool 12 times, burned $10, and gave the user a wrong answer

Langfuse and LangSmith answer "what happened?" after you already know it broke. Dunetrace answers "is something breaking right now?" and fires an alert within 15 seconds of run completion.


Why it's different

| | Dunetrace | Langfuse / LangSmith |
|---|---|---|
| When it fires | Within 15s of run completion | You query it after you notice a problem |
| What it watches | Structural failure patterns | Raw trace data |
| Alert channel | Slack / webhook / Dashboard | Dashboard only |
| Fix path | One-click prompt apply or GitHub PR | Manual |

Quick Start

1. Start the backend

git clone https://github.com/dunetrace/dunetrace
cd dunetrace && cp .env.example .env
docker compose -f docker-compose.ghcr.yml up -d

2. Install the SDK

pip install dunetrace                       # Python
npm install dunetrace                       # Node.js / TypeScript

3. Instrument your agent

Python

from dunetrace import Dunetrace

dt = Dunetrace()

@dt.tool
def web_search(query: str) -> list: ...

@dt.trace
def my_agent(question: str) -> str:
    return web_search(question)[0]

TypeScript / Node.js

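import { Dunetrace } from "dunetrace";
import OpenAI from "openai";

const dt     = new Dunetrace();
const openai = dt.wrapOpenAI(new OpenAI());

// Example conversation; replace with your agent's own messages.
const messages = [{ role: "user" as const, content: "What is the capital of France?" }];

await dt.run("my-agent", { model: "gpt-4o" }, async (run) => {
  await openai.chat.completions.create({ model: "gpt-4o", messages });
  run.finalAnswer();
});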

Try the built-in failure scenarios

cd packages/sdk-py

python examples/basic_agent.py                          # No LLM calls
SCENARIO=tool_loop python examples/langchain_agent.py   # TOOL_LOOP via LangChain
SCENARIO=failures python examples/decorator_agent.py    # TOOL_LOOP, RETRY_STORM, RAG_EMPTY_RETRIEVAL
SCENARIO=tool_loop python examples/langfuse_agent.py    # TOOL_LOOP + Langfuse explain

Open the dashboard: http://localhost:3000


Detectors

17 detectors run on every completed run — no configuration, no LLM.

| Signal | What it catches |
|---|---|
| TOOL_LOOP | Same tool called repeatedly with identical args |
| TOOL_THRASHING | Oscillating between two tools, unable to commit |
| RETRY_STORM | Tool failing, agent retrying it repeatedly |
| CASCADING_TOOL_FAILURE | Multiple different tools failing in sequence |
| CONTEXT_BLOAT | Prompt tokens growing unsustainably across LLM calls |
| LLM_TRUNCATION_LOOP | Model output truncated repeatedly |
| GOAL_ABANDONMENT | Agent stopped using tools before finishing |
| REASONING_STALL | Too many LLM calls per tool call (agent deliberating in circles) |
| TOOL_AVOIDANCE | Agent answered without using any tools |
| RAG_EMPTY_RETRIEVAL | Retrieval returned nothing, agent answered anyway |
| EMPTY_LLM_RESPONSE | Model returned an empty response |
| FIRST_STEP_FAILURE | Failed on the first step (config or setup issue) |
| SLOW_STEP | Single step latency well above threshold |
| STEP_COUNT_INFLATION | Far more steps than the agent's baseline |
| SESSION_LATENCY | Wall-clock run time anomalously long vs per-agent baseline |
| COST_SPIKE | Total token consumption unusually high vs per-agent baseline |
| PROMPT_INJECTION_SIGNAL | Input matched adversarial injection patterns |

Each alert includes: what fired, why it matters, a concrete fix, and a rate context line (first occurrence / recurring / systemic).
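
To make the TOOL_LOOP row concrete, here is a minimal sketch of how such a check could work over a completed run. It is not Dunetrace's implementation: the `ToolCall` type, the `detect_tool_loop` function, and the threshold of 3 are all hypothetical, chosen only for illustration. Arguments are compared by hash, in keeping with the hashed-content model described under Privacy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation in a run (hypothetical event shape)."""
    tool_name: str
    args_hash: str  # hash of the arguments, not the raw arguments


def detect_tool_loop(calls: list[ToolCall], threshold: int = 3) -> Optional[ToolCall]:
    """Return the first call repeated `threshold` times in a row, else None.

    The threshold is an assumed value, not a Dunetrace default.
    """
    streak = 0
    previous = None
    for call in calls:
        # Frozen dataclasses compare by field values, so equality here means
        # "same tool, same argument hash".
        streak = streak + 1 if call == previous else 1
        if streak >= threshold:
            return call
        previous = call
    return None


run = [
    ToolCall("web_search", "a1"),
    ToolCall("web_search", "a1"),
    ToolCall("web_search", "a1"),
    ToolCall("summarize", "b2"),
]
looped = detect_tool_loop(run)
print(looped.tool_name if looped else "no loop")
```

A check like this needs no LLM and no per-agent configuration, which is the property the detector list emphasises.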

docs/detectors.md

Custom detectors — write a detector in plain English. Dunetrace translates it to a structured condition set, runs it in shadow mode against real traffic, and lets you review the fire rate before any alert fires. In the dashboard: Config → Custom detectors → Add detector.
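
The shadow-mode step can be pictured as follows: evaluate the candidate condition set against completed runs, count how often it would have fired, and send nothing. This is a minimal sketch under assumptions. The condition shape is borrowed from the policy examples in the Policies section; `condition_fires`, `shadow_fire_rate`, the run metric dictionaries, and every operator name other than `gt` are hypothetical and not Dunetrace's internal format.

```python
import operator

# Only "gt" appears in this README; the other operator names are assumptions.
OPERATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def condition_fires(condition: dict, run_metrics: dict) -> bool:
    """Evaluate one {"trigger", "operator", "value"} condition against a run."""
    observed = run_metrics.get(condition["trigger"])
    if observed is None:
        return False  # metric not recorded for this run
    return OPERATORS[condition["operator"]](observed, condition["value"])


def shadow_fire_rate(conditions: list[dict], runs: list[dict]) -> float:
    """Fraction of runs where every condition holds. Nothing is alerted."""
    if not runs:
        return 0.0
    fired = sum(all(condition_fires(c, run) for c in conditions) for run in runs)
    return fired / len(runs)


conditions = [{"trigger": "tool_call_count", "operator": "gt", "value": 5}]
runs = [
    {"tool_call_count": 2},
    {"tool_call_count": 9},
    {"tool_call_count": 6},
    {"tool_call_count": 1},
]
print(f"would have fired on {shadow_fire_rate(conditions, runs):.0%} of runs")
```

Reviewing a number like this before enabling alerts is what lets you tune a noisy condition without paging anyone.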


Dashboard


Live at http://localhost:3000. Auto-refreshes every 15s.

docs/dashboard.md


Alerts

Slack and generic webhook (PagerDuty, Linear, custom).

SLACK_WEBHOOK_URL=https://hooks.slack.com/services/...
SLACK_MIN_SEVERITY=LOW   # LOW | MEDIUM | HIGH | CRITICAL
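
SLACK_MIN_SEVERITY implies an ordering over the four levels: a signal is forwarded only if its severity is at or above the configured minimum. A minimal sketch of that filter follows; the `Severity` enum and `should_alert` function are hypothetical names, and falling back to LOW when the variable is unset is an assumption.

```python
import os
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def should_alert(signal_severity: str, min_severity: str) -> bool:
    """True if the signal is at or above the configured minimum severity."""
    return Severity[signal_severity.upper()] >= Severity[min_severity.upper()]


# Read the threshold the same way the .env setting would supply it.
minimum = os.environ.get("SLACK_MIN_SEVERITY", "LOW")
for level in ("LOW", "HIGH", "CRITICAL"):
    print(level, should_alert(level, minimum))
```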

A weekly digest (Monday 9am UTC) summarises top failure types and systemic patterns. Enable with DIGEST_ENABLED=true.

docs/alerts.md


Diagnose with Langfuse

Connect Langfuse to get LLM-powered root-cause analysis on any signal. Click Explain + on any alert — Dunetrace fetches the full trace, extracts the system prompt, and returns a specific root cause and fix.

  • Prompt fixes: Apply via Langfuse creates a new prompt version in one click
  • Code/infra fixes: Open PR on GitHub creates a draft PR with a unified diff

Fix effectiveness is tracked automatically.

docs/integrate-langfuse.md


Policies

Runtime guardrails that fire mid-run — before a failure propagates.

dt.add_policy(
    name="cap tool calls",
    condition={"trigger": "tool_call_count", "operator": "gt", "value": 5},
    action={"type": "stop"},
)
dt.add_policy(
    name="cost cap",
    condition={"trigger": "cost_usd", "operator": "gt", "value": 0.50},
    action={"type": "switch_model", "params": {"model": "gpt-4o-mini"}},
)

Policies can also be created in the dashboard and fetched automatically by the SDK (60s TTL).
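
The 60s TTL means a fetched policy list can be reused for up to a minute before the backend is asked again. Below is a minimal sketch of that caching pattern, not the SDK's code: `PolicyCache`, `fake_fetch`, and the injectable `clock` parameter are hypothetical, and the fetch function stands in for whatever HTTP call the SDK actually makes.

```python
import time
from typing import Callable


class PolicyCache:
    """Caches the result of `fetch` for `ttl_seconds` (hypothetical helper)."""

    def __init__(
        self,
        fetch: Callable[[], list],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._policies: list = []
        self._expires_at = float("-inf")  # forces a fetch on first use

    def get(self) -> list:
        now = self._clock()
        if now >= self._expires_at:
            self._policies = self._fetch()
            self._expires_at = now + self._ttl
        return self._policies


fetch_log: list[float] = []  # records each simulated backend call
fake_now = [0.0]             # controllable clock for the demo


def fake_fetch() -> list:
    """Stand-in for a backend request."""
    fetch_log.append(fake_now[0])
    return [{"name": "cap tool calls"}]


cache = PolicyCache(fake_fetch, ttl_seconds=60.0, clock=lambda: fake_now[0])
cache.get()         # first call fetches
cache.get()         # within the TTL: served from cache
fake_now[0] = 61.0
cache.get()         # TTL expired: fetches again
print(len(fetch_log), "fetches")
```

The trade-off of a TTL is staleness: a policy edited in the dashboard may take up to the TTL to reach a running agent.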

docs/policies.md


MCP server

Query agent signals directly from Claude Code, Cursor, or Codex — without leaving your editor.

pip install dunetrace-mcp

10 tools. Ask your editor things like "what failed in the last 24 hours?"

| Tool | What you can ask |
|---|---|
| list_agents | "Which agents are monitored and how healthy are they?" |
| get_agent_signals | "What failures did my agent have today?" |
| get_agent_health | "Show me the health score breakdown for my agent." |
| get_signal_detail | "Show me signal #42 with full evidence and fix code." |
| get_agent_patterns | "Is this failure systemic or a one-off?" |
| get_run_detail | "Walk me through run abc123 step by step." |
| get_agent_runs | "List recent runs for my agent with their status." |
| search_signals | "Show me all CRITICAL signals in the last 24 hours." |
| summarize_agent | "Give me a one-shot diagnosis of my agent." |
| get_agent_token_stats | "How much is my agent wasting on failed runs?" |

Claude Code: registered automatically in ~/.claude.json after pip install dunetrace-mcp. Restart Claude Code to load.

Cursor: add .cursor/mcp.json to your project root:

{
  "mcpServers": {
    "dunetrace": {
      "command": "dunetrace-mcp",
      "env": {
        "DUNETRACE_API_URL": "http://localhost:8002",
        "DUNETRACE_API_KEY": "dt_dev_test"
      }
    }
  }
}

docs/mcp-server.md


Privacy

No raw content ever leaves your agent process. Every prompt, tool argument, and model output is SHA-256 hashed before transmission.
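
As an illustration of what hashing before transmission means in practice, the sketch below replaces each content field with its SHA-256 hex digest using Python's standard `hashlib`. The event layout and the `hash_content` and `build_event` helpers are hypothetical, as is the choice to leave the tool name unhashed; the SDK's actual event schema is not shown in this README.

```python
import hashlib


def hash_content(text: str) -> str:
    """Return the SHA-256 hex digest of a UTF-8 encoded string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_event(tool_name: str, tool_args: str, output: str) -> dict:
    """Build an event that carries digests instead of raw content.

    Field names are illustrative, not the SDK's wire format.
    """
    return {
        "tool_name": tool_name,  # assumed to stay readable as structural metadata
        "args_hash": hash_content(tool_args),
        "output_hash": hash_content(output),
        "output_length": len(output),
    }


event = build_event("web_search", '{"query": "weather in Berlin"}', "Sunny, 21C")
print(event)
```

Because SHA-256 is deterministic, two calls with identical arguments produce identical digests, so repetition remains visible downstream without the raw arguments.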

docs/architecture.md


Architecture

Agent Code
  └─► Dunetrace SDK        (hashes content → ingest events)
        └─► Ingest API      (POST /v1/ingest → Postgres)
                ├─► Detector       (poll → 17 detectors → signals)
                ├─► Alerts         (poll → explain → Slack / webhook)
                └─► Customer API   (runs, signals, explanations → dashboard)

Integrations

LangChain, CrewAI, AutoGen, Haystack, LlamaIndex, TypeScript, and more

Contributing

Fork, branch, change, make test, PR. For larger changes (new detectors, architecture changes), open an issue first.

Requires Python 3.11+, Node.js 18+, Docker + Docker Compose.

⭐ Star this if it saves you debugging time


Contact

dunetrace@gmail.com

License

Apache 2.0
