Observability and evidence tooling for generative systems.
Tracebench helps developers collect, compare, replay, and validate traces using artifact signatures, clustering, and validation packets.
Observation before interpretation.
Tracebench is an open observability bench for generative systems.
It is designed for workflows where a system produces traces, artifacts, signatures, or run records and the user needs to ask:
- What happened?
- Can this behavior be compared across runs?
- Does a pattern recur under replay or perturbation?
- What evidence can be packaged for review?
- What should not be claimed from the evidence?
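As a rough illustration of the recurrence question, one could count how often each signature reappears across replayed runs. This is a minimal sketch under stated assumptions: `recurrence_report` and its field names are hypothetical and are not Tracebench's API, and signatures are treated as opaque strings.

```python
from collections import Counter


def recurrence_report(signatures):
    """Summarize how often each signature recurs across runs.

    Hypothetical helper for illustration; not part of Tracebench.
    It reports counts only and makes no claim about why a pattern recurs.
    """
    counts = Counter(signatures)
    total = len(signatures)
    # Share of runs that produced the single most frequent signature.
    top_share = counts.most_common(1)[0][1] / total if total else 0.0
    return {
        "total_runs": total,
        "distinct_signatures": len(counts),
        "recurring": {sig: n for sig, n in counts.items() if n > 1},
        "most_common_share": top_share,
    }


# Example: four replays, three of which share a signature.
report = recurrence_report(["sig-a", "sig-a", "sig-b", "sig-a"])
```

The report describes observed recurrence only. Whether a recurring signature is meaningful is left to human review.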
Tracebench focuses on observable structure and evidence packaging.
Tracebench does not infer meaning, intent, diagnosis, causality, governance status, or admissibility.
Tracebench does not issue receipts.
Validation packets are evidence bundles. Validation packets are not receipts.
trace-like input
-> artifact
-> signature
-> retrieval / comparison
-> clustering
-> validation packet
-> human review
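The flow above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function names, the artifact shape, and the packet fields are hypothetical and do not describe Tracebench's actual interfaces. Clustering is reduced to exact-match grouping on a SHA-256 signature, which is a deliberate simplification.

```python
import hashlib
import json
from collections import defaultdict


def make_artifact(trace_lines):
    """Normalize trace-like input into a plain artifact (hypothetical shape)."""
    events = [line.strip() for line in trace_lines if line.strip()]
    return {"events": events}


def signature(artifact):
    """Hash the canonical JSON form so equal artifacts get equal signatures."""
    canonical = json.dumps(artifact, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cluster_by_signature(artifacts):
    """Group run IDs whose artifacts share a signature (exact match only)."""
    groups = defaultdict(list)
    for run_id, artifact in artifacts.items():
        groups[signature(artifact)].append(run_id)
    return dict(groups)


def validation_packet(artifacts):
    """Bundle evidence for human review. A packet is not a receipt."""
    return {
        "kind": "validation_packet",
        "is_receipt": False,
        "run_count": len(artifacts),
        "clusters": cluster_by_signature(artifacts),
        # Things the evidence does not establish.
        "non_claims": ["meaning", "intent", "causality", "admissibility"],
    }


# Synthetic fixtures: run-a and run-b differ only in whitespace and blank lines.
runs = {
    "run-a": make_artifact(["start", "tool_call", "end"]),
    "run-b": make_artifact(["start ", "", "tool_call", "end"]),
    "run-c": make_artifact(["start", "end"]),
}
packet = validation_packet(runs)
```

In this sketch, `run-a` and `run-b` land in the same cluster because normalization removes the whitespace differences, while `run-c` forms its own. The packet carries the grouping and an explicit list of non-claims to a reviewer; it does not decide anything.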
The public reference release uses synthetic fixtures and generic artifact examples. Domain adapters, production telemetry integrations, private corpora, and deployment-specific provenance mappings are intentionally outside this public release.
Tracebench is an observational substrate.
The Constitutional Runtime Substrate is an admissibility substrate.
Tracebench asks:
What happened?
The Constitutional Runtime Substrate asks:
What may be committed?
Tracebench can be used independently. Teams that need reviewable commitment workflows may pair Tracebench with downstream admissibility systems such as the Constitutional Runtime Substrate.
Observation comes first. Commitment comes second. Neither replaces the other.
python -m pip install -e .
python -m pytest -q
Expected result for this release:
19 passed
Apache License 2.0.