The shared trust primitive. Third-party-issued, independently verifiable, revocable credentials; an append-only, tamper-evident hash-chained ledger; and deterministic reconciliation of two independently-grown logs into one verifiable timeline — surfacing actions taken during a disconnection on a credential that was revoked while the actor couldn't hear it.
It is domain-neutral on purpose. The same primitive applies to the three things that show up at any checkpoint:
- the agent that acts — does it carry a revocable authorization, checked at the moment of action, that still works and reconciles when it can't phone home?
- the good that moves — can an authority verify its provenance credential offline, with a pre-decided posture when the issuer is unreachable?
- the person who presents — can a privacy-preserving, revocable eligibility credential be checked at the point of use, offline?
provenance-core is the layer those three reference implementations share, so
the trust mechanism is written and audited once rather than reimplemented
three times. This repository is that core and its tests; the implementations
live in their own repositories and depend on it.
| Module | What it is |
|---|---|
| `canonical` | Deterministic JSON canonicalization (a documented JCS / RFC 8785 subset). The byte-stable foundation everything signed or hashed rests on. Rejects floats rather than mis-serializing them. |
| `keys` | Ed25519 keypairs, signatures, and `did:key` identifiers. Identifiers are generated, not network-resolved; there is no resolution problem to solve in a single-issuer sandbox, and pretending otherwise would be unnecessary plumbing. |
| `credentials` | W3C Verifiable Credentials: issue, verify (signature and revocation), with the two failure axes (forged vs. revoked) reported separately. The credential subject type is caller-supplied, so an agent, a good, or a person all use the same path. |
| `ledger` | An append-only, hash-chained record. `verify_chain` walks it and names the first entry that fails, so a UI can point at exactly where the record stops being trustworthy. Tamper-evident, not immutable: that is the honest guarantee. |
| `reconcile` | Merge of exactly two independently-grown chains into one deterministically-ordered timeline. Classifies each entry's credential standing into four honest cases, the headline being `acted_during_blackout_on_revoked`. Not distributed consensus; see the module docstring for the line it does not cross. |
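
The ledger row above can be illustrated with a minimal sketch of a hash chain whose verifier names the first failing entry. This is not the core's code: `seal`, `first_broken_index`, the `GENESIS` placeholder, and the entry layout are hypothetical names chosen for illustration, and the payload is serialized with the standard library rather than the `canonical` module.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder prev_hash for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    # Hash the previous digest together with a byte-stable rendering of the payload.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def seal(chain: list, payload: dict) -> dict:
    """Append a payload, linking it to the entry before it."""
    prev_hash = chain[-1]["entry_hash"] if chain else GENESIS
    entry = {
        "payload": payload,
        "prev_hash": prev_hash,
        "entry_hash": _digest(prev_hash, payload),
    }
    chain.append(entry)
    return entry


def first_broken_index(chain: list):
    """Return the index of the first entry that fails, or None if the chain is intact."""
    prev_hash = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev_hash"] != prev_hash:
            return i  # link to the previous entry is broken
        if entry["entry_hash"] != _digest(prev_hash, entry["payload"]):
            return i  # payload no longer matches its sealed digest
        prev_hash = entry["entry_hash"]
    return None


chain = []
for n in range(4):
    seal(chain, {"action": "step", "n": n})
intact = first_broken_index(chain)

chain[2]["payload"]["n"] = 99  # tamper with an already-sealed entry
broken_at = first_broken_index(chain)
```

Editing a sealed payload does not prevent the edit; it makes the edit detectable at a specific index, which is the sense in which the record is tamper-evident rather than immutable.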
No allow/deny access gate — the ledger records what happened, it does not decide what is permitted, and keeping those separate is a load-bearing rule. No storage backend, no HTTP surface, no task or agent logic. Those belong to the implementations that consume this core. Keeping the core this small is what lets it stay auditable and lets three projects share it without dragging one project's domain into the others.
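
In the same report-rather-than-decide spirit, the two credential failure axes can be kept as separate fields instead of being collapsed into one allow/deny boolean. The sketch below is an assumption-heavy illustration: `VerifyResult`, `issue`, and `verify` are hypothetical names, and an HMAC from the standard library stands in for the Ed25519 signature so the example runs without a signing library. An HMAC is symmetric and therefore not independently verifiable the way the real credentials are.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifyResult:
    signature_valid: bool  # False means forged or altered
    revoked: bool          # True means the issuer withdrew it


def _mac(key: bytes, signed: dict) -> str:
    # Stand-in for a signature: HMAC-SHA256 over a byte-stable rendering.
    body = json.dumps(signed, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def issue(key: bytes, cred_id: str, claim: dict) -> dict:
    signed = {"id": cred_id, "claim": claim}
    return {**signed, "proof": _mac(key, signed)}


def verify(key: bytes, credential: dict, revoked_ids: set) -> VerifyResult:
    signed = {"id": credential["id"], "claim": credential["claim"]}
    signature_valid = hmac.compare_digest(credential["proof"], _mac(key, signed))
    # Both axes are evaluated independently; neither short-circuits the other.
    return VerifyResult(signature_valid, credential["id"] in revoked_ids)


key = b"demo-issuer-key"  # placeholder secret for the sketch
cred = issue(key, "cred-1", {"role": "operator"})

ok = verify(key, cred, revoked_ids=set())
revoked = verify(key, cred, revoked_ids={"cred-1"})
forged = verify(key, {**cred, "claim": {"role": "admin"}}, revoked_ids=set())
```

The caller receives both facts and applies its own policy, which keeps any access decision outside the core.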
Provenance is not free: a signed, revocable, hash-chained record is larger and
slower than the bare claim it protects. The core is explicit about how much, so
a real deployment can be sized rather than surprised. make bench (or
python scripts/bench.py) prints the full table live; the figures below are
deterministic and are pinned in tests/test_overhead.py
so they cannot silently drift.
Space is exact and hardware-independent. Per credential, in this implementation's W3C VC shape with an Ed25519Signature2020 proof:
| Component | Cost |
|---|---|
| Ed25519 signature (raw) | 64 bytes |
| `proofValue` (multibase base58) | ≈88–89 chars |
| `did:key` identifier | 56 chars |
| `verificationMethod` | 105 chars |
| proof block (canonical JSON) | 331 bytes |
| full credential (canonical JSON) | 778 bytes |
| — of which the bare claim | 97 bytes |
| — of which provenance overhead | 681 bytes |
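
The identifier lengths in the table follow from the encodings themselves, which the sketch below reproduces with a small base58 encoder. It assumes the usual `did:key` layout (multibase prefix `z`, then base58btc of the two-byte Ed25519 multicodec prefix `0xed 0x01` plus the 32-byte public key) and a `verificationMethod` of the form `<did>#<key id>`. The key and signature bytes are placeholders, not real cryptographic material, and the helper names are hypothetical.

```python
B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    """Encode bytes in base58 (Bitcoin alphabet); leading zero bytes become '1'."""
    n = int.from_bytes(data, "big")
    digits = ""
    while n > 0:
        n, rem = divmod(n, 58)
        digits = B58[rem] + digits
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + digits


def multibase(data: bytes) -> str:
    # 'z' is the multibase prefix for base58btc.
    return "z" + base58btc(data)


ED25519_PUB = bytes([0xED, 0x01])  # multicodec prefix for an Ed25519 public key

public_key = bytes(range(1, 33))   # placeholder 32 bytes, not a real key
signature = b"\xff" * 64           # placeholder 64 bytes, not a real signature

key_id = multibase(ED25519_PUB + public_key)
did = "did:key:" + key_id
verification_method = did + "#" + key_id
proof_value = multibase(signature)

print(len(did), len(verification_method), len(proof_value))
```

The identifier lengths are fixed because the multicodec prefix pins the leading base58 digits. The `proofValue` length varies by one character with the numeric value of the signature, which is why the table gives a range.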
Per ledger entry, the number worth extrapolating from is the fixed chaining
metadata: 128 bytes per entry — two SHA-256 digests (prev_hash +
entry_hash) rendered as 64 hex characters each. This is independent of the
payload: a tiny entry and a 50 KB entry both carry exactly 128 bytes of
tamper-evidence on top of their data.
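
A short check of the payload-independence claim, assuming only what the paragraph states: two SHA-256 digests rendered as hex. The function names and the way the digest input is assembled are illustrative assumptions, not the core's sealing routine.

```python
import hashlib


def chaining_metadata(prev_hash: str, payload: bytes) -> dict:
    # The digest input grows with the payload; the digest output does not.
    entry_hash = hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()
    return {"prev_hash": prev_hash, "entry_hash": entry_hash}


def metadata_size(meta: dict) -> int:
    return len(meta["prev_hash"]) + len(meta["entry_hash"])


genesis = hashlib.sha256(b"genesis").hexdigest()  # placeholder starting digest
tiny = chaining_metadata(genesis, b"x")
large = chaining_metadata(tiny["entry_hash"], b"x" * 50_000)

print(metadata_size(tiny), metadata_size(large))
```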
Sizing follows directly. Because the chaining cost is payload-independent, a log of N sealed entries carries N × 128 bytes of chaining metadata over your data:
| Entries | Chaining metadata |
|---|---|
| 1,000 | 128 KB |
| 1,000,000 | 128 MB |
| 1,000,000,000 | 128 GB |
A signed credential adds a one-time ≈681 bytes of provenance over the bare claim, so a million issued credentials carry roughly 681 MB of provenance overhead. From these two rates — 128 bytes per sealed entry, ≈681 bytes per credential — you can size a deployment without running anything.
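
The sizing arithmetic can be written down as a small calculator. The two constants come from the figures above; the function names are hypothetical, and units are decimal (1 KB = 1,000 bytes) to match the table.

```python
CHAIN_BYTES_PER_ENTRY = 128            # fixed chaining metadata per sealed entry
PROVENANCE_BYTES_PER_CREDENTIAL = 681  # approximate overhead per signed credential


def overhead_bytes(entries: int, credentials: int) -> int:
    """Total provenance overhead, excluding the payload data itself."""
    return (entries * CHAIN_BYTES_PER_ENTRY
            + credentials * PROVENANCE_BYTES_PER_CREDENTIAL)


def human(n: int) -> str:
    """Render a byte count in decimal units."""
    value = float(n)
    for unit in ("B", "KB", "MB", "GB"):
        if value < 1000:
            return f"{value:g} {unit}"
        value /= 1000
    return f"{value:g} TB"


print(human(overhead_bytes(entries=1_000_000, credentials=0)))
print(human(overhead_bytes(entries=0, credentials=1_000_000)))
print(human(overhead_bytes(entries=1_000_000, credentials=1_000_000)))
```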
Time is hardware-dependent and is not pinned. The bench also reports sign/verify/canonicalize/seal/verify-chain timing, but those figures depend on the CPU: they are labelled illustrative, name the machine they ran on, and are asserted in no test. Space is the number you can quote; time is the number you must measure on your own hardware.
make dev # or: pip install -e ".[dev]"
Runtime dependencies are intentionally minimal — the signing library and pydantic, nothing else. No web server, no database.
make test # or: pytest -q
make bench # print the overhead table above, computed live
The test suite covers canonicalization determinism and float rejection, credential issue/verify/revoke and the forged-vs-revoked split, hash-chain sealing and break detection, two-log reconciliation including the blackout-on-revoked finding, and the pinned space-overhead figures. No services, no network.
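
To make the first two properties concrete, here is a rough standard-library approximation of deterministic canonicalization with float rejection. It is a sketch under assumptions, not the `canonical` module: the `canonicalize` name is hypothetical, and Python's `sort_keys` orders keys by code point, which differs from RFC 8785's UTF-16 ordering for some non-BMP characters.

```python
import json


def _reject_floats(value) -> None:
    # Walk the structure and refuse floats instead of guessing a rendering.
    if isinstance(value, float):
        raise TypeError(f"floats are not canonicalizable: {value!r}")
    if isinstance(value, dict):
        for item in value.values():
            _reject_floats(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            _reject_floats(item)


def canonicalize(value) -> bytes:
    """Return a byte-stable JSON rendering: sorted keys, no insignificant whitespace."""
    _reject_floats(value)
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


a = canonicalize({"b": 1, "a": [2, 3]})
b = canonicalize({"a": [2, 3], "b": 1})
same_bytes = a == b  # key order in the input does not change the output

try:
    canonicalize({"amount": 1.5})
    float_rejected = False
except TypeError:
    float_rejected = True
```

Amounts that would otherwise be floats can be carried as integers in a smaller unit or as strings, so the signed bytes never depend on float formatting.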
The architectural decisions are recorded as ADRs in docs/adr/:
canonicalization and signing (0001), the hash-chained ledger (0002), and two-log
reconciliation as deterministic merge rather than consensus (0003). Each
documents what was chosen, what was deliberately left out, and why.
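
The deterministic-merge idea behind ADR 0003 can be sketched as a pure function over two logs: sort by a total-order key, then classify each action against what the merged timeline shows. Everything here is a simplified assumption, including the `Entry` fields, the integer timestamps, the tie-break key, and the blackout window argument. Only the `acted_during_blackout_on_revoked` label comes from this document; the other three cases are not reproduced and are lumped under a placeholder `"other"`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    log_id: str         # which of the two logs the entry came from
    seq: int            # position within its own log
    timestamp: int      # illustrative integer clock
    kind: str           # "action" or "revocation"
    credential_id: str


def merge(log_a: list, log_b: list) -> list:
    # A total-order key makes the result independent of argument order.
    return sorted([*log_a, *log_b], key=lambda e: (e.timestamp, e.log_id, e.seq))


def classify(timeline: list, blackout: tuple) -> list:
    start, end = blackout
    revoked_at = {}
    labels = []
    for e in timeline:
        if e.kind == "revocation":
            revoked_at.setdefault(e.credential_id, e.timestamp)
        elif e.kind == "action":
            r = revoked_at.get(e.credential_id)
            # Revoked inside the blackout, and acted on while still disconnected.
            hit = r is not None and start <= r and start <= e.timestamp <= end
            labels.append("acted_during_blackout_on_revoked" if hit else "other")
    return labels


actor_log = [
    Entry("actor", 0, 10, "action", "cred-1"),
    Entry("actor", 1, 20, "action", "cred-1"),
    Entry("actor", 2, 40, "action", "cred-1"),
]
issuer_log = [Entry("issuer", 0, 15, "revocation", "cred-1")]

timeline = merge(actor_log, issuer_log)
labels = classify(timeline, blackout=(12, 30))
```

Because the merge is a sort over data both sides already hold, either party can recompute the same timeline on its own, with no agreement protocol involved.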
Four repositories, one shared trust primitive — this core, plus three reference implementations that apply it to each thing that shows up at a checkpoint:
- provenance-core (this repository): the shared primitive, consisting of revocable verifiable credentials, a tamper-evident hash-chained ledger, and two-log reconciliation. Vendored into each implementation below via git subtree.
- agent-provenance: the agent that acts. Every action is authorized by a revocable credential and sealed into a tamper-evident log, with offline reconciliation and forensic replay.
- border-authority: the good that moves. A checkpoint that verifies provenance offline and stays honest under a committed posture when the issuer is unreachable.
- human-credential: the person who presents. Prove one attribute, reveal nothing else, consent by signing, checked at use time.
Apache-2.0. Copyright 2026 SurroundApps, Inc. Author: Zeeshan Khan.