provenance-core

The shared trust primitive. Third-party-issued, independently verifiable, revocable credentials; an append-only, tamper-evident hash-chained ledger; and deterministic reconciliation of two independently-grown logs into one verifiable timeline — surfacing actions taken during a disconnection on a credential that was revoked while the actor couldn't hear it.

It is domain-neutral on purpose. The same primitive applies to the three things that show up at any checkpoint:

the agent that acts — does it carry a revocable authorization, checked at the moment of action, that still works and reconciles when it can't phone home?
the good that moves — can an authority verify its provenance credential offline, with a pre-decided posture when the issuer is unreachable?
the person who presents — can a privacy-preserving, revocable eligibility credential be checked at the point of use, offline?

provenance-core is the layer those three reference implementations share, so the trust mechanism is written and audited once rather than reimplemented three times. This repository is that core and its tests; the implementations live in their own repositories and depend on it.

What's in here

Module	What it is
`canonical`	Deterministic JSON canonicalization (a documented JCS / RFC 8785 subset). The byte-stable foundation everything signed or hashed rests on. Rejects floats rather than mis-serializing them.
`keys`	Ed25519 keypairs, signatures, and `did:key` identifiers. Identifiers are generated, not network-resolved — there is no resolution problem to solve in a single-issuer sandbox, and pretending otherwise would be unnecessary plumbing.
`credentials`	W3C Verifiable Credentials: issue, verify (signature and revocation), with the two failure axes — forged vs. revoked — reported separately. The credential subject type is caller-supplied, so an agent, a good, or a person all use the same path.
`ledger`	An append-only, hash-chained record. `verify_chain` walks it and names the first entry that fails, so a UI can point at exactly where the record stops being trustworthy. Tamper-evident, not immutable — the honest guarantee.
`reconcile`	Merge of exactly two independently-grown chains into one deterministically-ordered timeline. Classifies each entry's credential standing into four honest cases, the headline being `acted_during_blackout_on_revoked`. Not distributed consensus — see the module docstring for the line it does not cross.
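Everything else in the table rests on the `canonical` row. Below is a minimal stdlib-only sketch of the same idea. It assumes a JSON subset made of objects with string keys, arrays, strings, integers, booleans and null. It sorts keys, drops insignificant whitespace, emits UTF-8, and rejects floats outright. `canonicalize` is a hypothetical name, not the module's actual API.

There are two known gaps against RFC 8785. First, Python's `sort_keys` orders keys by code point, while RFC 8785 orders them by UTF-16 code units. The two can disagree for keys that mix supplementary-plane characters with characters at or above U+E000. Second, the sketch does not restrict integers to the IEEE 754 range that RFC 8785 serializes exactly.

```python
import json
from typing import Any


def _reject_floats(value: Any) -> None:
    """Walk the value and raise on any float or non-string object key."""
    if isinstance(value, float):
        raise TypeError(f"floats are not canonicalizable here: {value!r}")
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError(f"object keys must be strings: {key!r}")
            _reject_floats(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            _reject_floats(item)


def canonicalize(value: Any) -> bytes:
    """Hypothetical JCS-like subset: sorted keys, compact separators, UTF-8, no floats."""
    _reject_floats(value)
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


# Insertion order does not matter: both dicts canonicalize to the same bytes.
assert canonicalize({"b": 2, "a": 1}) == canonicalize({"a": 1, "b": 2})
```

The output is bytes, so it can go straight into a hash or a signature. Rejecting floats, rather than formatting them, avoids having to reproduce RFC 8785's number-serialization rules.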

What's deliberately NOT in here

No allow/deny access gate — the ledger records what happened, it does not decide what is permitted, and keeping those separate is a load-bearing rule. No storage backend, no HTTP surface, no task or agent logic. Those belong to the implementations that consume this core. Keeping the core this small is what lets it stay auditable and lets three projects share it without dragging one project's domain into the others.

What provenance costs

Provenance is not free: a signed, revocable, hash-chained record is larger and slower than the bare claim it protects. The core is explicit about how much, so a real deployment can be sized rather than surprised. `make bench` (or `python scripts/bench.py`) prints the full table live; the figures below are deterministic and are pinned in `tests/test_overhead.py` so they cannot silently drift.

Space is exact and hardware-independent. Per credential, in this implementation's W3C VC shape with an Ed25519Signature2020 proof:

Component	Cost
Ed25519 signature (raw)	64 bytes
`proofValue` (multibase base58)	≈88–89 chars
`did:key` identifier	56 chars
`verificationMethod`	105 chars
proof block (canonical JSON)	331 bytes
full credential (canonical JSON)	778 bytes
— of which the bare claim	97 bytes
— of which provenance overhead	681 bytes
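The identifier lengths in the table follow from the encoding. The stdlib-only sketch below assumes the standard `did:key` construction for Ed25519 keys. That construction prepends the multicodec prefix `0xed 0x01` to the 32-byte public key, encodes the result as base58btc, and adds the multibase `z` prefix. The sketch also assumes a `verificationMethod` of the form `did#fragment`, where the fragment repeats the key's multibase string. The function names are hypothetical rather than `provenance_core.keys` API. The 32-byte input is a placeholder, not a generated key.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
ED25519_MULTICODEC = bytes([0xED, 0x01])  # multicodec prefix for an Ed25519 public key


def base58btc(data: bytes) -> str:
    """Bitcoin-alphabet base58; each leading zero byte becomes a '1'."""
    n = int.from_bytes(data, "big")
    digits = []
    while n > 0:
        n, rem = divmod(n, 58)
        digits.append(ALPHABET[rem])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(digits))


def multibase_z(data: bytes) -> str:
    """Multibase string: 'z' marks base58btc."""
    return "z" + base58btc(data)


def did_key_ed25519(public_key: bytes) -> str:
    """Hypothetical helper: did:key identifier for a raw 32-byte Ed25519 key."""
    if len(public_key) != 32:
        raise ValueError("Ed25519 public keys are 32 bytes")
    return "did:key:" + multibase_z(ED25519_MULTICODEC + public_key)


def verification_method(did: str) -> str:
    """Hypothetical helper: key reference of the form did#<multibase key>."""
    return did + "#" + did[len("did:key:"):]


placeholder_key = bytes(range(32))  # stand-in bytes, not a real generated key
did = did_key_ed25519(placeholder_key)
print(len(did), len(verification_method(did)))
```

The 34 prefixed bytes always encode to 47 base58 characters. With `z` that is 48, plus 8 for `did:key:`, which gives 56. The `verificationMethod` adds `#` and another 48 characters, for 105. A 64-byte signature encodes to 87 or 88 characters in almost all cases, and the `z` prefix brings that to the ≈88–89 shown for `proofValue`.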

Per ledger entry, the number worth extrapolating from is the fixed chaining metadata: 128 bytes per entry — two SHA-256 digests (prev_hash + entry_hash) rendered as 64 hex characters each. This is independent of the payload: a tiny entry and a 50 KB entry both carry exactly 128 bytes of tamper-evidence on top of their data.
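The fixed 128 bytes come from how each entry is sealed. The sketch below is a minimal hash chain under stated assumptions. Each entry stores its payload, the previous entry's hash (a hypothetical all-zero sentinel for the first entry), and a SHA-256 over the sorted-key JSON of payload plus `prev_hash`. A verifier reports the index of the first entry that fails. `seal`, `first_bad_entry` and `GENESIS` are illustrative names, not the `ledger` module's API, and the real entry layout may hash more fields.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any, Optional

GENESIS = "0" * 64  # hypothetical prev_hash for the first entry


def _digest(payload: Any, prev_hash: str) -> str:
    """SHA-256 (hex) over sorted-key, compact JSON of payload and prev_hash."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash},
                      sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def seal(chain: list[dict], payload: Any) -> dict:
    """Append a new entry linked to the current tail of the chain."""
    prev_hash = chain[-1]["entry_hash"] if chain else GENESIS
    entry = {"payload": payload, "prev_hash": prev_hash,
             "entry_hash": _digest(payload, prev_hash)}
    chain.append(entry)
    return entry


def first_bad_entry(chain: list[dict]) -> Optional[int]:
    """Return the index of the first entry that breaks the chain, or None."""
    expected_prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev_hash"] != expected_prev:
            return i  # link to the previous entry is broken
        if entry["entry_hash"] != _digest(entry["payload"], entry["prev_hash"]):
            return i  # this entry's contents were altered
        expected_prev = entry["entry_hash"]
    return None


chain: list[dict] = []
seal(chain, {"action": "open"})
seal(chain, {"blob": "x" * 50_000})  # a ~50 KB payload still carries 128 chaining bytes
seal(chain, {"action": "close"})
chain[1]["payload"]["blob"] = "tampered"
print(first_bad_entry(chain))
```

Editing entry 1's payload makes the sketch report index 1. If the attacker also recomputes that entry's hash, entry 2's `prev_hash` no longer matches, and the break surfaces at index 2 instead. A chain rewritten consistently from the tampered entry onward would verify again. That limit is consistent with the README's wording: the record is tamper-evident, not immutable.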

Sizing follows directly. Because the chaining cost is payload-independent, a log of N sealed entries carries N × 128 bytes of chaining metadata over your data:

Entries	Chaining metadata (decimal units, 1 KB = 1,000 bytes)
1,000	128 KB
1,000,000	128 MB
1,000,000,000	128 GB

A signed credential adds a one-time ≈681 bytes of provenance over the bare claim, so a million issued credentials carry roughly 681 MB of provenance overhead. From these two rates — 128 bytes per sealed entry, ≈681 bytes per credential — you can size a deployment without running anything.
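The two rates turn into a back-of-the-envelope sizing helper. This sketch hard-codes the figures quoted above: 128 bytes per sealed entry, and 681 bytes per credential, which is the 778-byte credential minus the 97-byte claim. It assumes your own claim shape keeps per-credential overhead near that figure, which the README itself marks as approximate. The function names are hypothetical.

```python
CHAIN_BYTES_PER_ENTRY = 128            # two SHA-256 digests, 64 hex chars each
PROVENANCE_BYTES_PER_CREDENTIAL = 681  # 778-byte credential minus 97-byte bare claim


def provenance_overhead_bytes(sealed_entries: int, issued_credentials: int) -> int:
    """Bytes of provenance carried on top of the data itself."""
    return (sealed_entries * CHAIN_BYTES_PER_ENTRY
            + issued_credentials * PROVENANCE_BYTES_PER_CREDENTIAL)


def decimal_size(n: int) -> str:
    """Format with decimal (SI) units, matching the sizing table."""
    for unit, scale in (("GB", 10**9), ("MB", 10**6), ("KB", 10**3)):
        if n >= scale:
            return f"{n / scale:g} {unit}"
    return f"{n} B"


print(decimal_size(provenance_overhead_bytes(1_000_000, 0)))
print(decimal_size(provenance_overhead_bytes(0, 1_000_000)))
```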

Time is hardware-dependent and is not pinned. The bench also reports sign, verify, canonicalize, seal, and verify-chain timings. Those figures depend on the CPU, so the bench labels them illustrative and names the machine they ran on, and no test asserts them. Space is the number you can quote; time is the number you must measure on your own hardware.
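The bench script is the supported way to get timings. For a quick local figure outside it, the stdlib `timeit` module works. This sketch times a stand-in for sealing, namely sorted-key JSON plus a SHA-256 hex digest. It takes the fastest of several repeats as a lower bound, as the `timeit` documentation suggests. `seal_like` and `per_call_seconds` are hypothetical and do not call into `provenance_core`. No expected number is given, because the result depends on the machine.

```python
import hashlib
import json
import timeit
from typing import Callable


def seal_like(payload: dict, prev_hash: str) -> str:
    """Stand-in for sealing one entry: canonical-ish JSON, then SHA-256 hex."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def per_call_seconds(fn: Callable[[], object], number: int = 10_000, repeat: int = 5) -> float:
    """Fastest of `repeat` runs of `number` calls, divided per call."""
    best = min(timeit.repeat(fn, number=number, repeat=repeat))
    return best / number


if __name__ == "__main__":
    t = per_call_seconds(lambda: seal_like({"action": "open"}, "0" * 64))
    print(f"seal-like: {t * 1e6:.2f} µs per call on this machine")
```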

Install

```
make dev          # or: pip install -e ".[dev]"
```

Runtime dependencies are intentionally minimal — the signing library and pydantic, nothing else. No web server, no database.

Test and bench

```
make test         # or: pytest -q
make bench        # print the overhead table above, computed live
```

The test suite covers canonicalization determinism and float rejection, credential issue/verify/revoke and the forged-vs-revoked split, hash-chain sealing and break detection, two-log reconciliation including the blackout-on-revoked finding, and the pinned space-overhead figures. No services, no network.
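The blackout-on-revoked finding can be illustrated with a small deterministic merge, under these assumptions:

- each entry carries an origin, a per-log sequence number, and a timestamp that is comparable across both logs;
- revocation times and each origin's disconnection windows are known at reconciliation time.

`Entry`, `merge`, `classify` and the `not_flagged` label are hypothetical. The real `reconcile` module distinguishes four cases, and this sketch reproduces only the headline one, `acted_during_blackout_on_revoked`. It also skips chain verification and cross-clock concerns entirely.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    origin: str         # which log (and actor) the entry came from
    seq: int            # position within its own log
    ts: int             # action time, assumed comparable across both logs
    credential_id: str  # credential the action was taken under
    action: str


def merge(a: list[Entry], b: list[Entry]) -> list[Entry]:
    """Total order on (ts, origin, seq): same inputs, same timeline, either argument order."""
    return sorted(a + b, key=lambda e: (e.ts, e.origin, e.seq))


def classify(timeline: list[Entry], revoked_at: dict[str, int],
             blackouts: dict[str, list[tuple[int, int]]]) -> list[tuple[Entry, str]]:
    """Flag actions taken after a revocation that landed inside the actor's blackout window."""
    results = []
    for e in timeline:
        rev = revoked_at.get(e.credential_id)
        hit = rev is not None and any(
            start <= rev <= e.ts < end for start, end in blackouts.get(e.origin, [])
        )
        results.append((e, "acted_during_blackout_on_revoked" if hit else "not_flagged"))
    return results


agent_log = [Entry("agent", 0, 10, "cred-1", "read"), Entry("agent", 1, 30, "cred-1", "write")]
hub_log = [Entry("hub", 0, 20, "cred-9", "rotate-keys")]
timeline = merge(agent_log, hub_log)
for entry, label in classify(timeline, {"cred-1": 20}, {"agent": [(15, 40)]}):
    print(entry.ts, entry.origin, entry.action, label)
```

In this example the agent loses contact at t=15, and `cred-1` is revoked at t=20 while the agent cannot hear it. The agent writes at t=30. The sketch flags only that write: the read at t=10 happened before the revocation. Sorting on a full key rather than on arrival order is what makes the merge deterministic. It is an ordering rule, not a consensus protocol.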

Design notes

The architectural decisions are recorded as ADRs in docs/adr/: canonicalization and signing (0001), the hash-chained ledger (0002), and two-log reconciliation as deterministic merge rather than consensus (0003). Each documents what was chosen, what was deliberately left out, and why.

The family

Four repositories, one shared trust primitive — this core, plus three reference implementations that apply it to each thing that shows up at a checkpoint:

provenance-core (this repository) — the shared primitive: revocable verifiable credentials, a tamper-evident hash-chained ledger, two-log reconciliation. Vendored into each implementation below via git subtree.
agent-provenance — the agent that acts: every action authorized by a revocable credential and sealed into a tamper-evident log, with offline reconciliation and forensic replay.
border-authority — the good that moves: a checkpoint that verifies provenance offline and stays honest under a committed posture when the issuer is unreachable.
human-credential — the person who presents: prove one attribute, reveal nothing else, consent by signing, checked at use time.
