surroundapps/provenance-core

provenance-core

The shared trust primitive. Third-party-issued, independently verifiable, revocable credentials; an append-only, tamper-evident hash-chained ledger; and deterministic reconciliation of two independently-grown logs into one verifiable timeline — surfacing actions taken during a disconnection on a credential that was revoked while the actor couldn't hear it.

It is domain-neutral on purpose. The same primitive applies to the three things that show up at any checkpoint:

  • the agent that acts — does it carry a revocable authorization, checked at the moment of action, that still works and reconciles when it can't phone home?
  • the good that moves — can an authority verify its provenance credential offline, with a pre-decided posture when the issuer is unreachable?
  • the person who presents — can a privacy-preserving, revocable eligibility credential be checked at the point of use, offline?

provenance-core is the layer those three reference implementations share, so the trust mechanism is written and audited once rather than reimplemented three times. This repository is that core and its tests; the implementations live in their own repositories and depend on it.

What's in here

| Module | What it is |
| --- | --- |
| canonical | Deterministic JSON canonicalization (a documented JCS / RFC 8785 subset). The byte-stable foundation everything signed or hashed rests on. Rejects floats rather than mis-serializing them. |
| keys | Ed25519 keypairs, signatures, and did:key identifiers. Identifiers are generated, not network-resolved: there is no resolution problem to solve in a single-issuer sandbox, and pretending otherwise would be unnecessary plumbing. |
| credentials | W3C Verifiable Credentials: issue and verify (signature and revocation), with the two failure axes, forged vs. revoked, reported separately. The credential subject type is caller-supplied, so an agent, a good, or a person all use the same path. |
| ledger | An append-only, hash-chained record. verify_chain walks it and names the first entry that fails, so a UI can point at exactly where the record stops being trustworthy. Tamper-evident, not immutable: the honest guarantee. |
| reconcile | Merge of exactly two independently-grown chains into one deterministically-ordered timeline. Classifies each entry's credential standing into four honest cases, the headline being acted_during_blackout_on_revoked. Not distributed consensus; see the module docstring for the line it does not cross. |
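
To make the ledger row concrete, here is a minimal sketch of a hash chain whose check names the first failing entry. It is not this repository's code: the helper names (`append`, `first_broken_index`), the all-zero `GENESIS` value, and the use of `json.dumps` as a stand-in for the canonical module are assumptions for illustration. Only the `prev_hash` / `entry_hash` field names come from this README.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder prev_hash for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    # Hash a byte-stable rendering of the link: sorted keys, no whitespace.
    body = json.dumps({"prev_hash": prev_hash, "payload": payload},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    # Each new entry commits to the hash of the entry before it.
    prev_hash = chain[-1]["entry_hash"] if chain else GENESIS
    chain.append({"prev_hash": prev_hash, "payload": payload,
                  "entry_hash": _digest(prev_hash, payload)})


def first_broken_index(chain: list):
    """Return the index of the first entry that fails, or None if intact."""
    expected_prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev_hash"] != expected_prev:
            return i  # link to the previous entry is broken
        if entry["entry_hash"] != _digest(entry["prev_hash"], entry["payload"]):
            return i  # entry content no longer matches its own hash
        expected_prev = entry["entry_hash"]
    return None


chain = []
for action in ("open", "inspect", "release"):
    append(chain, {"action": action})

intact = first_broken_index(chain)        # None: nothing has been altered
chain[1]["payload"]["action"] = "skip"    # tamper with the middle entry
broken_at = first_broken_index(chain)     # 1: the first entry that fails
```

Nothing here prevents the edit to entry 1; the chain only makes it detectable, which is the "tamper-evident, not immutable" distinction the table draws.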

What's deliberately NOT in here

No allow/deny access gate — the ledger records what happened, it does not decide what is permitted, and keeping those separate is a load-bearing rule. No storage backend, no HTTP surface, no task or agent logic. Those belong to the implementations that consume this core. Keeping the core this small is what lets it stay auditable and lets three projects share it without dragging one project's domain into the others.

What provenance costs

Provenance is not free: a signed, revocable, hash-chained record is larger and slower than the bare claim it protects. The core is explicit about how much, so a real deployment can be sized rather than surprised. make bench (or python scripts/bench.py) prints the full table live; the figures below are deterministic and are pinned in tests/test_overhead.py so they cannot silently drift.

Space is exact and hardware-independent. Per credential, in this implementation's W3C VC shape with an Ed25519Signature2020 proof:

| Component | Cost |
| --- | --- |
| Ed25519 signature (raw) | 64 bytes |
| proofValue (multibase base58) | ≈88–89 chars |
| did:key identifier | 56 chars |
| verificationMethod | 105 chars |
| proof block (canonical JSON) | 331 bytes |
| full credential (canonical JSON) | 778 bytes |
| of which the bare claim | 97 bytes |
| of which provenance overhead | 681 bytes |
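
The identifier lengths in this table follow from the encodings alone, so they can be checked without a signing library. The sketch below assumes the standard did:key layout (multicodec prefix `0xed 0x01`, then base58btc with a `z` multibase prefix) and a verification method of the form `did#fragment`. It uses random bytes as a stand-in for a real public key, and the helper names are hypothetical rather than taken from the keys module.

```python
import os

_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    # Plain big-integer base58 encoding with the Bitcoin alphabet.
    n = int.from_bytes(data, "big")
    digits = ""
    while n:
        n, rem = divmod(n, 58)
        digits = _ALPHABET[rem] + digits
    # Each leading zero byte is encoded as a leading "1".
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + digits


def did_key_from_ed25519(public_key: bytes) -> str:
    # 0xed 0x01 is the multicodec varint for an Ed25519 public key;
    # "z" is the multibase prefix for base58btc.
    return "did:key:z" + base58btc(b"\xed\x01" + public_key)


public_key = os.urandom(32)  # stand-in bytes, not a real keypair
did = did_key_from_ed25519(public_key)
verification_method = did + "#" + did[len("did:key:"):]

# Smallest and largest 64-byte signatures with a nonzero leading byte,
# to show where the 88-89 character range for proofValue comes from.
shortest_proof = "z" + base58btc(b"\x01" + b"\x00" * 63)
longest_proof = "z" + base58btc(b"\xff" * 64)

print(len(did), len(verification_method),
      len(shortest_proof), len(longest_proof))
```

The did:key length is fixed because the two-byte prefix pins the magnitude of the encoded integer. The proofValue length varies by one character because a raw signature has no such fixed prefix.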

Per ledger entry, the number worth extrapolating from is the fixed chaining metadata: 128 bytes per entry — two SHA-256 digests (prev_hash + entry_hash) rendered as 64 hex characters each. This is independent of the payload: a tiny entry and a 50 KB entry both carry exactly 128 bytes of tamper-evidence on top of their data.

Sizing follows directly. Because the chaining cost is payload-independent, a log of N sealed entries carries N × 128 bytes of chaining metadata over your data:

| Entries | Chaining metadata |
| --- | --- |
| 1,000 | 128 KB |
| 1,000,000 | 128 MB |
| 1,000,000,000 | 128 GB |

A signed credential adds a one-time ≈681 bytes of provenance over the bare claim, so a million issued credentials carry roughly 681 MB of provenance overhead. From these two rates — 128 bytes per sealed entry, ≈681 bytes per credential — you can size a deployment without running anything.
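
The sizing arithmetic can be written down directly. This is a small sketch using the two rates quoted above and decimal units (1 KB = 1,000 bytes), as in the table; the function names are illustrative and are not part of scripts/bench.py.

```python
CHAIN_BYTES_PER_ENTRY = 128            # two 64-character hex SHA-256 digests
PROVENANCE_BYTES_PER_CREDENTIAL = 681  # 778-byte credential minus 97-byte claim


def overhead_bytes(entries: int, credentials: int) -> int:
    """Provenance overhead on top of the payload data, in bytes."""
    return (entries * CHAIN_BYTES_PER_ENTRY
            + credentials * PROVENANCE_BYTES_PER_CREDENTIAL)


def human(n: float) -> str:
    """Render a byte count with decimal units, capped at GB."""
    for unit in ("bytes", "KB", "MB", "GB"):
        if n < 1000 or unit == "GB":
            return f"{n:g} {unit}"
        n /= 1000


print(human(overhead_bytes(entries=1_000_000, credentials=0)))   # 128 MB
print(human(overhead_bytes(entries=0, credentials=1_000_000)))   # 681 MB
```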

Time is hardware-dependent and is not pinned. The bench also reports sign/verify/canonicalize/seal/verify-chain timing, but those figures depend on the CPU: the bench labels them illustrative and names the machine they ran on, and no test asserts them. Space is the number you can quote; time is the number you must measure on your own hardware.

Install

```sh
make dev          # or: pip install -e ".[dev]"
```

Runtime dependencies are intentionally minimal — the signing library and pydantic, nothing else. No web server, no database.

Test and bench

```sh
make test         # or: pytest -q
make bench        # print the overhead table above, computed live
```

The test suite covers canonicalization determinism and float rejection, credential issue/verify/revoke and the forged-vs-revoked split, hash-chain sealing and break detection, two-log reconciliation including the blackout-on-revoked finding, and the pinned space-overhead figures. No services, no network.
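
As an illustration of what "canonicalization determinism and float rejection" means in practice, here is a sketch of the two properties. It is not the canonical module: it leans on `json.dumps` with sorted keys and no whitespace, which approximates but does not fully implement RFC 8785 (for example, key ordering for characters outside the Basic Multilingual Plane differs). The function names are hypothetical.

```python
import json


def _reject_floats(value) -> None:
    # Walk the structure and refuse any float, however deeply nested.
    if isinstance(value, float):
        raise TypeError("floats are rejected rather than mis-serialized")
    if isinstance(value, dict):
        for v in value.values():
            _reject_floats(v)
    elif isinstance(value, (list, tuple)):
        for v in value:
            _reject_floats(v)


def canonicalize(value) -> bytes:
    """Byte-stable JSON for a float-free subset: sorted keys, no whitespace."""
    _reject_floats(value)
    return json.dumps(value, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


# Determinism: insertion order of keys does not change the bytes.
a = canonicalize({"b": 1, "a": {"y": [1, 2], "x": "ok"}})
b = canonicalize({"a": {"x": "ok", "y": [1, 2]}, "b": 1})

# Float rejection: the call fails instead of emitting an unstable rendering.
try:
    canonicalize({"weight": 1.5})
    rejected = False
except TypeError:
    rejected = True
```

Rejecting floats outright sidesteps the hardest part of JSON canonicalization, number formatting, at the cost of requiring callers to encode such values as integers or strings.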

Design notes

The architectural decisions are recorded as ADRs in docs/adr/: canonicalization and signing (0001), the hash-chained ledger (0002), and two-log reconciliation as deterministic merge rather than consensus (0003). Each documents what was chosen, what was deliberately left out, and why.
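
The difference between a deterministic merge and consensus can be sketched in a few lines: both logs are taken as given and sorted on a total key, so the result is the same whichever log is passed first. Everything below is an assumption made for illustration, including the `Entry` fields, the integer timestamps, the tie-break order, and the blackout window representation. The reconcile module distinguishes four cases; this sketch only flags the headline one, acted_during_blackout_on_revoked, and does not attempt the other three.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    ts: int              # hypothetical logical timestamp
    log_id: str          # which of the two logs the entry came from
    seq: int             # position within its own log
    credential_id: str   # credential the action was taken under


def merge(log_a, log_b):
    # Sort on a total key so the result does not depend on argument order.
    return sorted([*log_a, *log_b], key=lambda e: (e.ts, e.log_id, e.seq))


def flag_blackout_on_revoked(timeline, revoked_at, blackout):
    """Entries acted on during the blackout, at or after their credential's revocation."""
    start, end = blackout
    flagged = []
    for e in timeline:
        r = revoked_at.get(e.credential_id)
        if r is not None and start <= e.ts <= end and e.ts >= r:
            flagged.append(e)
    return flagged


field_log = [Entry(10, "field", 0, "c1"),
             Entry(20, "field", 1, "c1"),
             Entry(30, "field", 2, "c1")]
hq_log = [Entry(15, "hq", 0, "c2")]

timeline = merge(field_log, hq_log)
flagged = flag_blackout_on_revoked(
    timeline,
    revoked_at={"c1": 18},   # c1 was revoked at ts 18
    blackout=(12, 40),       # the field actor was unreachable from 12 to 40
)
```

In this example the action at ts 10 predates the revocation and is not flagged, while the actions at ts 20 and ts 30 were taken on a credential the actor could not have known was revoked.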

The family

Four repositories, one shared trust primitive — this core, plus three reference implementations that apply it to each thing that shows up at a checkpoint:

  • provenance-core (this repository) — the shared primitive: revocable verifiable credentials, a tamper-evident hash-chained ledger, two-log reconciliation. Vendored into each implementation below via git subtree.
  • agent-provenance (the agent that acts): every action authorized by a revocable credential and sealed into a tamper-evident log, with offline reconciliation and forensic replay.
  • border-authority (the good that moves): a checkpoint that verifies provenance offline and stays honest under a committed posture when the issuer is unreachable.
  • human-credential (the person who presents): prove one attribute, reveal nothing else, consent by signing, checked at use time.

License

Apache-2.0. Copyright 2026 SurroundApps, Inc. Author: Zeeshan Khan.
