Fix: MPEG-DASH compliance and dkms fix #11

Open
irzhywau wants to merge 441 commits into upstream/0.6-dev from fix/mpeg-dash-compliance

irzhywau commented Jul 2, 2026

Summary

This branch brings the runtime to 0.5.0, headlined by standards-compliant MPEG-DASH / CENC media DRM (ELACITY-2283) and a hardening pass on the DKMS quorum key-authority plane (ELACITY-2282). Along the way it lands the creator publishing flow, per-capsule WASM resource limits, forensic A/V watermarking, a batch of serve/library performance work, and a security-review hardening pass. main has been merged in (--no-ff) and all conflicts resolved.

Scope note: this is a release-train branch (~218 commits, 339 files). Much of the line count is vendored/generated (Cargo.lock, compiled WASM, docs, audit packets); the reviewable surface is the Rust capsules/crates and the gateway.

What's in it

🎬 MPEG-DASH / CENC compliance (ELACITY-2283)

  • Producer emits standards-compliant CENC-signaled init segments (encv/enca sample entries carrying sinf/schm/tenc, pssh injection) so a stock CENC player / FFmpeg keys decryption off tenc.
  • The server-side decrypt rail strips the signaling back to a plaintext-looking init for its own player — one asset, one wire form for both compliant clients and the runtime.
  • Full-sample AES-CTR encryption with per-sample IVs; the enca/encv choice is driven by the track's authoritative hdlr handler type.
  • Touches ddrm-media, encrypt-provider, decrypt-provider, ddrm-envelope, elacity-player, ddrm-viewer.

🔑 DKMS quorum reliability + hardening (ELACITY-2282)

  • Thread-per-connection serving so a slow/idle client can't head-of-line-block the quorum node.
  • Host-independent tests; re-seal AAD bound into the recover possession-proof; single-use grant nonce refreshed on quorum retries.
  • Shared live revocation set (a revoke binds every open connection immediately), bounded connection concurrency, and pooled-connection retry — see the code-review pass below.
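
The difference between a shared live revocation set and a per-connection snapshot can be sketched as follows. This is a minimal Python illustration, not the runtime's Rust code; `RevocationSet`, `Connection`, and the grant IDs are hypothetical names chosen for the example.

```python
import threading


class RevocationSet:
    """Thread-safe set of revoked grant IDs shared by every connection."""

    def __init__(self):
        self._lock = threading.Lock()
        self._revoked = set()

    def revoke(self, grant_id):
        with self._lock:
            self._revoked.add(grant_id)

    def is_revoked(self, grant_id):
        with self._lock:
            return grant_id in self._revoked


class Connection:
    """Hypothetical open connection that consults the live set on every request."""

    def __init__(self, revocations):
        self._revocations = revocations  # shared reference, not a copy

    def authorize(self, grant_id):
        # Fail closed: a revoked grant is refused even on a long-lived connection.
        return not self._revocations.is_revoked(grant_id)


shared = RevocationSet()
conn_a = Connection(shared)
conn_b = Connection(shared)
before = (conn_a.authorize("grant-1"), conn_b.authorize("grant-1"))
shared.revoke("grant-1")
after = (conn_a.authorize("grant-1"), conn_b.authorize("grant-1"))
```

Because both connections hold a reference to the same object, a single `revoke` call is visible to each of them on its next request; a snapshot copied at connect time would keep authorizing until the connection closed.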

🎨 Creator publishing flow

  • Creator capsule UI, media-provider, gateway creator route, and marketplace/content-market contracts wiring the mint → publish path.

🧱 Per-capsule WASM resource limits

  • Enforces each capsule's declared memory budget (clamped to a host ceiling), plus table/instance limits, fuel metering, and epoch-based termination so a runaway capsule is operator-terminable. One shared Engine across capsules.

🕵️ Forensic A/V variant watermarking

  • Mint-side asset-secret KDF + variant manifest; serve-time per-buyer variant selection welded into the CENC AAD, exercised end-to-end on real fMP4.

⚡ Performance

  • Single-copy in-place CENC decrypt; boot-stable gateway DID memoization; dropped redundant per-segment capsule re-resolve in viewer authz; bounded (LRU) library cover cache; cached file-listing facts on the directory hot path.

🛡️ Security-review hardening pass

A focused pass on the DKMS/encryption workflows, each with regression tests:

  • Revocation is now shared live state (immediate across connections), not a per-connection snapshot.
  • Connection-concurrency cap + slow-loris deadline on the network quorum node.
  • Pooled DKMS connections retry once on a transport fault (idle-timeout) but fail closed on a genuine node rejection.
  • enca/encv classification driven by the authoritative hdlr type, with an expanded codec fallback.
  • read_dash_init propagates strip errors instead of masking them as opaque decrypt failures.
  • decode_kid16 rejects malformed (multibyte) KIDs instead of panicking the capsule.
  • Browser decrypt-stream sockets use per-euid 0700 dirs and refuse foreign-owned/writable dirs (closes a /tmp squatting vector).
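
The pooled-connection rule above (retry once on a transport fault, fail closed on a genuine rejection) can be sketched like this. It is an illustrative Python sketch only; `TransportFault`, `NodeRejection`, and `call_pooled` are hypothetical names, not identifiers from the branch.

```python
class TransportFault(Exception):
    """Connection-level failure, e.g. a pooled socket closed by an idle timeout."""


class NodeRejection(Exception):
    """The node answered and refused the request."""


def call_pooled(send, reconnect):
    """Run send(); on a transport fault, reconnect and retry exactly once.

    A NodeRejection is never caught here, so a genuine refusal fails closed.
    """
    try:
        return send()
    except TransportFault:
        reconnect()
        return send()  # a second TransportFault propagates to the caller


# Case 1: the pooled socket went stale; one reconnect recovers the call.
attempts = []


def flaky_send():
    attempts.append("send")
    if len(attempts) == 1:
        raise TransportFault("idle timeout")
    return "ok"


result = call_pooled(flaky_send, lambda: attempts.append("reconnect"))

# Case 2: the node rejects; no reconnect and no second attempt.
calls = []


def rejecting_send():
    calls.append("send")
    raise NodeRejection("denied")


try:
    call_pooled(rejecting_send, lambda: calls.append("reconnect"))
    rejected = False
except NodeRejection:
    rejected = True
```

Keeping the two error classes distinct is what lets the retry stay narrow: only failures that say nothing about authorization are retried.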

📚 Audit & conformance

  • External-auditor packets for the dKMS decrypt plane, verified-safe scope-out registries, and build-visible conformance/audit-verdict ratchets.

Merge with main

main was merged in (--no-ff); 25 files conflicted, and each was resolved by reconciling both sides. Highlights:

  • wasm.rs — unified ours' memory-clamp + epoch-termination with main's fuel metering, hostcall wiring, and wall-clock timeout into one execute_wasm.
  • gateway_browser_stream.rs — folded ours' socket hardening into main's browser_stream_socket_path(directory) refactor (covers both stream + adapter-IPC sockets).
  • provider_resource.rs — extended main's peer capability allowlist with ours' gossip ops.
  • gateway.rs — env trusted-signer override (main) before the DID cache (ours).
  • runtime_control.rs — main's portable pid_is_alive(); chat/session.rs — ours' fail-closed presence signing.
  • Config — unioned main's alignment checks + ours' audit recipe; components.json took wallet-provider.

Testing

  • Full elastos workspace builds clean.
  • elastos-server 928 tests pass; elastos-compute 16, chat 31, ipfs-provider 23, dkms-authority 27, encrypt-provider 31 (escrow) all pass.
  • WCI-alignment script and components.json validate.
  • End-to-end mint → playback pass confirmed.

Risk / compatibility

  • DKMS wire protocol unchanged for the deployed quorum nodes (recover-proof kept at v1 to match deployed nodes).
  • Published media is a single compliant CENC asset; legacy/unsigned inits remain a no-op through the strip path.
  • One pre-existing flaky test (recovery_kit_password_package_imports_with_password_only) fails only under full-suite parallelism (passes in isolation); unrelated to this branch.

SashaMIT and others added 30 commits June 18, 2026 22:09
… & EPUB hardening)

Two boundary holes on the object egress path, both fail-closed:

- (3) Serve-time content sniff. The raw `/bytes` egress trusted the mint-time
  `pixel_locked` flag, which trusts the creator-declared mime — so a renderable/
  scriptable document mislabeled with a non-pixel-lock mime could egress as raw
  plaintext. `viewer_object_bytes` now sniffs the DECRYPTED bytes (after
  authority.object(), before octet_stream) via a pure magic-byte `sniffs_as_lockable`
  (PDF / ZIP / raster image / SVG-XML) and returns 403 if a "raw" asset's content
  looks pixel-lockable. The one exception is an explicitly declared `application/zip`
  (generic archive download). Buyer-safe: the declared mime lives in the signed
  descriptor and is not buyer-controlled. Verified that a PDF mislabeled as a 3D model
  (which shares this decrypt-passthrough handler) is now caught.

- (5) HTML-lock CSP/nosniff. EPUB chapters served as sanitised HTML now carry an
  enforced HTTP `Content-Security-Policy` with a `sandbox` directive
  (`default-src 'none'; img-src data:; style-src 'unsafe-inline'; font-src data:;
  base-uri 'none'; form-action 'none'; frame-ancestors 'self'; sandbox`) plus
  `X-Content-Type-Options: nosniff` and `Referrer-Policy: no-referrer`, so the
  document is sandboxed at the RESOURCE level by the browser even if loaded directly
  or framed without the attribute — the hand-rolled sanitiser is no longer the sole
  barrier. JPEG pages get `nosniff` only.
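
The serve-time content sniff in (3) can be sketched as a pure magic-byte check plus a status decision. This is a Python illustration under stated assumptions: the exact signature list, the 512-byte SVG/XML window, the `egress_status` helper, and the choice to apply the `application/zip` exemption only to content that actually sniffs as ZIP are all assumptions, not confirmed details of `viewer_object_bytes`.

```python
def sniffs_as_lockable(data: bytes) -> bool:
    """Return True if the leading bytes look like PDF, ZIP, raster image, or SVG/XML."""
    magic_prefixes = (
        b"%PDF-",              # PDF
        b"PK\x03\x04",         # ZIP local file header (also EPUB, CBZ)
        b"\x89PNG\r\n\x1a\n",  # PNG
        b"\xff\xd8\xff",       # JPEG
        b"GIF87a",             # GIF
        b"GIF89a",
    )
    if data.startswith(magic_prefixes):
        return True
    # SVG/XML: skip an optional UTF-8 BOM and leading whitespace, then look for markup.
    head = data[:512]
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    head = head.lstrip().lower()
    return head.startswith((b"<?xml", b"<svg"))


def egress_status(declared_mime: str, decrypted: bytes) -> int:
    """HTTP status for a raw byte request: 403 if 'raw' content looks lockable."""
    # Assumed shape of the exemption: a declared generic archive that really is a ZIP.
    if declared_mime == "application/zip" and decrypted.startswith(b"PK\x03\x04"):
        return 200
    return 403 if sniffs_as_lockable(decrypted) else 200


mislabeled_pdf = egress_status("model/gltf-binary", b"%PDF-1.7\n...")
real_model = egress_status("model/gltf-binary", b"glTF\x02\x00\x00\x00")
declared_zip = egress_status("application/zip", b"PK\x03\x04rest-of-archive")
```

The check runs on decrypted bytes, so the decision does not depend on the creator-declared mime except for the one explicit exemption.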

Out of scope (tracked follow-ups, not silently skipped): the media/stream egress
(`viewer_media`/MSE) is a second egress door with no sniff yet; text/code mislabel
needs heuristics (no reliable magic byte). The render direction is already
fail-closed via the parsers.

Gate: elastos-server viewer_object 7/7; clippy -D warnings clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
…l-lock CSP

Make the protected-content docs state the watermark's true strength and the new
boundary defenses exactly (Principle 12 — docs/code/threat-model agree):

- Watermark forensic scope & privacy (THREAT_MODEL §3 row + §6.6; PROTECTED_CONTENT
  "Forensic strength & privacy"): the mark is UNKEYED and CRC-protected (not signed),
  so it is forgeable and repudiable — a deterrent/tracer, NOT court-grade evidence;
  the authenticated record is the §4 signed custody log. It is also NOT anonymous:
  both layers embed the full opening wallet (visible layer human-readable), so anyone
  who sees a rendered page de-anonymizes the buyer — the deliberate leak-attribution
  trade. Names the roadmap upgrade (authenticate the payload: MAC/opaque token).
- Pixel-bomb resource bounds (PROTECTED_CONTENT): documents decode_bounded
  (image::Limits), the PDF both-axes+area scale clamp, and the CBZ per-page/total caps.
- HTML-lock CSP (PROTECTED_CONTENT): documents the enforced HTTP CSP `sandbox` +
  nosniff containment order (HTTP CSP true layer ▸ meta/iframe belt ▸ sanitiser DiD).

Docs-only; alignment-check OK.

Co-authored-by: Cursor <cursoragent@cursor.com>
…d grant

Tier C (1), chunks 1-4: upgrade the invisible pixel-lock watermark from an
unkeyed CRC-only mark (forgeable + repudiable) to one ANCHORED IN THE BUYER'S
OWN WALLET SIGNATURE — so a leaked frame is non-repudiable and forgery rises
from "anyone can plant any wallet" to "only a party holding the victim's signed
grant can." Code and docs land together (Principle 12).

- Shared digest (ddrm-envelope): `grant_watermark_digest16(delegation_sig_hex)`
  = SHA-256(normalized EIP-191 delegation signature)[..16]. Lives in the crate
  BOTH the embedder and the verifier link, so they cannot drift. No new deps
  (sha2 already present).
- Payload codec (decrypt-provider/render/invisible.rs): new TAG_GRANT_DIGEST
  carrying `[wallet_prefix(4) | grant_digest(16)]` = 21 B <= the 24 B CAP, so
  the 232-bit PERIOD (and sparse-page recovery) is unchanged. `embed` takes the
  digest; `extract` refactored into `extract_raw` + `parse_grant_mark` so the
  verifier reads the raw anchor. No-grant/local-dev opens fall back to the
  compact wallet (back-compat).
- Wire (watermark.rs + media-authority quorum.rs): the authority appends an
  invisible-only `\u{1F}gd:<hex>` token to the stamp; `finalize` splits it back
  off so the VISIBLE mark stays the clean human `wallet . content . time` and
  only the INVISIBLE layer carries the authenticated digest.
- Verifier (main.rs): `--extract-watermark <img> [--verify-grant <grant.json>]`
  prints the wallet prefix + digest and reports MATCH/NO MATCH by recomputing
  via the shared fn. Gated on pq-envelope (always in the shipped render binary).
- Docs: THREAT_MODEL S3 row / S6.6 refreshed to the authenticated state and S4
  records the chunk-5 retention decision (option C: fold the digest into the
  existing tamper-evident audit record, TTL + access-controlled; status pending
  wiring). PROTECTED_CONTENT forensic-strength block + the invisible-layer
  description match. Honest bound kept explicit: the delegation signature is not
  a hard secret, so this is non-repudiation + raised-forgery, NOT full
  anti-framing; a server-key MAC / opaque custody token remains the north star.
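
The shared digest and the invisible-only stamp token described above can be sketched as follows. This is a Python approximation, not the `ddrm-envelope` code: it assumes the normalization is trim plus lowercase and that the hash input is the UTF-8 text of the hex string (the source does not say whether the signature is decoded first), and `append_grant_token` / `split_stamp` are hypothetical helper names.

```python
import hashlib

UNIT_SEPARATOR = "\x1f"  # the \u{1F} delimiter named in the commit message


def grant_watermark_digest16(delegation_sig_hex: str) -> bytes:
    """First 16 bytes of SHA-256 over the normalized signature string."""
    # Assumption: normalization is trim + lowercase, hashed as UTF-8 text.
    normalized = delegation_sig_hex.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).digest()[:16]


def append_grant_token(visible_stamp: str, digest: bytes) -> str:
    """Authority side: attach the invisible-only token to the stamp."""
    return f"{visible_stamp}{UNIT_SEPARATOR}gd:{digest.hex()}"


def split_stamp(stamp: str):
    """Embedder side: return (visible_text, digest_or_None)."""
    visible, _sep, token = stamp.partition(UNIT_SEPARATOR)
    if token.startswith("gd:"):
        return visible, bytes.fromhex(token[3:])  # raises ValueError on bad hex
    return visible, None


digest = grant_watermark_digest16("0xABCDEF0123")
stamp = append_grant_token("0x1234 . content-42 . 2026-06-18", digest)
visible, recovered = split_stamp(stamp)
```

Splitting the token off before rendering is what keeps the visible mark in its plain `wallet . content . time` form while the invisible layer carries the digest.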

Gates (capsules are not -D warnings gated by `just`; verified directly):
decrypt-provider compiles clean + render tests 59/59; media-authority 12/12
(incl. cross-crate digest agreement); ddrm-envelope digest test + 60 existing;
alignment-check OK.

Co-authored-by: Cursor <cursoragent@cursor.com>
…stody chain

Wire Tier C-1 chunk 5: fold the 16-byte authenticated grant digest (a
non-reversible commitment to the buyer's signed delegation — the same value the
invisible pixel-lock watermark embeds) into the existing append-only content_open
custody record, so a leaked frame is verifiable against an audit row WITHOUT a
second who-opened log or any raw wallet/grant retention (option C).

- audit.rs: optional grant_digest on AuditEvent::ContentOpen, serde-skipped when
  absent so prior records hash-verify unchanged; content_open() takes it; test
  proves backward-compat + chain verification with and without the anchor.
- viewer_open.rs: resolve the wallet-signed grant (fresh AND cached paths) ABOVE
  the custody write and derive grant_digest from the EXACT signature forwarded to
  the quorum, so the §4 record carries the anchor; malformed fresh grant still
  fails before any "opened" record is written. Media/no-grant opens -> None.
- elastos-server cannot link the PQ ddrm-envelope crate, so it carries a
  no-shared-dep twin (grant_watermark_digest16_hex) guarded by a golden vector
  cross-checked against ddrm_envelope::grant_watermark_digest16 in BOTH crates,
  pinning the trim+lowercase normalization so the two sides cannot drift.
- THREAT_MODEL.md §4: retention entry updated to "option C, wired" —
  minimization-via-non-reversibility, not TTL (the chain is intentionally
  permanent); records a TTL-prunable index as explicitly rejected (Principle 10).
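
The backward-compatibility property of the optional `grant_digest` field can be illustrated with a small hash chain. This is a Python sketch under assumptions: the canonical JSON form, the `SHA-256(previous hash || event)` chaining rule, and the field names other than `grant_digest` are hypothetical stand-ins for the real audit record format.

```python
import hashlib
import json


def canonical_bytes(event: dict) -> bytes:
    """Serialize an event with absent (None) fields omitted and keys sorted."""
    present = {k: v for k, v in event.items() if v is not None}
    return json.dumps(present, sort_keys=True, separators=(",", ":")).encode("utf-8")


def chain_hash(prev_hash: bytes, event: dict) -> bytes:
    """Hash of one append-only record: SHA-256(previous hash || canonical event)."""
    return hashlib.sha256(prev_hash + canonical_bytes(event)).digest()


def verify_chain(events, hashes, genesis=b"\x00" * 32) -> bool:
    prev = genesis
    for event, expected in zip(events, hashes):
        prev = chain_hash(prev, event)
        if prev != expected:
            return False
    return len(events) == len(hashes)


genesis = b"\x00" * 32
legacy = {"kind": "content_open", "object": "obj-1"}  # written before the field existed
h1 = chain_hash(genesis, legacy)
anchored = {"kind": "content_open", "object": "obj-2", "grant_digest": "00" * 16}
h2 = chain_hash(h1, anchored)

# Re-reading the legacy record with the new schema yields grant_digest=None.
legacy_reloaded = dict(legacy, grant_digest=None)
chain_ok = verify_chain([legacy_reloaded, anchored], [h1, h2])
```

Omitting the field entirely when it is absent, rather than writing a null, is what keeps the serialized bytes of older records (and therefore their hashes) unchanged.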

Gates: ddrm-envelope golden, elastos clippy -D warnings (workspace), runtime
audit chain test, elastos-server golden, decrypt-provider + media-authority
tests, alignment-check — all green.

Co-authored-by: Cursor <cursoragent@cursor.com>
…closed-by-construction

Close the last three audit loose ends.

(1) Lowercase-address normalize on compare. The invisible mark recovers the EVM
wallet LOWERCASED (the 20 raw bytes carry no EIP-55 checksum casing), so any
attribution compare against a stored/expected address must normalize both sides
or a checksummed address would false-mismatch.
- render/invisible.rs: add normalize_evm_hex() (trim, strip 0x, lowercase) + a
  one-line test proving checksum casing compares equal.
- main.rs --verify-grant: advisory wallet cross-check — when the candidate grant
  JSON declares owner_address, confirm it matches the recovered 4-byte wallet
  prefix (both normalized). Fail-safe: advisory only, never overrides the digest
  verdict; pq-envelope-absent still returns 2 (no silent pass).
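
The normalize-on-compare rule in (1) amounts to the following. This is a Python rendering of the described steps (trim, strip `0x`, lowercase); `wallet_prefix_matches` is a hypothetical helper standing in for the advisory cross-check, and the address is just a mixed-case example value.

```python
def normalize_evm_hex(address: str) -> str:
    """Trim, strip an optional 0x prefix, and lowercase."""
    s = address.strip()
    if s[:2].lower() == "0x":
        s = s[2:]
    return s.lower()


def wallet_prefix_matches(recovered_prefix_hex: str, owner_address: str) -> bool:
    """Advisory check: does owner_address start with the recovered 4-byte prefix?"""
    prefix = normalize_evm_hex(recovered_prefix_hex)
    owner = normalize_evm_hex(owner_address)
    return len(prefix) == 8 and owner.startswith(prefix)


checksummed = "0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed"
same = normalize_evm_hex(checksummed) == normalize_evm_hex(checksummed.lower())
match = wallet_prefix_matches("5aaeb605", checksummed)
mismatch = wallet_prefix_matches("deadbeef", checksummed)
```

Normalizing both sides means a checksum-cased address and the lowercased value recovered from the mark compare equal.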

(2) HiDPI/Retina screenshot doc nuance (invisible.rs header + PROTECTED_CONTENT.md):
"same-resolution screenshot" means a 1:1 pixel-grid capture; a HiDPI/Retina
screenshot resamples (~2x) = rescaling = the already-unsupported case, so most
real-world HiDPI screenshots will not recover. Don't over-rely on it.

(3) THREAT_MODEL.md: reclassify the media/stream egress as CLOSED BY CONSTRUCTION,
not an open guard gap. The media tier serves only fMP4 from the ffmpeg
transcode+fragment ingest (media-provider prod, ddrm-media-authority dev): a
non-media file fails transcoding (no asset), and the pipeline re-encodes (AV1/AAC)
rather than -c copy, so source bytes never survive into served segments even for a
polyglot. With documents confined to the object tier (content-sniff guarded), no
media-tier sniff guard is needed. Re-open only for a bring-your-own pre-segmented
ingest or an ffmpeg -c copy/remux fast-path (would warrant a segment-0 mdat sniff).

Gates: decrypt-provider invisible tests (pdf-render,pq-envelope) 13 pass incl new;
rustfmt --check clean on both touched files; clippy introduces no new warnings;
alignment-check OK.

Co-authored-by: Cursor <cursoragent@cursor.com>
…n Linux CI

The canonical gate and CI both scoped to `cd elastos && cargo --workspace`, which
does NOT reach the crates this branch's protected-content work lives in
(capsules/decrypt-provider, capsules/ddrm-envelope, scripts/dev/ddrm-media-authority).
Their 217 tests — watermark codec, grant-digest envelope, media-authority — had
ZERO automated coverage; they were gated by hand each commit.

- justfile: add `verify-capsules` (build+test the capsule crates under their
  CANONICAL feature sets, matching scripts/dev/run-creator-gateway.sh:
  decrypt-provider = rail-stream,rail-mint,pdf-render,pq-envelope;
  ddrm-envelope = access-grant; media-authority = default) and fold it into
  `verify`, so the repo's "definition of green" finally covers the whole surface
  (Principle 12: the gate must match reality). clippy -D warnings is deliberately
  held back for the capsules (pre-existing lint debt); build+test is the real
  regression gate. Verified: all three are rustc -D-warnings-clean under these
  features, so the workflow's global RUSTFLAGS does not break them.

- ci.yml: add a `verify` job (ubuntu, installs `just`, runs the full `just verify`
  incl. the Linux-only carrier smoke the macOS dev box can't run) and an isolated
  `capsules` job (`just verify-capsules`) so a heavy/flaky smoke run can never mask
  a capsule regression. Add `workflow_dispatch` so this feature branch can be put
  through the full Linux gate on demand before merge.

This is the last gate between the branch and truly-done: turns "manually covered"
into "full green on Linux".

Co-authored-by: Cursor <cursoragent@cursor.com>
Add the feature branch to the push trigger so the full Linux gate (verify +
capsules) runs on our own work in isolation, without a PR to main. This entry
lives only on the branch and does not affect main or other branches until merge.

Co-authored-by: Cursor <cursoragent@cursor.com>
First Linux CI run surfaced two real issues the macOS box could not (just verify
aborts at the Linux-only smoke before reaching fmt):

- viewer_object.rs (landed in the Tier B-3/D-5 commit) was not rustfmt-clean — 6
  long-line/comment violations. cargo fmt -p elastos-server fixes only that file.
- the `verify` job failed at its first step (just alignment-check) because the
  GitHub runner has no ripgrep, which check-wci-alignment.sh requires. Install it
  before `just verify`. (The capsules job needs no rg and already passed green.)

Co-authored-by: Cursor <cursoragent@cursor.com>
…ider binary

Linux CI surfaced this: chain_mode_without_wallet_fails_closed expected the
"wallet not linked" fail-closed error but instead hit "rights-provider not found"
because decide_owned_access resolved/checked the capsule binary BEFORE validating
the subject wallet. On a clean runner (no pre-built capsule) the binary check
fired first, the test panicked, and its panic poisoned ENV_LOCK — cascading into
release_build_defaults_to_chain_and_refuses_dev_rights_modes.

Reorder so subject/wallet validation runs first: a chain-mode request with no
linked wallet is invalid on its face and must fail closed before we resolve or
spawn any external binary. This is both more correct (don't spawn a subprocess for
an obviously-invalid request) and makes the unit test hermetic (it is not an
#[ignore]'d integration test, so it must not depend on a built capsule). Verified
with ELASTOS_RIGHTS_PROVIDER_BIN=/nonexistent: both tests pass.
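
The reordering can be sketched as below. This is a Python illustration of the check order only; `AccessError`, the callback parameter, and the return shape are hypothetical and do not mirror the real `decide_owned_access` signature.

```python
class AccessError(Exception):
    pass


def decide_owned_access(subject_wallet, resolve_provider_bin):
    """Validate the subject first; only then resolve or spawn anything external."""
    if not subject_wallet:
        # Fail closed on the request itself, before touching the filesystem.
        raise AccessError("wallet not linked")
    provider = resolve_provider_bin()  # may raise "rights-provider not found"
    return provider, subject_wallet


def missing_binary():
    raise AccessError("rights-provider not found")


try:
    decide_owned_access(None, missing_binary)
    error = None
except AccessError as exc:
    error = str(exc)
```

With the wallet check first, the outcome for an invalid request no longer depends on whether a capsule binary happens to be built on the machine running the test.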

Co-authored-by: Cursor <cursoragent@cursor.com>
The verify job got through alignment-check + ripgrep but failed in
local-carrier-setup-smoke with `error[E0463]: can't find crate for std`: the smoke
builds the Home capsules (capsules/home-cli and friends) to wasm32-wasip1, and the
runner's stable toolchain ships only the host target. Add `targets: wasm32-wasip1`
so the smoke's wasm build has std. The other four jobs are host-only and unaffected.

Co-authored-by: Cursor <cursoragent@cursor.com>
…hain has its std

The verify smoke still failed with E0463 after adding the target to the dtolnay
@stable step: rust-toolchain.toml pins channel 1.89.0, so every cargo invocation
uses 1.89.0 — not stable — and the wasm target had been added to the wrong
toolchain. Declare `targets = ["wasm32-wasip1"]` in rust-toolchain.toml so rustup
auto-installs the wasm std for the pinned toolchain everywhere (CI and local), and
drop the now-redundant `targets:` from the workflow step. Verified locally: the
home-cli wasm build compiles clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
… for GitHub Actions

The full `just verify` cannot complete on a stock GitHub runner: its
`local-carrier-setup-smoke` step fetches the net-provider artifact over Elastos
Carrier, which a clean runner can't reach (proven on CI: it builds + runs the
entire ~18-min gate and fails only there). Everything else a clean runner CAN
verify.

- justfile: add `verify-ci` = the full gate MINUS the carrier smoke, with a hidden
  `_verify-tail` shared by both `verify` and `verify-ci` so they can't drift.
  alignment-check stays first in both. `just verify` (with the carrier smoke) is
  unchanged for a Carrier-capable Linux box / self-hosted runner.
- ci.yml: the Linux job now runs `just verify-ci` (renamed "Verify (Linux CI
  gate)") and documents that the carrier smoke is covered separately.

This lands the branch's surface — incl. the 217-test capsule gate and the full
elastos workspace fmt/clippy/test — under an enforceable green GitHub Actions gate.

Co-authored-by: Cursor <cursoragent@cursor.com>
Fold the off-tree AV-watermarking feasibility study (verdict: GO) into the
roadmap doc, with the audit caveats baked in rather than the harness's headline
claims:

- New Phase 0 (top of §5): video survival matrix, audio matrix, registration
  result, and the grant-anchored Tardos collusion chain.
- FP correction: the harness's single-seed empirical threshold (mean+3.5sigma)
  is flagged as ~1.25% false-accusation (400-trial Monte-Carlo); a certified
  bound now requires the analytic Tardos threshold + an MC FP/FN sweep, and the
  per-asset bound is recomputed at the FP-controlled threshold (duration
  minimums move up).
- New §3.4 Channel coding (required): the leak channel is bursty (whole-segment
  loss) -> timeline interleaving + an erasure-aware code; wired into chunks 2/6.
- Audio re-validation made concrete (chunk 6): psychoacoustic masking model +
  PEAQ/ODG + human A/B/X on real music/speech/silence, and time-stretch/pitch.
- Multi-strategy collusion (random/minority/all-ones/interleaving) mandated
  before any certified bound.
- Registration -> Phase 5 gating DSP item (deterministic template/pilot or
  log-polar/Fourier-Mellin; brute search proven insufficient).
- Full-variant-set AAD weld in §3.1/§4 (CEK binds the complete variant set;
  per-buyer selection is post-unwrap routing).
- §8 resolved (ECC->Tardos, q-ary density lever, published per-asset bound at
  the FP-controlled threshold, channel-coding requirement); §7 honest-limits
  expanded; VMAF 96.7 demoted from gate to synthetic relative signal.

Doc-only; no shipped behaviour. alignment-check green. Fix Widevine typo in §7.

Co-authored-by: Cursor <cursoragent@cursor.com>
…review

The "approve" step of the control loop (reflect → preview → APPROVE → act),
parallel-safe and read-only.

- elastos-runtime::approval (new, pure): `decide(mode, approver)` is fail-closed
  — the only path to Approved without an explicit yes is an affordance declared
  as needing no approval; User/RuntimePolicy default to PendingApproval; an
  explicit no always wins. `required_approval(actions)` scales the requirement
  with action strength (anything beyond read/message needs a human). 3 tests.
- inspect/intent (new provider op, read-only): given a capsule + operation,
  derives the gate (via plan), the approval it requires, and the fail-closed
  default decision. Records nothing, dispatches nothing.
- Gated consistently: `intent` added to the canonical op→action contract (Read)
  and the System-only browser allow-list.
- Decisions: `revoke` and recorded approve/deny stay on the runtime/dispatch
  (mutation) path — the product InspectProvider remains a read-only projection.
  Recording pairs with dispatch (merge-gated).
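
The fail-closed decision rule described for `decide(mode, approver)` can be sketched as a small truth table. This is a Python approximation; the enum member names and the use of `True`/`False`/`None` for the approver's answer are assumptions, not the Rust types in `elastos-runtime::approval`.

```python
from enum import Enum


class Mode(Enum):
    NONE_REQUIRED = "none_required"  # affordance declared as needing no approval
    USER = "user"
    RUNTIME_POLICY = "runtime_policy"


class Decision(Enum):
    APPROVED = "approved"
    DENIED = "denied"
    PENDING_APPROVAL = "pending_approval"


def decide(mode, approver):
    """approver is True (explicit yes), False (explicit no), or None (no answer)."""
    if approver is False:
        return Decision.DENIED  # an explicit no always wins
    if approver is True:
        return Decision.APPROVED
    if mode is Mode.NONE_REQUIRED:
        return Decision.APPROVED  # the only path to Approved without an explicit yes
    return Decision.PENDING_APPROVAL  # User / RuntimePolicy default


default_user = decide(Mode.USER, None)
default_policy = decide(Mode.RUNTIME_POLICY, None)
no_approval_needed = decide(Mode.NONE_REQUIRED, None)
vetoed = decide(Mode.NONE_REQUIRED, False)
```

Checking the explicit no before anything else is what makes the veto unconditional, including for affordances that would otherwise need no approval.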

fmt --check PASS; targeted tests green (approval 3, inspect incl. intent 31 +2
ratchets ignored, provider_resource contract 1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016ZKy5Cca9RzwDuLb1szdeq
- CAPSULE_INSPECTOR.md: add the inspect/intent wire contract (approval-intent
  preview); add a "path note" clarifying revoke + self are served on the embedded
  RequestHandler (shell) path while the product InspectProvider is a read-only
  projection (capsules/capsule/plan/intent) — closes the contract-honesty gap.
- KNOWN_GAPS.md: G4 decision core DONE (approval + intent, fail-closed, tested);
  remaining = recording a signed approve/deny, which pairs with dispatch (G3).

(An orchestrator CLAUDE.md was written locally but is .gitignored by repo policy,
so it stays a local contract and is not committed.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016ZKy5Cca9RzwDuLb1szdeq
Pre-mainnet hardening from the deep audit (none block the branch; ① is the
item to put in front of the external auditor):

① Document the dKMS re-seal AAD invariant — the node re-seals the recovered
   CEK under the caller-supplied aad_b64, which is NOT bound into the recover
   possession-proof; safe only because the decrypt boundary rebuilds the
   segment-bound AAD and fails closed. Loud comment at the seal_bound call +
   THREAT_MODEL §7 note. Binding it into the proof is scoped with the auditor.
② Lock the release-build invariant — a compile_error! rejects a release build
   (no debug_assertions) of dkms-authority with dev-modes/legacy-receipt-authz,
   and a new CI job (dkms-release-invariant) asserts both directions. Adds
   docs/DEPLOY_CHECKLIST.md (incl. the node-set-id authorize-time guard, which
   is release-only and not unit-testable under cfg(test)).
③ Redact key-provider Debug — manual Debug on Request/ReleaseSessionContext
   prints only the op name, so no CEK/escrow bytes can leak via {:?}.
④ viewer_open — log_fp(&object_cid) for the fresh-grant line (was the raw cid).
⑤ VENDORING.md — three.js r160 pin + periodic-refresh/upstream-watch plan.

Plus a fail-closed dKMS-open testing checklist in DKMS_OVER_CARRIER.md
(rights-mode + Carrier-rail must match how the asset was minted) so the
"foreign escrow" 502 diagnosis doesn't recur.

Gates: elastos-server fmt+clippy; dkms-authority build (debug+release) +
24/24 tests + guard verified; key-provider build + 52/52 tests; alignment-check.

Co-authored-by: Cursor <cursoragent@cursor.com>
…word + offline extractor

AV forensic-variant layer, tractable + pipeline-free pieces built on the proven
Phase-0/5 research. Feature-gated OFF by default (`av-variants`), so it cannot
destabilize the default build; chunks 3/4/5 (mint transcode DSP, full-variant-set
AAD weld, serve-time selector) are deferred to the live CENC/DASH/quorum pipeline.

Chunk 1 — variant manifest schema (`elastos.ddrm.av-variants/v1`) in
  capsules/ddrm-envelope/src/av.rs: marked subset, q-ary variant refs (+ segment
  digest for the chunk-4 weld), codeword scheme (length/interleave/erasure τ/bias
  commitment). serde round-trips; validate() fails closed; single_encode() is the
  honest `fingerprinted:false` default.
Chunk 2 — canonical, RNG-free codeword: asset_bias_vector / buyer_codeword (from
  grant_watermark_digest16, no per-buyer storage) / interleave_map / tardos_score.
  A domain-separated SHA-256 stream over integers (NOT any language's RNG), so the
  Rust serve selector and the Python extractor derive identical codewords. Replaces
  the Phase-0 numpy-RNG derivation.
Chunk 6 — offline forensic extractor as the proven Python reference under
  tools/av-forensics/ (offline, operator-run, no key material, not in the boundary),
  re-anchored to the chunk-2 canonical construction. The load-bearing FM fix is
  preserved: register() resolves the Fourier-Mellin scale/rotation ambiguity on the
  VALID (non-border) region. The Rust --extract-av-fingerprint CLI is deferred until
  the scheme is frozen/certified.

Cross-language anti-drift weld: tools/av-forensics/test_canonical.py asserts the same
golden vectors as av::tests::canonical_golden_vectors — change either side and both
fail. Wired into `just verify-capsules` (now also tests ddrm-envelope with
av-variants), so CI covers the new module + the weld. Pure stdlib (no numpy/ffmpeg).

Still uncertified (carried honestly in docs/AV_WATERMARKING.md): analytic Tardos
threshold + Monte-Carlo FP/FN sweep (argmax is not proof), rotation estimator
(out of envelope), audio on real content. AV remains key-protected, not fingerprinted,
until chunks 3/4/5 ship and the certification gates pass.

Gates: ddrm-envelope 51 tests (av-variants, incl. golden vectors); default build
unaffected (module gated off); av.rs clippy-clean; cross-language weld PASS; ported
extractor validated end-to-end (FM-reg → bitERR 0, leaker ranked top; no-reg fails
closed); just verify-capsules PASS; just alignment-check OK.

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace the Phase-0 empirical mean+kσ accusation threshold (Monte-Carlo
showed ~1.25% FP — not certifiable) with the analytic, FP-controlled
threshold Z = √m·Φ⁻¹(1−ε/N): the innocent symmetric-Tardos score is
exactly mean-0, variance-1 per kept position ⇒ N(0,m).

- canonical.py: tardos_threshold + _inv_norm_cdf (Acklam, pure stdlib,
  extractor-side only — not a cross-language weld surface).
- extractor.py: accuse only above the analytic Z (erasure-aware m), not
  an ad-hoc gap.
- montecarlo.py: multi-strategy FP/FN sweep (random/majority/minority/
  all-ones/all-zeros/interleave). 2000 trials, m=2332 N=500 c=3 ε=1e-3
  BER=0.13 ⇒ FP ≤ ε with 100% detection across all six; old empirical
  threshold runs 2–10× over ε (majority ≈1.05%).
- test_canonical.py: stdlib threshold sanity (Φ⁻¹(0.975)≈1.96,
  monotonicity, Z(2332,500,1e-3)=222.69) — runs in the CI weld.
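
The analytic threshold can be reproduced in a few lines. This is a sketch, not the repository's `canonical.py`: it uses the standard library's `statistics.NormalDist` for the inverse normal CDF instead of the Acklam approximation the commit names, and the function signature is an assumption.

```python
import math
from statistics import NormalDist


def tardos_threshold(m: int, n_users: int, eps: float) -> float:
    """Z = sqrt(m) * Phi^-1(1 - eps/N) for an innocent score distributed N(0, m)."""
    # By symmetry Phi^-1(1 - p) == -Phi^-1(p); this form avoids rounding in 1 - p.
    p = eps / n_users
    return math.sqrt(m) * -NormalDist().inv_cdf(p)


z = tardos_threshold(m=2332, n_users=500, eps=1e-3)
z_more_users = tardos_threshold(m=2332, n_users=1000, eps=1e-3)
```

Dividing ε by N is a union bound over the N innocent users, so the threshold rises as the user population grows or as the allowed false-accusation rate shrinks.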

Code-level accusation statistics only; media-survival certification
(real content/screen-record/CMAF lengths) remains open. Docs updated.

Co-authored-by: Cursor <cursoragent@cursor.com>
Leads with the one deliberately-open invariant — the re-seal AAD is the
caller-supplied aad_b64 and is NOT bound into the recover possession-proof
(dkms-authority recover → seal_bound, src/main.rs:1028). Safe today only
because the single consumer (decrypt boundary) rebuilds the segment-bound
AAD and fails closed. Packages the SECURITY INVARIANT comment, THREAT_MODEL
§7, and the DEPLOY_CHECKLIST open item into one hand-off with the trust
boundary, crypto roots, CI-enforced release invariants, repro gates, and a
reviewer checklist (incl. the landing test: tampered aad_b64 fails the
possession-proof closed at the node).

Co-authored-by: Cursor <cursoragent@cursor.com>
…pre-mainnet invariant)

The dKMS node re-seals a recovered CEK under the caller-supplied `aad_b64`,
which was NOT bound into the recover possession-proof. A MITM that tampered
`aad_b64` in transit could make the node seal under an AAD of its choosing;
it was safe only because the decrypt boundary independently rebuilt the AAD
and failed closed (a compensating control, not a fix).

Now the canonical possession-proof preimage binds `sha256(reseal_aad)`
(`ddrm_envelope::recover_proof_message`, domain bumped v1 -> v2). The client
signs over the exact AAD it sends (key-provider), and the node verifies the
proof over the byte-identical `args.aad_b64` in `verify_session` BEFORE any
CEK is recovered or re-sealed. The AAD (DecryptTranscriptV1) already carries
`node_set_id` + `segment_digests`, so all three are bound transitively; the
32-byte digest keeps the preimage bounded for long presentations.

A MITM cannot re-sign the proof (it lacks the token-bound caller key), so a
tampered `aad_b64` now fails closed at the node (`session_invalid`). The
decrypt boundary's rebuild remains as defense-in-depth.

- ddrm-envelope: recover_proof_message/sign/verify take `reseal_aad`; bind
  sha256; bump DKMS_RECOVER_DOMAIN to /v2; unit test asserts tampered-AAD ->
  verify=false.
- dkms-authority: verify_session verifies over decode(args.aad_b64) before
  recover; SECURITY INVARIANT comment rewritten to CLOSED; landing test
  recover_fails_closed_on_a_tampered_aad (35 legacy / 25 default tests green).
- key-provider: recover_proof_b64 + both delegate paths sign over the
  request's aad_b64.
- dev harnesses (ddrm-runtime-open, dkms-live-recover): each direct node
  recover signs over its request AAD.
- docs: THREAT_MODEL §7 + DEPLOY_CHECKLIST + AUDITOR_PACKET §1 flipped
  open -> closed, with the landing test referenced.
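
The effect of binding `sha256(reseal_aad)` into the proof preimage can be sketched as follows. This is a Python illustration with loud assumptions: HMAC-SHA256 stands in for the real caller-key signature scheme, and the domain string, the `session_id` field, and the preimage layout are hypothetical rather than the actual `recover_proof_message` format.

```python
import hashlib
import hmac

RECOVER_DOMAIN = b"example/dkms-recover/v2"  # hypothetical domain-separation tag


def recover_proof_message(session_id: bytes, reseal_aad: bytes) -> bytes:
    """Preimage binds a fixed-size digest of the AAD, not the AAD itself."""
    return (
        RECOVER_DOMAIN
        + len(session_id).to_bytes(4, "big")
        + session_id
        + hashlib.sha256(reseal_aad).digest()  # always 32 bytes, however long the AAD
    )


def sign(caller_key: bytes, session_id: bytes, reseal_aad: bytes) -> bytes:
    # Stand-in: HMAC-SHA256 replaces the real caller-key signature.
    message = recover_proof_message(session_id, reseal_aad)
    return hmac.new(caller_key, message, hashlib.sha256).digest()


def verify(caller_key: bytes, session_id: bytes, received_aad: bytes, proof: bytes) -> bool:
    """Node side: recompute over the AAD actually received, before any recover."""
    expected = sign(caller_key, session_id, received_aad)
    return hmac.compare_digest(expected, proof)


key = b"k" * 32
aad = b"transcript-for-segment-7"
proof = sign(key, b"session-1", aad)
accepted = verify(key, b"session-1", aad, proof)
tampered = verify(key, b"session-1", b"attacker-chosen-aad", proof)
```

Because the verifier recomputes the preimage from the AAD it received, swapping the AAD in transit invalidates a proof the attacker cannot re-sign.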

Gates: ddrm-envelope + dkms-authority tests, key-provider/dev-script builds,
verify-capsules, alignment-check all green.

Co-authored-by: Cursor <cursoragent@cursor.com>
…3+4 core)

ddrm-envelope::av gains the pure, fail-closed serve-time selector
(select_symbols) and the full-variant-set commitment (variant_set_commitment)
that chunk 4 welds into the decrypt transcript. The selector binds the
per-asset bias commitment (wrong secret -> refuse), supports arity-2 A/B
(direct codeword->segment mapping, matching the proven tools/av-forensics
extractor), and returns an empty selection for an honest single-encode.

DecryptTranscriptV1 gains to_aad_with_all_bindings, a strictly-extending
encoder that appends the variant-set commitment AFTER the rights binding, so
a non-fingerprinted open stays byte-identical to to_aad_with_bindings (all
committed goldens replay unchanged) while a fingerprinted open is bound to the
exact published variant set (manifest swap / out-of-set variant fails the CEK
unwrap closed). Pure functions, fully unit-tested; no pipeline wiring yet.
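
The strictly-extending property can be sketched as follows. This is a minimal illustration only: the byte layout, the 32-byte field sizes, and the free-function form are assumptions, not the actual DecryptTranscriptV1 encoding.

```rust
/// Hypothetical base encoder: transcript bytes followed by the rights binding.
fn to_aad_with_bindings(transcript: &[u8], rights_binding: &[u8; 32]) -> Vec<u8> {
    let mut aad = Vec::with_capacity(transcript.len() + 32);
    aad.extend_from_slice(transcript);
    aad.extend_from_slice(rights_binding);
    aad
}

/// Strictly-extending encoder: the variant-set commitment is appended AFTER
/// the rights binding, and only when one is present. With `None` the output
/// is byte-identical to `to_aad_with_bindings`.
fn to_aad_with_all_bindings(
    transcript: &[u8],
    rights_binding: &[u8; 32],
    variant_set_commitment: Option<&[u8; 32]>,
) -> Vec<u8> {
    let mut aad = to_aad_with_bindings(transcript, rights_binding);
    if let Some(commitment) = variant_set_commitment {
        aad.extend_from_slice(commitment);
    }
    aad
}
```

Because the extension is a pure suffix, any golden recorded for the non-fingerprinted path stays valid, while a fingerprinted open produces a different AAD as soon as the commitment changes.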

Co-authored-by: Cursor <cursoragent@cursor.com>
asset_secret_from_master derives the per-asset watermark secret from a
node-held master + the content hash, so the mint embed and the serve selector
agree on the bias/codebook without ever publishing or per-asset-storing it
(the manifest carries only the bias commitment; rotating the master re-keys
every asset). build_manifest assembles + validates a fingerprinted
VariantManifestV1 from produced variants (canonical interleave + bias
commitment), or returns the honest single-encode for an empty marked set.

A round-trip test closes the mint->serve loop: build_manifest keyed by the
derived secret produces a manifest that select_symbols (same secret) accepts,
and distinct buyers select distinct variant sets. Pure functions, tested.

Co-authored-by: Cursor <cursoragent@cursor.com>
… open

Mark the pure core of chunks 3/4/5 as landed (selector, variant-set AAD weld
encoder, manifest builder, per-asset secret KDF — all in ddrm-envelope::av/
lib.rs, fail-closed + unit-tested) and spell out precisely what remains: the
pipeline WIRING (ddrm-media-authority serve selection, decrypt-provider AAD
rebuild, mint emit) plus the real perceptual DSP (bounded-placeholder seam now;
certified embed swaps in post media-survival cert). Adds a "remaining wiring"
section with exact files and the one thing needed to validate end-to-end (a
gateway bring-up with a synthetic asset; real media only for the perceptual
cert). Notes the interleave-application follow-up as tracked, not dropped.

Co-authored-by: Cursor <cursoragent@cursor.com>
The local 2-of-3 stand-in nodes need the dev-modes legacy-receipt path to
authorize an offline recover (the live quorum uses wallet-signed grants); the
gateway dev script already builds the node this way. Without it the smoke fails
closed ("legacy receipt authorization is disabled") even on an unmodified tree.
With it the helper recovers a minted asset byte-identically (3/3 served).

Co-authored-by: Cursor <cursoragent@cursor.com>
…eam)

embed_placeholder_variant appends an ignorable ISO-BMFF `free` box carrying the
variant symbol AFTER the mdat, so the fragment stays valid/playable but byte-
distinct per symbol; encrypt_fragment (CENC) and strip_senc (decrypt) both carry
it through verbatim, so the selected variant is byte-distinct end-to-end and the
symbol survives back to the clean fragment. read_placeholder_variant recovers it
(the placeholder stand-in for the offline extractor). Explicitly NOT a watermark
(no perceptual signal, no transcode survival) — it makes mint->serve->select->weld
real and testable; the certified DSP embed swaps in behind the same interface
post-cert. Tested end-to-end through the CENC rail on the real ffmpeg fixture.
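
A rough sketch of the placeholder mechanism, assuming a 1-byte symbol payload and simple 32-bit box sizes (the real payload layout is not given here, and `embed_symbol` / `read_symbol` are illustrative names, not the functions above):

```rust
/// Append an ISO-BMFF `free` box whose single payload byte is the symbol.
/// Box layout: 4-byte big-endian size, 4-byte type, then the payload.
fn embed_symbol(fragment: &[u8], symbol: u8) -> Vec<u8> {
    let mut out = fragment.to_vec();
    out.extend_from_slice(&9u32.to_be_bytes()); // 8-byte header + 1-byte payload
    out.extend_from_slice(b"free");
    out.push(symbol);
    out
}

/// Walk the top-level boxes and return the payload of the last 9-byte `free`
/// box, if any. Sketch only: size 0 and 64-bit `largesize` boxes are rejected.
fn read_symbol(fragment: &[u8]) -> Option<u8> {
    let mut off = 0usize;
    let mut found = None;
    while off + 8 <= fragment.len() {
        let size = u32::from_be_bytes([
            fragment[off],
            fragment[off + 1],
            fragment[off + 2],
            fragment[off + 3],
        ]) as usize;
        if size < 8 || size > fragment.len() - off {
            return None; // malformed or unsupported box: fail closed
        }
        if &fragment[off + 4..off + 8] == b"free" && size == 9 {
            found = Some(fragment[off + 8]);
        }
        off += size;
    }
    found
}
```

Players skip `free` boxes, so the original fragment bytes are untouched and only the trailing box differs per symbol.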

Co-authored-by: Cursor <cursoragent@cursor.com>
A gated integration test (cargo test -p ddrm-media --features av-variants) that
runs the exact functions the production wiring will call, on the real ffmpeg
fragmented-MP4 fixture + real CENC: mint per-segment {A,B} variants -> build
manifest -> per-buyer select_symbols -> read selected ciphertext -> weld
segment-digests + variant_set_commitment into the transcript AAD -> extract.

Proves: two buyers get distinct codewords -> distinct served bytes + distinct
welded AADs (identical only where symbols coincide); substituting a served
variant OR forging the manifest changes the AAD (fail-closed at the CEK unwrap);
decrypting a served variant recovers that buyer's symbol; and the single-encode
path is byte-identical (the fingerprint layer is strictly additive). The server
IPC wiring (creator mint / media-authority serve / decrypt-provider rebuild) now
plugs into proven libraries.

Co-authored-by: Cursor <cursoragent@cursor.com>
The re-seal AAD hardening (39fead5) bumped the recover possession-proof
preimage from v1 -> v2 (added sha256(reseal_aad)). The live geo nodes still
verify v1, so every recover proof from a freshly-built key-provider was
rejected on all nodes (0-of-N served) -> 502 "could not open owned media
from the dKMS quorum". The session handshake uses the unchanged session
domain, so it passed, which made the failure look like a bug in the open path
rather than a protocol skew.

Revert the client + local node to v1 so opens succeed against the deployed
quorum. The AAD-binding hardening (a real pre-mainnet invariant) is now
STAGED, not active: it must ship together with a coordinated geo-node
redeploy, never client-only on a branch that opens the live quorum.

Adds the failure mode to the Carrier runbook symptom->cause table and a
"Protocol compatibility invariant" section so this cannot silently recur.

Co-authored-by: Cursor <cursoragent@cursor.com>
…live gateway

Production wiring on top of the AV core (chunks 3/4/5):

- encrypt-provider: opt-in `av_variants` on seal_segments_threshold emits
  per-segment byte-distinct variants + a bias-committed manifest, keyed by an
  asset secret derived from ELASTOS_AV_MASTER_B64 (the master never crosses
  the server boundary). Honest no-op when not provisioned.
- creator.rs: forwards the emitted variant files + av-variants.json INTO the
  published DASH directory (inside the asset CID), gated by ELASTOS_AV_VARIANTS.
- ddrm-media-authority: apply_variant_selection picks the buyer's variant from
  the wallet grant before the 2-of-3 recover/CENC weld; fail-closed on a
  bias-commitment (wrong-master) mismatch, honest single-encode fallback when
  there is no manifest / master / grant. Surfaces `fingerprinted` on the
  media descriptor.
- run-creator-gateway.sh: enables AV with a persistent bias master shared by
  the mint boundary and the serve helper (both inherit the gateway env).

Verified live: a minted asset's CID carries av-variants.json (fingerprinted,
arity 2, bias commitment) + two byte-distinct variant segments; the open
recovers 2-of-3 and serves the per-buyer variant. NOTE: variants are the
bounded placeholder embed (ignorable ISO `free` box) -- routing + crypto weld
proven; perceptual DSP survival is still the deferred certification step.

Co-authored-by: Cursor <cursoragent@cursor.com>
…DID memoization

Two serve-hot-path optimizations. Both are pure efficiency: no change to the
authorization decision, the CEK containment/zeroization, or the served bytes.

1. decrypt-provider CENC: decrypt the mdat range IN PLACE instead of copying the
   mdat out, decrypting it, then rebuilding the whole segment. Collapses two
   full-segment copies into one per served segment (MB-scale per fragment).
   `decrypt_samples` keeps its exact Vec-returning signature (the PC2 conformance
   driver pins it); the live path uses the new `decrypt_samples_in_place`. Output
   is byte-identical — proven by the end-to-end golden segment test and the
   CEK-containment smoke check (143/143 decrypt-provider tests green).

2. gateway: memoize the boot-stable gateway runtime DID so per-request home
   launch-token verification no longer re-reads device.key + re-derives the
   ed25519 identity on every protected-asset fetch. The DID is deterministically
   derived from the device key (see docs/CARRIER.md) and is boot-stable. Only the
   POSITIVE result is cached — a missing identity keeps being re-checked, and the
   cached value can never widen authority. The signature / expiry / session-active
   checks that actually authorize a request stay fully per-request.
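
The positive-only memoization in (2) can be sketched like this. `DidCache` and the `derive` closure are hypothetical stand-ins; the gateway's real types are not shown in this message.

```rust
use std::sync::OnceLock;

/// Caches a boot-stable value once it has been derived successfully.
/// A `None` result is never stored, so a missing identity is re-checked
/// on every call.
struct DidCache {
    did: OnceLock<String>,
}

impl DidCache {
    fn new() -> Self {
        DidCache { did: OnceLock::new() }
    }

    fn get_or_derive(&self, derive: impl FnOnce() -> Option<String>) -> Option<String> {
        if let Some(did) = self.did.get() {
            return Some(did.clone()); // fast path: no key read, no derivation
        }
        let did = derive()?; // negative result: return None, cache nothing
        // If another thread stored a value first, keep that one.
        let _ = self.did.set(did);
        self.did.get().cloned()
    }
}
```

Only the derivation is skipped on a hit; any per-request signature or expiry check would still run in the caller.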

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VjFQt6DK9ZGnLs4ykUWsuX
…anchor verdict

Commit 2 of the audit follow-up, run through the orchestrator pipeline. Scope shrank
honestly: the headline finding H1 traced to safe-by-construction, so it is CLEARED
(documented + pinned), not "fixed" with churn.

M3 (real, reachable by untrusted input): parse_trun / parse_senc read a u32
`sample_count` straight from the (untrusted) segment and drove an unbounded
pre-allocation AND read loop before the truncated-buffer read could fail. A forged
count (e.g. 0xFFFFFFFF), or a degenerate no-flags trun that reads 0 bytes/entry,
could OOM via push-growth. Reject an implausible count up front (fail-closed) with a
generous 1<<20 ceiling that never rejects real fMP4 (fragments carry at most a few
thousand samples).
The subsample count is a u16 and already self-bounding — left as-is. New tests assert
a huge count fails closed and a normal count still parses.
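
A minimal sketch of the up-front bound, using a simplified layout (a u32 count followed by u32 entries) rather than the real trun/senc box format; `parse_sample_sizes` and `ParseError` are illustrative names:

```rust
/// Ceiling from the message above: generous for real fMP4 fragments.
const MAX_SAMPLE_COUNT: u32 = 1 << 20;

#[derive(Debug, PartialEq)]
enum ParseError {
    Truncated,
    ImplausibleSampleCount(u32),
}

/// Parse a big-endian u32 `sample_count` followed by that many big-endian
/// u32 entries. The count is validated BEFORE any allocation or loop.
fn parse_sample_sizes(buf: &[u8]) -> Result<Vec<u32>, ParseError> {
    if buf.len() < 4 {
        return Err(ParseError::Truncated);
    }
    let count = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]);
    if count > MAX_SAMPLE_COUNT {
        return Err(ParseError::ImplausibleSampleCount(count)); // fail closed
    }
    let count = count as usize;
    let body = &buf[4..];
    if body.len() / 4 < count {
        return Err(ParseError::Truncated); // buffer cannot hold `count` entries
    }
    let mut sizes = Vec::with_capacity(count);
    for c in body.chunks_exact(4).take(count) {
        sizes.push(u32::from_be_bytes([c[0], c[1], c[2], c[3]]));
    }
    Ok(sizes)
}
```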

H1 (cleared, safe-by-construction): the forensic watermark anchor is derived from the
client-supplied delegation signature before the gateway verifies it, which *looks* like
it trusts an unverified sig. Traced to ground: a forged signature fails the dKMS node's
own verify_access_grant (EIP-191 owner recovery) + on-chain hasAccessByContentId, so no
CEK is recovered, no decrypt happens, and the watermark embed (only after a successful
quorum recover) never runs — a forged anchor can never reach an egressed, decrypted
frame. Pinned by a new access.rs invariant test
(delegation_sig_from_wrong_wallet_fails_closed) and recorded in the
PRINCIPLES_CONFORMANCE "do not re-churn" register so a future pass doesn't re-open it.

Gate: just verify-capsules components all green — decrypt-provider 146, ddrm-envelope
76, ddrm-media-authority 15, python canonical weld PASS. No CEK-path or served-byte
change; the cross-language AV golden weld is unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VjFQt6DK9ZGnLs4ykUWsuX
SashaMIT and others added 28 commits July 1, 2026 18:40
…fail-closed scope rules

- plan emits elastos.inspect.gate-preview/v1 (capabilities, audit events, execution
  policy, dispatch:false) so inbox gate summaries show real authority again
- revoke is an explicit unsupported_operation, not a silent fallthrough
- provider_resource gains inspect_resource(op) so unknown inspect ops fail closed
- restores the four inbox-approval gateway tests (fresh passkey, principal scoping,
  deny-without-dispatch) and provider authority/redaction tests
- docs: Act path and runtime scope-rule expectations, corrected inspect/self routing

Co-authored-by: Cursor <cursoragent@cursor.com>
…rces

- dkms-authority: deny_unknown_fields on the Request enum so hidden authority
  fields fail closed; lockfiles pick up elastos-common 0.5.0
- creator/ddrm-viewer: reword raw chain/backend references so app capsules stop
  claiming provider authority they route through the runtime
- library: replace platform-branded "Finder" wording with file-manager phrasing
- marketplace: classify providers via name.endsWith("-provider")

Co-authored-by: Cursor <cursoragent@cursor.com>
… post-merge truth

- home-entropy-check: current home asset version, expanded library open allowlist,
  post-merge inspector routing, act-emitter README in the Users/self allowlist
- check-wci-alignment: justified exclusions for chain-native crates, backend-scheme
  elacity pattern instead of the bare word
- command-smoke/installed-command-audit: hermetic HOME on macOS and a portable
  timeout (timeout/gtimeout/perl alarm) so the gates run off-Linux
- state.md: restore the canonical journey proof records lost in the merge
- docs: unlink gitignored CLAUDE.md, point DDRM rail table at per-capsule
  wasm-smoke scripts

Co-authored-by: Cursor <cursoragent@cursor.com>
filter/map instead of bool::then in filter_map for browser session listing, tail
expression instead of return in the cfg-split supports_hibernation, and indented
doc-comment link definitions in elastos-vz.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ricks `elastos home`

Root cause of the local-carrier-setup-smoke failure ("Capability request still
pending after 3s"): the G-ID flip fail-closes every identity gate for sessions
with no capsule identity, and /api/auth/attach created exactly such sessions
(vm_id: None). The managed-home flow then dead-ended three ways: capability
intake recorded no requester identity, the consent-broker's grant POST 403'd
fail-closed ("no requester capsule identity") in an infinite retry loop, and
even a minted token would have been unredeemable ("session has no capsule
identity"). Fail-closed did its job; the flow lost its identity plumbing.
Predates the 0.5 merge — the smoke was never re-run on Linux after G-ID landed
(the Mac cannot run it), so it slipped every gate until now.

Fix at the root seam: attach-authenticated sessions record an HONEST host
identity ("host-client" / "host-shell") — the attach secret is owner-only
(chmod 600), so the caller IS the host user; this is truthful identity, not
fabrication. Intake, grant mint, and token redemption now agree end-to-end.
No authority widening: grants still require consent-broker approval; tokens
still bind to the recorded identity; audit records it.

Proven live: `just local-carrier-setup-smoke` now passes on Linux (was the one
red step in `just verify`); replayed the failing grant against the live runtime
before/after (403 "no requester capsule identity" -> granted). Regression test
pins the identity on both scopes.

Gate: cargo test -p elastos-server --lib green (1044), clippy clean, fmt clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
…t-scan invariant test

The 0.5 merge left three first-party capsule providers declaring
`provides: elastos://<name>/*` for names NOT in RESERVED_SUB_NAMES: `market`
(content-market storefront — no boot fallback, route never exists), `object`
(Library object authority) and `operator-drive-adapter` (both also register a
boot main-provider but lose their VM sub-route). At capsule launch the
supervisor's register_provider_route fails closed and the failure is
warn-swallowed, so the provider silently goes dark — the same live-only class
the dDRM-spine fix repaired, still open for these three.

- Reserve the three names (strict superset; no capability removed).
- Add pub is_reserved_sub_name() as the single-source-of-truth predicate.
- Add test_all_capsule_provided_sub_schemes_are_reserved: scans every shipped
  capsule.json `provides` sub-scheme and asserts it is reserved — no boot
  needed. This is the general invariant the hardcoded dDRM-spine test only
  covered for three names; it would have caught all of this and will fail on
  the next provider capsule that forgets to reserve its scheme.

Gate: cargo test -p elastos-runtime --lib green (384), fmt clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
…ore 6 inbox tests

Intake bug (Ravi P16/P11, KNOWN_GAPS G3): create_inspect_action_request only
checked plan.status=="ok", but the inspector's plan returns
{status:"ok", data:{valid:false, error:"unknown_operation"}} for an operation
the target authority never declared. That created a PENDING inbox approval with
an EMPTY gate preview — prompting a human to approve an act whose authority is
invisible. Consent requires visibility.

- Reject at intake when plan.data.valid != true, BEFORE persisting: no record,
  no notification, no approvable row, no dispatch_approved reachability.
- Restore the 6 inbox-approval regression tests dropped in the 0.5 merge,
  grafted from origin/review/0.5.0 against the existing merged harness — inbox
  suite 4 -> 10.
- Add inspect_action_rejects_undeclared_operation_before_inbox: asserts the
  undeclared op is rejected AND leaves zero approvable Inbox rows (structural
  fail-closed, not a hidden display string).

Gate: cargo test -p elastos-server --lib inbox suite green (11 incl. new guard).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
register_sub_provider was last-write-wins, so a launched capsule whose manifest
declares `provides: elastos://encrypt/*` (or key/decrypt/wallet/…) could seize
the CEK-escrow / key / signing route from the trusted boot provider — ambient
authority via registration order (Principle 3) and a break of the mediated
key/decrypt plane (Principle 15).

- Pin the escrow+keys+signing+mint spine (encrypt, publish, media, key, decrypt,
  drm, rights, wallet, chain): once bound at boot, a later registration of the
  same still-live name is refused structurally (Err), checked under the write
  lock (race-free). Non-pinned reserved names keep last-write-wins for
  hot-reload / test double-registration.
- unregister_sub_provider frees the slot, so a genuine teardown→restart of the
  same provider re-mounts cleanly; only overwrite of a live pinned slot fails.
- register_sub_provider now routes its reserved-name check through the new
  is_reserved_sub_name() predicate (single source of truth; also clears the
  dead-code warning).
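
The pinning rule can be sketched as below. The `SubProviders` type, the string-valued routes, and the omission of the reserved-name check are simplifications assumed for illustration; only the pinned name list comes from the message above.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Names whose route is first-writer-wins while the slot is live.
const PINNED: &[&str] = &[
    "encrypt", "publish", "media", "key", "decrypt", "drm", "rights", "wallet", "chain",
];

struct SubProviders {
    routes: RwLock<HashMap<String, String>>,
}

impl SubProviders {
    fn new() -> Self {
        SubProviders { routes: RwLock::new(HashMap::new()) }
    }

    /// The check and the insert happen under one write lock, so two
    /// concurrent registrations cannot both pass the pinned-slot check.
    fn register(&self, name: &str, provider: &str) -> Result<(), String> {
        let mut routes = self.routes.write().unwrap();
        if PINNED.contains(&name) && routes.contains_key(name) {
            return Err(format!("pinned sub-provider '{}' is already bound", name));
        }
        routes.insert(name.to_string(), provider.to_string());
        Ok(())
    }

    /// Teardown frees the slot so a genuine restart can re-register.
    fn unregister(&self, name: &str) {
        self.routes.write().unwrap().remove(name);
    }

    fn resolve(&self, name: &str) -> Option<String> {
        self.routes.read().unwrap().get(name).cloned()
    }
}
```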

Validated empirically: `just local-carrier-setup-smoke` (full Linux boot +
`elastos home`) passes with the guard live — boot registers each pinned name
exactly once, so nothing legitimate is refused. Test proves refuse-overwrite,
original-stays-bound, and restart-after-unregister.

Gate: cargo test -p elastos-runtime --lib green; clippy -p elastos-runtime 0;
smoke green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
…-auth funnel

Two same-class hygiene fixes surfaced by the audit (both "one canonical path",
Principle 10):

1. DDRM test env-lock: mint/buy/rights/owned_ledger each held their OWN
   `static ENV_LOCK`, so a lock only serialized a module against itself while
   the mutated `ELASTOS_DDRM_*` vars are process-global — a reader in one module
   could observe another module's mid-test mutation and fail closed (the exact
   nondeterministic class the trusted-auth-env guard fixed). Replace the four
   disjoint statics with one shared `api::ddrm_env_lock()` so all DDRM env
   mutation serializes on a single lock instance.

2. Trusted-auth funnel: `room_transport_identity_data_dir` was a byte-identical
   copy of `home_launch_auth_data_dir` (env read + test guard). Delegate to the
   canonical one so the two can't drift; the entropy-check-pinned
   `home_launch_auth_data_dir` symbol is unchanged.
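
For fix (1), a shared lock of this kind might look like the sketch below. The guard-returning signature and the poison handling are assumptions; the real `api::ddrm_env_lock()` is not shown here.

```rust
use std::sync::{Mutex, MutexGuard};

/// One process-wide lock instance for every test that mutates DDRM env vars.
static DDRM_ENV_LOCK: Mutex<()> = Mutex::new(());

/// Acquire the shared lock. A poisoned lock (an earlier test panicked while
/// holding it) is recovered so that one failure does not cascade.
fn ddrm_env_lock() -> MutexGuard<'static, ()> {
    DDRM_ENV_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}
```

Each test holds the returned guard for as long as it reads or writes the process-global variables, so modules serialize against each other and not only against themselves.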

Gate: cargo test -p elastos-server --lib green (1051), fmt clean, 0 warnings.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
…f-tier

Audit surfaced a three-way contradiction: docs said "/api/provider/inspect/self
is System-only", but the code routes self to the app/browser tier
("self" => &[BROWSER_CAPSULE_ID]) AND the entropy-checker simultaneously pinned
BOTH the BROWSER-self code and the stale "System-only" doc line.

Decision (owner): keep the self-tier — a legitimate KEEP transparency capability,
fail-closed by construction (gateway injects the authenticated principal_id,
client-supplied id ignored, authorize_view enforces caller == target under
InspectScope::SelfOnly), already covered by
inspect_self_returns_own_record_and_ignores_client_id and
inspect_self_token_cannot_reach_system_capsule_op.

- docs/CAPSULE_INSPECTOR.md + docs/INSPECTOR_TESTING.md: self is a live,
  caller-bound, fail-closed route (not System-only).
- home-entropy-check.mjs: pin the new fail-closed self-tier language instead of
  the stale "System-only" phrase, so code, docs, checker, and tests all agree
  (Principle 12). No code/behavior change.

Gate: home-entropy-check PASS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
…ants

serde's container `deny_unknown_fields` does NOT apply to UNIT variants of an
internally-tagged enum, so the quorum authority's Request::Status / ::Shutdown
silently accepted `{"op":"status","smuggled":true}` — a small fail-open seam on
an untrusted protocol surface (Principle 11). The authority-carrying variants
(Hello/Recover/RotateShare/…) are struct variants and already fail closed; only
the two empty ones leaked.

- Convert Status/Shutdown to empty STRUCT variants so deny_unknown_fields covers
  them; update the four match sites.
- Add empty_variants_reject_unknown_fields (clean parse; hidden field rejected).

Scoped to only the logical change (no whole-file reformat, per the shared-tree
lesson). Gate: cargo test -p dkms-authority green (25); no new clippy warnings.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
…in KNOWN_GAPS

Turn the remaining audit finding into a build-visible, tracked contract rather
than prose (LESSONS.md: audit → gap registry). server_infra warn-swallows a
register_sub_provider Err at boot for ~22 providers; the capability still fails
closed at route time (not fail-open), but a spawned-but-unregisterable
boot-critical provider goes silently dark with only a warn. Row records the
anchor, the distinction (absent-binary=warn ok vs spawned-but-rejected=loud),
the close criteria, and a pending ratchet (needs a boot failure-injection seam).

The other remaining finding — carrier-service launch skipping the author-
signature gate — is already tracked as AUD-1 RESIDUAL (b); not duplicated.

Docs-only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
…is session's fixes

Registry-truth sweep (LESSONS.md: audits feed resolutions back — a doc that rots is a
liability). Reconcile every row whose truth changed under this session's commits:

- G-ID residual: drop `attach.rs:63` from the "None-vm_id follow-ups" list — attach host
  sessions now carry an honest host-shell/host-client identity (`279dac1`), closing the
  live-only managed-home dead-end the smoke caught.
- PRINCIPLES_CONFORMANCE §A RESERVED_SUB_NAMES: mark it DESIGN-gap-only now — the acute
  risks are build-guarded (manifest-scan invariant `1fc2a14`; first-writer-wins pin
  `8b688fc`); drop the stale `:448-476` line ref.
- Enforced invariants (+3): every provider `provides` sub-scheme is reserved (no silent-dark);
  boot-critical sub-providers pinned first-writer-wins; request_act intake fails closed on an
  undeclared op.

inspect/self tier was already reconciled in `e51be7b`; DDRM env-lock is test-infra (no row).
Docs-only. Gates: home-entropy + wci-alignment PASS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
…ratchet

AUD-6 seam + first fix. Boot-critical sub-provider registration was warn-swallowed
at ~19 server_infra sites: a spawned-but-unregisterable provider (an invariant
violation → a dark mint/keys/signing path) left the runtime up with only a warn.

- `encrypt` (CEK escrow — the crown jewel) now PROPAGATES its register_sub_provider
  failure (`?`, boot fails loud) instead of warn-swallow. Only the
  registration-rejected branch changes; absent-binary stays the outer warn
  (genuinely optional). Smoke-validated: real boot registers encrypt once, no Err,
  boot proceeds — `just local-carrier-setup-smoke` green.
- `#[ignore]`d ratchet `aud6_boot_critical_sub_provider_registration_fails_loud`
  scans for the warn-swallow line per boot-critical scheme; run with --ignored it
  FAILS today, listing publish/media/key/decrypt/drm/rights/wallet/chain (encrypt
  absent = fixed). Flips green — delete #[ignore] — when the rest are classified
  critical-vs-optional and rewired. Non-blocking in normal CI (ignored).
- KNOWN_GAPS AUD-6 updated: PARTIAL (encrypt), ratchet named.

Gate: cargo test -p elastos-server --bin green (96 pass, 1 ignored); smoke green;
server_infra.rs rustfmt-clean (scoped).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
…t response paths (DoS)

Audit swarm finding (Priya, HIGH): the primary Carrier request path used
unbounded `read_line` on remote-controlled streams. `handle_file_stream`
accepts every inbound CARRIER_ALPN connection with no peer auth and previously
read a whole line into memory, so a remote peer could OOM the node pre-auth
with a newline-less flood. The same class was already fixed for the
WASM/microVM bridges (BUG-6, bounded `read_bounded_line`, 1 MB cap) but
never applied here. The client-side response readers (release_head,
provider_invoke, gossip push/pull, operator send_request) had the same gap
against a malicious source we dialed.

Fix (fail-closed, no protocol change): expose the existing bounded reader
`pub(crate)` and funnel every Carrier newline-delimited control read through
one shared `read_bounded_carrier_line` helper (1 MB cap; oversized/truncated
= error, not a giant alloc). Carrier bulk bytes ride the separate
length-prefixed path (already capped at 200 MB), so the 1 MB bound only ever
constrains small JSON control lines.
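
A bounded line read of this shape can be sketched with the standard library alone. This is an illustration under assumptions (synchronous I/O, a generic `BufRead`), not the actual `read_bounded_carrier_line` helper:

```rust
use std::io::{self, BufRead, Read};

/// Read one newline-terminated line of at most `max` content bytes.
/// An oversized or unterminated line is an error, never a large allocation.
fn read_bounded_line<R: BufRead>(reader: &mut R, max: u64) -> io::Result<String> {
    let mut buf = Vec::new();
    // Allow max + 1 bytes so the terminating newline of a full-length line
    // still fits; anything longer ends without a newline and is refused.
    reader.by_ref().take(max + 1).read_until(b'\n', &mut buf)?;
    if buf.last() != Some(&b'\n') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "control line oversized or truncated",
        ));
    }
    buf.pop(); // drop the newline
    String::from_utf8(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Memory use is capped at `max + 1` bytes per read regardless of what the peer sends.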

Sites: carrier.rs handle_file_stream (inbound, HIGH) + 4 client response
readers; operator_control.rs inbound handler + peer response.

Gate: cargo build -p elastos-server green; clippy -p elastos-server --lib
clean; 2 new regression tests (oversized flood refused, normal line
round-trips) pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
…only ops (T1)

Audit swarm finding (Sol, CONFIRMED): `handle_file_connection` accepts every
inbound CARRIER_ALPN connection with NO peer authentication, and
`validate_carrier_provider_invocation` is self-referential (it checks
caller-supplied envelope fields against each other, not against a
runtime-issued capability). So any anonymous remote peer could invoke the
whole provider_invoke matrix — confirmed harm: `content:publish`/`import_exact`
pin arbitrary bytes into the node's store under a caller-supplied
`principal_id` (unauthorized write + quota-attribution abuse); critical
caveat: the `key`/`decrypt`/`drm` targets were reachable too.

Fix (fail-closed, default-DENY): `carrier_provider_plane_allows_unauthenticated`
is a strict allowlist — only `content:{fetch,status,admission}` (non-mutating
reads: fetch bytes, read status, quota *decision*) pass. Every write
(publish/import_exact/import_object/ensure/unpublish/repair) and every
key/decrypt/drm/rights/availability op is refused with
`unauthorized_provider_operation` BEFORE `send_raw` ever runs.
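
The default-deny shape, sketched with hypothetical function names and a closure standing in for `send_raw`:

```rust
/// Strict allowlist: only the three non-mutating content reads pass.
/// Anything not listed is refused, including ops added in the future.
fn allows_unauthenticated(scheme: &str, operation: &str) -> bool {
    matches!(
        (scheme, operation),
        ("content", "fetch") | ("content", "status") | ("content", "admission")
    )
}

/// The refusal happens before the (stand-in) dispatch closure runs.
fn dispatch<F>(scheme: &str, operation: &str, send_raw: F) -> Result<String, &'static str>
where
    F: FnOnce() -> String,
{
    if !allows_unauthenticated(scheme, operation) {
        return Err("unauthorized_provider_operation");
    }
    Ok(send_raw())
}
```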

Trade-off (user-approved "lock read-only now"): authenticated push-replication
and cross-node key/rights flows over the plane are disabled until real Carrier
peer authentication lands — tracked as G-CARRIER-PEER in KNOWN_GAPS. Widening
the allowlist without peer auth reopens T1.

Gate: cargo clippy -p elastos-server --lib clean; full carrier test module
57/57 pass; 2 new refusal tests (write op refused, key/decrypt/drm refused) +
existing content:fetch dispatch test still green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
… (T3)

Audit swarm finding (Nadia, HIGH, confirmed end-to-end): `validate_public_ip`
checked only the native IPv6 predicates (loopback/unspecified/unique-local/
link-local), so IPv4-mapped IPv6 literals evaded every guard —
`::ffff:169.254.169.254`, `::ffff:127.0.0.1`, `::ffff:192.168.1.1` all returned
"public". The `url` crate preserves the mapped form through the host allowlist,
DNS resolver, and connect; on a dual-stack host the kernel routes
`::ffff:a.b.c.d` to the bare IPv4, so a capsule with a permissive `http_fetch`
backend could read `http://[::ffff:169.254.169.254]/latest/meta-data/...`
(cloud metadata / loopback services).

Fix: in the V6 arm, normalize `to_ipv4_mapped()` (and the deprecated
IPv4-compatible `::a.b.c.d` via `to_ipv4()`) FIRST and recurse into the full v4
private/loopback/link-local guard. Ordered so `::1`/`::` are still caught by
the native predicates before the v4 fallback. Applied identically to
exit-provider and net-provider (the two SSRF egress mediators).
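
One possible ordering of these checks, as a sketch. The boolean return type and the exact v4 predicate set are assumptions; the real `validate_public_ip` may differ.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

fn is_public_v4(ip: Ipv4Addr) -> bool {
    !(ip.is_private()
        || ip.is_loopback()
        || ip.is_link_local()
        || ip.is_unspecified()
        || ip.is_broadcast())
}

fn is_public_v6(ip: Ipv6Addr) -> bool {
    // Native predicates first, so `::1` and `::` never reach the v4 fallback
    // (to_ipv4() would turn `::1` into 0.0.0.1).
    if ip.is_loopback() || ip.is_unspecified() {
        return false;
    }
    let first = ip.segments()[0];
    if first & 0xfe00 == 0xfc00 || first & 0xffc0 == 0xfe80 {
        return false; // unique-local fc00::/7, link-local fe80::/10
    }
    // IPv4-mapped `::ffff:a.b.c.d`: judge the embedded v4 address.
    if let Some(v4) = ip.to_ipv4_mapped() {
        return is_public_v4(v4);
    }
    // Deprecated IPv4-compatible `::a.b.c.d`.
    if let Some(v4) = ip.to_ipv4() {
        return is_public_v4(v4);
    }
    true
}

fn validate_public_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_public_v4(v4),
        IpAddr::V6(v6) => is_public_v6(v6),
    }
}
```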

Gate: cargo test + clippy on both standalone capsule crates green; new
regression test `validate_public_ip_blocks_ipv4_mapped_private_targets`
(mapped metadata/loopback/RFC1918 refused; public v6 + public mapped v4 pass).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
Audit swarm finding (Vera+Dmitri, HIGH, confirmed): the audit-chain signature
was strippable via an unauthenticated `alg` downgrade. `compute_record_hash`
hashes only `domain ‖ seq ‖ prev_hash ‖ event_json` — `alg` and `sig` are NOT
in the preimage — and `verify_chain` ran the ed25519 check only
`if rec.alg == "ed25519"`. So an offline editor with NO signing key could
rewrite the entire event history, recompute every (public) record_hash, relink
the chain, set `alg="none"`, drop `sig`, and pass: `verify_chain` returned Ok,
`chain_attestation` reported verified=true, still advertising the real signer.
This defeated the module's own tamper-evidence guarantee — the EU AI Act
durable-custody claim.

Fix (no on-disk format change): make the decision to check the signature
independent of the forgeable `alg`. When a verifying key is supplied (custody /
tamper-evidence mode — both production callers, with_file_verified and
chain_attestation, derive the key from self.signer, present iff the log is
signed), EVERY record MUST be ed25519-signed and verify; a non-ed25519 alg in a
signed chain is a downgrade and is refused fail-closed. The keyless
(memory/unsigned) path is unchanged and still refuses to report a signed record
as verified without its key.
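
The per-record rule can be sketched as follows. The `Record` shape, the error strings, and the `verify_ed25519` closure (a stand-in for a real signature check) are assumptions, and the keyless refusal is modeled here as an error:

```rust
struct Record {
    alg: String,
    sig: Option<Vec<u8>>,
}

/// Decide whether a record's signature state is acceptable. The decision to
/// require a signature depends on whether a key was supplied, never on the
/// record's own (forgeable) `alg` field.
fn check_record_signature<F>(
    rec: &Record,
    key_supplied: bool,
    verify_ed25519: F,
) -> Result<(), &'static str>
where
    F: Fn(&[u8]) -> bool,
{
    if key_supplied {
        if rec.alg != "ed25519" {
            return Err("alg downgrade in a signed chain");
        }
        match &rec.sig {
            Some(sig) if verify_ed25519(sig.as_slice()) => Ok(()),
            _ => Err("missing or invalid signature"),
        }
    } else if rec.alg == "ed25519" {
        Err("signed record cannot be verified without its key")
    } else {
        Ok(())
    }
}
```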

Gate: cargo clippy -p elastos-runtime --lib clean; all 19 audit tests pass,
incl. new `signature_downgrade_forgery_is_refused` (full forgery: event edited,
record_hash recomputed + relinked, sig stripped → refused; hash-chain is
internally consistent so ONLY the mandatory-signature rule catches it).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
… charset guard (T6)

Two MEDIUM audit-swarm findings (Nadia):

T5 — exit-provider `http_fetch` auto-followed ureq's default 5 redirects. The
private agent has no IP-validating resolver on redirect hops, and the backend
host allowlist is only checked against the INITIAL URL, so an allowlisted host
could `302` the fetch to cloud metadata / any non-allowlisted host. Fix:
`.redirects(0)` on both agents — the mediator returns the 3xx to the caller
instead of following; the capsule re-issues `http_fetch` for the new URL, which
re-runs the full URL + host + allowlist + resolver validation per hop (each
egress individually capability-checked). All 29 exit-provider tests still pass.

T6 — the carrier `operation` was only checked non-empty, then interpolated into
`/api/provider/{scheme}/{operation}`; `Url::join` normalizes `..`, so
`x/../../capability/request` escaped the provider gate and reached arbitrary
local-API endpoints as the capsule's own token. Fix: restrict `operation` to a
single `[A-Za-z0-9_-]` segment in `carrier_invoke_dispatch`, rejecting
`/`/`.`/`%` etc. before it reaches the URL.
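
The T6 charset guard reduces to a one-line predicate; `is_valid_operation` is an illustrative name, not the function in `carrier_invoke_dispatch`:

```rust
/// Accept only a single non-empty `[A-Za-z0-9_-]` segment, so the value can
/// never introduce a path separator, a dot segment, or a percent-escape.
fn is_valid_operation(op: &str) -> bool {
    !op.is_empty()
        && op
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
}
```

Validating before interpolation means URL normalization never sees attacker-controlled `..` segments.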

Gate: clippy clean on both crates; 8/8 carrier dispatch tests pass incl. new
`carrier_invoke_dispatch_rejects_path_traversal_operation` (traversal/dot/pct
refused, normal underscore op still parses); exit-provider 29/29 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
`just verify`'s `cargo fmt --check` step flagged four non-canonical lines in
the test code added by the audit-fix chunks (assert! wrap, .replacen args,
Cursor::new arg, for-loop array). Formatting only — no logic change. Applied
by hand (scoped to the exact lines) to respect shared-tree discipline; scoped
`cargo fmt -p elastos-runtime -p elastos-server --check` now clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
Doc-truth reconcile: add the audit-swarm callout to the KNOWN_GAPS opening so
the registry reflects the six confirmed reachable defects fixed this pass
(T1 carrier plane lock, T2 bounded reads, T3 SSRF, T4 audit downgrade, T5
redirects, T6 operation traversal), the cleared-as-sound surfaces, and the
deferred roadmap (T7 crypto migration, perf ceilings, quality cleanups). The
open residual (T1 peer-auth) is already the G-CARRIER-PEER row.

Gate: home + browser entropy checks, WCI alignment, and git diff-check all
pass on the doc change; full `just verify` was green on the code at HEAD.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
…yte copy

Both VM-launch overlay sites (rootfs.rs get_or_create_overlay and the inline
copy in supervisor.rs) did a full tokio::fs::copy of the ~335 MB rootfs.ext4 on
every launch. Replace both with a shared reflink_or_copy helper: a copy-on-write
clone via `cp --reflink=always` — an O(1) metadata op on CoW filesystems
(btrfs/xfs/zfs/bcachefs) — that transparently falls back to the exact same
pure-Rust full copy on any failure (non-CoW FS, cross-device, or `cp` absent).

Correctness is identical on both paths: the result is an independent writable
file with identical contents (a reflink gives copy semantics, not a shared
mutable file). Only the cost changes. New unit test asserts independence —
writing the clone leaves the source untouched — so it holds whichever path the
host filesystem takes.
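
A minimal sketch of the reflink-then-fallback idea, assuming a `cp` binary that understands `--reflink=always` (GNU coreutils). The function body is illustrative and may differ from the helper in the crate.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::{Command, Stdio};

/// Try a copy-on-write clone first; fall back to a plain full copy on any failure.
pub fn reflink_or_copy(src: &Path, dst: &Path) -> io::Result<()> {
    let cloned = Command::new("cp")
        .arg("--reflink=always")
        .arg("--")
        .arg(src)
        .arg(dst)
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        // `cp` missing or not runnable counts as "no reflink available".
        .unwrap_or(false);
    if cloned {
        return Ok(());
    }
    // A failed `cp` may leave a partial destination; fs::copy truncates and overwrites it.
    fs::copy(src, dst).map(|_| ())
}
```

Either path leaves `dst` as an independent file, so a caller can write to the overlay without touching the source image.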

Audit-swarm finding (Berger, HIGH, safe, free): the standout no-measurement-gate
latency win — a full image copy on the launch hot path with a free O(1)
replacement. mkfs.ext4 is already shelled out from this crate, so external-tool
use here matches the established pattern.

Gate: full `just verify` green (fmt/clippy -D warnings/test/carrier smoke).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
The GAP-8/AUD-2 custody write on the dDRM open path called
audit.content_open(...) synchronously inside the async handler; content_open ->
emit does a full fsync, so every open parked a tokio worker thread on disk I/O.
Wrap it in spawn_blocking with owned clones of the record fields (the Arc<AuditLog>
handle is cloned in).

The fail-closed contract is preserved exactly: the open proceeds ONLY on
Ok(Ok(())); an emit error (Ok(Err)) refuses it as before, and a join failure
(Err) is now also treated as a write failure and refuses the open, so an open
that cannot be durably, tamper-evidently recorded still does not happen.
The fsync itself is unchanged (custody durability is not weakened); it just no
longer blocks an async runtime thread.
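
The three-way result handling can be sketched with the standard library alone. This swaps in `std::thread::spawn(..).join()` for `tokio::task::spawn_blocking(..).await`, which yields the same nested `Result` shape; `OpenDecision` and `record_then_decide` are hypothetical names for illustration.

```rust
use std::thread;

#[derive(Debug, PartialEq)]
pub enum OpenDecision {
    Proceed,
    Refuse(String),
}

/// Run the (blocking, fsync-ing) audit write off the calling thread and
/// allow the open only when the write is known to have succeeded.
pub fn record_then_decide<F>(write_record: F) -> OpenDecision
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    // Outer Result: did the task complete? Inner Result: did the write succeed?
    let joined = thread::spawn(write_record).join();
    match joined {
        Ok(Ok(())) => OpenDecision::Proceed,
        Ok(Err(e)) => OpenDecision::Refuse(format!("audit write failed: {}", e)),
        // A panicked or otherwise lost task is treated like a failed write.
        Err(_) => OpenDecision::Refuse("audit task did not complete".to_string()),
    }
}
```

Matching all three arms explicitly, with no wildcard that proceeds, is what keeps the path fail-closed.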

Audit-swarm finding (Vyukov, HIGH, safe): custody fsync on the async worker on
the open hot path.

Gate: full `just verify` green (fmt/clippy -D warnings/test/carrier smoke).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
…tors

content.rs and carrier.rs each carried byte-for-byte copies of three
security-invariant validators: the SSRF egress URL guard (reject inline creds,
allow only https or loopback http), the HTTP-header CRLF-injection guard, and
the content path-traversal guard. Duplicated security logic drifts silently —
tightening one copy leaves the other on the weaker rule (the same class that let
an SSRF gap exist in two places).

Extract the logic into one `net_validation` module (with unit tests) and reduce
the six local functions to trivial label-passing delegators. Zero call-site
churn (~28 callers unchanged) and byte-identical error messages — the label
parameter reproduces each surface's exact prefix ("operator alert" /
"carrier external endpoint" / "carrier authorization header"). Behavior is
unchanged; the security rule now lives in exactly one place per invariant.
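
The label-passing delegation can be sketched for one of the three invariants, the CRLF-injection guard. The function names and the message wording are hypothetical; only the module name `net_validation` and the labels come from the description above.

```rust
pub mod net_validation {
    /// Single home for the rule: reject values that could smuggle extra header lines.
    /// `label` reproduces each calling surface's own error prefix.
    pub fn check_header_value(label: &str, value: &str) -> Result<(), String> {
        if value.contains('\r') || value.contains('\n') {
            return Err(format!("{} must not contain CR or LF", label));
        }
        Ok(())
    }
}

/// Carrier-side delegator: same name and message as before, no logic of its own.
pub fn validate_carrier_auth_header(value: &str) -> Result<(), String> {
    net_validation::check_header_value("carrier authorization header", value)
}

/// Content-side delegator for the operator alert surface.
pub fn validate_operator_alert_header(value: &str) -> Result<(), String> {
    net_validation::check_header_value("operator alert", value)
}
```

Because callers keep calling the local function, tightening the rule in `net_validation` changes both surfaces at once without touching any call site.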

Audit-swarm finding (matklad, MED): security-validator duplication / drift.

Gate: full `just verify` green (fmt/clippy -D warnings/test/carrier smoke);
3 new net_validation unit tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FEL7iSfBWL2JiAFDy8fq5z
Produce a single MPEG-DASH/CENC-compliant asset (ISO/IEC 23001-7) for every
media (DASH) mint, while keeping the server-decrypt rail's own player working
by down-converting back to a plaintext-looking init at the fetch point.

- ddrm-envelope: shared `pssh` module -- single source of truth for producer,
  runtime decrypt read-path, and playback clients. ELASTOS_PQ_SYSTEM_ID
  (b6e254ef-0dc5-47fe-94e7-0e72ed1dc7b0); build_pssh (v1 box, default-KID +
  opaque .asset.protections JSON) / parse_pssh (v0/1, trailing-moov tolerant).
- ddrm-media: cenc_signal_init() (avc1->encv / mp4a->enca + sinf(frma/schm/tenc)
  + pssh moov child) and strip_cenc_signal() as a byte-exact inverse. Roundtrip
  tests assert strip(signal(x)) == x, a no-op on unsignaled input, and
  fail-closed on double signaling.
- encrypt-provider: CencSignalInits op -- pure public box surgery (no CEK/secret),
  wraps the runtime-built PSSH envelope and rewrites each per-track init; returns
  transformed inits + pssh_b64 for the MPD.
- creator (producer): after the threshold seal, build the PSSH envelope from
  dkms_protection, CENC-signal each init, and patch stream.mpd with
  <ContentProtection> (mp4protection:2011 + cenc:default_KID + per-system pssh).
- ddrm-media-authority: read_dash_init strips CENC signaling at the fetch point
  so the seal-bound AAD init and the runtime player's served init both match the
  plaintext init the mint sealed (no AAD mismatch).
- Flip on by default: drop the ELASTOS_DDRM_CENC_PSSH gate -- CENC signaling +
  MPD ContentProtection are now standard output. Additive; existing playback
  unchanged.
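
For orientation, a version 1 `pssh` box as laid out in ISO/IEC 23001-7 can be built and parsed as in the sketch below. The function names `build_pssh_v1` and `parse_pssh_v1` are hypothetical and the code is not the `ddrm-envelope` implementation; the system ID bytes are the UUID quoted above, and the `{}` payload used in the usage is a placeholder for the opaque protections JSON.

```rust
/// b6e254ef-0dc5-47fe-94e7-0e72ed1dc7b0 as raw bytes.
pub const SYSTEM_ID: [u8; 16] = [
    0xb6, 0xe2, 0x54, 0xef, 0x0d, 0xc5, 0x47, 0xfe,
    0x94, 0xe7, 0x0e, 0x72, 0xed, 0x1d, 0xc7, 0xb0,
];

/// size | 'pssh' | version=1, flags=0 | SystemID | KID_count | KIDs | DataSize | Data
pub fn build_pssh_v1(system_id: &[u8; 16], kids: &[[u8; 16]], data: &[u8]) -> Vec<u8> {
    let size = 4 + 4 + 4 + 16 + 4 + 16 * kids.len() + 4 + data.len();
    let mut b = Vec::with_capacity(size);
    b.extend_from_slice(&(size as u32).to_be_bytes());
    b.extend_from_slice(b"pssh");
    b.extend_from_slice(&[1, 0, 0, 0]);
    b.extend_from_slice(system_id);
    b.extend_from_slice(&(kids.len() as u32).to_be_bytes());
    for kid in kids {
        b.extend_from_slice(kid);
    }
    b.extend_from_slice(&(data.len() as u32).to_be_bytes());
    b.extend_from_slice(data);
    b
}

fn be32(b: &[u8], off: usize) -> Option<u32> {
    let s = b.get(off..off.checked_add(4)?)?;
    Some(u32::from_be_bytes([s[0], s[1], s[2], s[3]]))
}

fn arr16(b: &[u8], off: usize) -> Option<[u8; 16]> {
    let s = b.get(off..off.checked_add(16)?)?;
    let mut a = [0u8; 16];
    a.copy_from_slice(s);
    Some(a)
}

/// Parse a v1 box; bytes after the declared box size are tolerated and ignored.
pub fn parse_pssh_v1(input: &[u8]) -> Option<([u8; 16], Vec<[u8; 16]>, Vec<u8>)> {
    let size = be32(input, 0)? as usize;
    let b = input.get(..size)?; // declared size must fit in the input
    if b.get(4..8)? != &b"pssh"[..] || *b.get(8)? != 1 {
        return None;
    }
    let system_id = arr16(b, 12)?;
    let count = be32(b, 28)? as usize;
    let mut off = 32usize;
    let mut kids = Vec::new();
    for _ in 0..count {
        kids.push(arr16(b, off)?); // bounds-checked, so a bogus count fails fast
        off += 16;
    }
    let data_len = be32(b, off)? as usize;
    off += 4;
    let data = b.get(off..off.checked_add(data_len)?)?.to_vec();
    Some((system_id, kids, data))
}
```

Every read goes through a bounds-checked helper, so a truncated or hostile box yields `None` instead of a panic.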

Squashed from: d012fc4 047d38f 4d26798 d6fb99f 3ac5fdc 4edfd9e
elastos-server 782+95 green; helper 15 green; fmt clean.
…TY-2282)

Stop the dKMS quorum path from wedging and leaking processes under
playback+reload, and make the local test suite pass on hosts other than the
Linux x86_64 gate.

- dkms-authority (Defect A): serve each accepted connection on its own thread
  (serve_unix_listener / serve_tcp_listener) so an idle/slow/leaked client can no
  longer head-of-line-block the daemon in read_frame; revoked_callers becomes
  daemon-lifetime Arc<Mutex> shared state (additive+idempotent); 30s per-conn
  read timeout on both transports. Regression test drives the real Unix accept
  loop (RED pre-fix, GREEN after); 35/35 green.
- key-provider: bound the Unix recover read in establish_dkms_session with the
  same DKMS_TCP_READ_TIMEOUT_MS (5s) the tcp/carrier branches use, so a wedged
  node fails closed within a bounded window. 18/18 green.
- dkms (Defect B): reap leaked quorum helper/provider processes -- add Drop for
  the helper Capsule (kills+reaps key-provider/decrypt-provider children on every
  path), and guard MediaAuthorityProc launch/launch_quorum with a ChildReaper so
  early-return/error paths no longer orphan the raw Child.
- browser: keep the runtime stream socket path within the macOS sun_path limit
  (104) -- fall back to a short "/tmp" base when temp_dir() would overflow, fixing
  the 6 browser-open route tests on macOS arm64 (Linux unaffected).
- test(elastos-server): key component-checksum fixtures by detect_platform() so
  verify/stamp and agent-binary tests run on any host without masking the check.
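
The Defect B guard pattern can be sketched as an RAII wrapper around `std::process::Child`. This is an illustrative reconstruction under assumptions: the struct layout, `into_inner`, and `launch_checked` are hypothetical, and only the name `ChildReaper` comes from the list above.

```rust
use std::process::{Child, Command};

/// Owns a child process until launch is known to have succeeded.
pub struct ChildReaper(Option<Child>);

impl ChildReaper {
    pub fn new(child: Child) -> Self {
        ChildReaper(Some(child))
    }

    /// Launch succeeded: hand the child back and disarm the guard.
    pub fn into_inner(mut self) -> Child {
        self.0.take().expect("child already taken")
    }
}

impl Drop for ChildReaper {
    fn drop(&mut self) {
        if let Some(mut child) = self.0.take() {
            let _ = child.kill(); // may fail if the child already exited
            let _ = child.wait(); // reap it so no zombie is left behind
        }
    }
}

/// Any early return between spawn and `into_inner` kills and reaps the child.
pub fn launch_checked(mut cmd: Command, ready: bool) -> Result<Child, String> {
    let guard = ChildReaper::new(cmd.spawn().map_err(|e| e.to_string())?);
    if !ready {
        return Err("helper failed readiness check".to_string());
    }
    Ok(guard.into_inner())
}
```

Putting the cleanup in `Drop` means every error path added later is covered without anyone remembering to call a cleanup function.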

Squashed from: 50cdc46 0c22718 46a7ba4 d93f673 a9283b5
…view

Address correctness, security, and robustness findings across the DKMS and
encryption/decryption workflows, each with regression tests.

- dkms-authority: revocations now share ONE live Arc<Mutex<HashSet>> across all
  connection threads (was a per-connection snapshot merged only on close), so a
  revoke binds every open connection immediately — "revocation outranks a live
  session" holds under concurrency. Unify the Unix/TCP accept loops into one
  generic serve_accept_loop with a MAX_ACTIVE_CONNECTIONS cap + RAII slot guard,
  bounding the thread/memory-exhaustion (slow-loris) vector on the network node.

- key-provider: distinguish a transport fault from a node rejection
  (NodeRecoverError). A warm pooled connection the node's idle timeout closed is
  re-established and retried ONCE; a genuine rejection still fails closed with no
  retry. Fixes the first open after a >30s idle gap failing below quorum.

- ddrm-media: drive the enca/encv choice off the authoritative hdlr handler type
  (fallback to an expanded audio-4CC allowlist), and make parse_codec_string use
  the same allowlist so the two classifiers can't diverge — an uncommon audio
  codec is no longer mis-signaled as video (non-compliant init + strip mis-size).

- ddrm-media-authority: read_dash_init propagates strip_cenc_signal errors
  instead of unwrap_or(raw), so a malformed init fails with a precise diagnosis
  rather than an opaque downstream decrypt/quorum failure.

- encrypt-provider: decode_kid16 validates length AND ASCII-hex charset before
  byte-slicing, rejecting a multibyte KID instead of panicking the capsule.

- elastos-server: browser stream sockets use a per-euid dir created 0700 and
  refuse any pre-existing dir not owned by us or group/other-writable, closing
  the world-writable /tmp squatting / socket-hijack vector.
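
The `decode_kid16` hardening in the list above can be illustrated as follows. This is a sketch under the assumption that the KID arrives as a 32-character hex string; the body is not the capsule's code, and `hex_val` is a hypothetical helper.

```rust
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decode a 16-byte KID from hex, working on bytes so no `str` slice can
/// land inside a multibyte character.
pub fn decode_kid16(kid_hex: &str) -> Result<[u8; 16], String> {
    let bytes = kid_hex.as_bytes();
    if bytes.len() != 32 {
        return Err(format!("KID must be 32 hex characters, got {} bytes", bytes.len()));
    }
    let mut out = [0u8; 16];
    for (i, pair) in bytes.chunks_exact(2).enumerate() {
        // Non-ASCII and non-hex bytes are rejected here instead of panicking.
        let hi = hex_val(pair[0]).ok_or("KID must be ASCII hex")?;
        let lo = hex_val(pair[1]).ok_or("KID must be ASCII hex")?;
        out[i] = (hi << 4) | lo;
    }
    Ok(out)
}
```

A string of sixteen two-byte characters passes the length check but fails the charset check, which is the case a length-only guard would miss.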
@irzhywau irzhywau force-pushed the fix/mpeg-dash-compliance branch from 92573d2 to 6798942 Compare July 2, 2026 14:29
The ci.yml on this line (inherited from flint-0.5) invokes 'just verify-ci'
and 'just verify-capsules', but the justfile never got the recipes; both CI
jobs fail on every run with 'justfile does not contain recipe'. Port the
recipes from feat/ddrm-hardening-and-creator-parity, whose feature sets and
paths all exist on this branch (verified locally: verify-capsules,
alignment-check, command-smoke, candidate-command-audit all pass).