injective keys - 75%+ faster expansion, 50% reduction in file size, massive allocation reduction && Soundness #993

Open
kans wants to merge 7 commits into main from kans/ggreer/injective-keys

Conversation

@kans kans commented Jul 1, 2026

Pebble: injective grant identities + sorted-ingest grant expansion

Summary

This PR makes grant expansion on the Pebble engine fast and predictable for very
large datasets by fixing two compounding problems: grant/entitlement IDs were
not injective (distinct logical entities could collide, and equal entities
could carry unequal IDs), and expansion emitted grants out of key order,
turning the LSM write path into a write-amplification machine.

On a production-scale benchmark (3.68M base grants expanding to 57.7M), a full
projection expansion + index build + save went from ~285s to ~164s (-43%),
while also shipping a fully consolidated artifact (compaction debt ~0) instead
of a partially-compacted one.

1. Injective structural identities

  • Grants and entitlements are now keyed internally by their structural tuple:
    grant = (ent_rt, ent_rid, ent_kind, ent_name, prin_rt, prin_id),
    entitlement = (rt, rid, kind, name), stored as raw components with a
    0x00-separated tuple codec. Same logical entity ⇒ same key, always.
  • Canonical, escaped string IDs are produced/parsed only at the edges (CLI,
    RPC, connector-facing APIs); Grant.id is no longer deprecated.
  • Custom entitlement IDs are tagged with a universal EntitlementKindCustom
    marker (the sanitizer intentionally keeps the marker cleartext; documented).

Migration

  • Legacy files are migrated build-then-replace: new keyspaces are built into
    SSTs and swapped in atomically via IngestAndExcise, then stamped. Crash-safe
    (re-runs from scratch), no per-key fallback paths in readers.
  • Legacy IDs that collide onto one structural identity are merged with a
    deterministic collapse rule.
  • Read-only opens of legacy Pebble files migrate a temp copy (same behavior as
    SQLite read-only migrations). SQLite→Pebble conversion emits the new format
    directly and stamps the migration marker.

2. Index diet

Folded into primary-key prefix scans (the grant primary key is
entitlement-first): grant by_entitlement, by_entitlement_resource,
by_principal_resource_type, and entitlement by_resource. Remaining
secondary indexes (by_principal, by_needs_expansion) are key-only.

by_principal is scattered relative to expansion's write order, so its inline
maintenance is deferred: expansion skips it and EndSync rebuilds the entire
family from one primary scan into a single sorted SST (IngestAndExcise).

3. Sorted-ingest expansion write path

Expansion previously wrote synthesized grants through batch commits in
topological-node order — massively out of key order, so the memtable/L0 path
rewrote the data repeatedly through compaction.

Now:

  • The topological order is computed as Kahn waves (level decomposition).
    Every parent of a wave-k node lives in a wave < k, so a wave's output is
    never read within that wave — its writes can be deferred and published at
    the wave boundary.
  • Each wave opens a layer session: synthesized rows are encoded to final
    key/value bytes immediately and streamed into a background-sorted spill
    sorter (128MiB chunks). Every ~8M rows the session cuts a segment, and a
    background worker k-way merges the segment into one SST and ingests it —
    overlapping merge/ingest with expansion compute.
  • Synthesized contributions skip read-before-write entirely and are encoded
    with a hand-rolled deterministic wire encoder for sources (byte-identical
    to the reflective proto marshal, pinned by test).
  • SQLite and non-conforming stores keep the existing batched path.

4. LSM lifecycle around bulk ingest

Segment SSTs span interleaved key ranges, so Pebble stacks them in upper
levels and schedules hundreds of compactions to untangle them — work that
competed with EndSync and whose output never survived to the saved artifact
(the file is checkpointed and closed minutes later). Three changes:

  • A pausable compaction scheduler (drop-in variant of Pebble's default)
    stops granting automatic compactions from EndSync to close; binding a new
    sync resumes it. In-flight compactions finish normally.
  • The deferred index scan tees every raw grant row (via a pipelined writer
    goroutine) into rolling flat SSTs, and one IngestAndExcise swaps the whole
    grant range: each byte is rewritten exactly once, on a scan already being
    paid for, replacing what background compaction needed ~2m48s of thread time
    to approximate. The saved artifact ships with compaction debt ~0.
  • Compaction/ingest metrics are logged at phase boundaries for observability.
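
A minimal sketch of the pause semantics described in the first bullet: new grants stop while paused, and in-flight work still completes. `pausableGate` is a hypothetical stand-in and does not implement Pebble's scheduler interface; a real scheduler would also need to re-offer waiting compactions on resume.

```go
package main

import (
	"fmt"
	"sync"
)

// pausableGate hands out work slots up to a limit and refuses new grants
// while paused. It is a simplified, hypothetical model of a pausable scheduler.
type pausableGate struct {
	mu      sync.Mutex
	paused  bool
	running int
	limit   int
}

// TrySchedule reports whether a new unit of work may start now.
func (g *pausableGate) TrySchedule() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.paused || g.running >= g.limit {
		return false
	}
	g.running++
	return true
}

// Done releases a slot. It works whether or not the gate is paused, so
// in-flight work always finishes normally.
func (g *pausableGate) Done() {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.running > 0 {
		g.running--
	}
}

// Pause stops new grants; Resume allows them again.
func (g *pausableGate) Pause() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.paused = true
}

func (g *pausableGate) Resume() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.paused = false
}

func main() {
	g := &pausableGate{limit: 2}
	fmt.Println(g.TrySchedule()) // true: one compaction is now in flight
	g.Pause()                    // EndSync: stop granting
	fmt.Println(g.TrySchedule()) // false: paused
	g.Done()                     // the in-flight compaction finishes normally
	g.Resume()                   // a new sync is bound
	fmt.Println(g.TrySchedule()) // true
}
```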

Benchmark (3.68M → 57.7M grants, Apple Silicon)

| Phase | Before sorted ingest | This PR |
| --- | --- | --- |
| Expansion (54M synthesized rows) | ~3m50s | 1m15s |
| Deferred index build + keyspace consolidation | 57s (no consolidation) | 1m01s (debt 0) |
| Envelope encode | 17s | 14s |
| Total | ~285s | ~164s |

Compactions during the post-expansion window: 236 → 9. Output c1z is
byte-count comparable (slightly smaller) with identical logical content,
verified by differential expansion tests against SQLite and the legacy path.

Compatibility

  • External ID formats are unchanged on connector-facing APIs.
  • Old c1z files (both engines) migrate on open, including read-only opens.
  • Expansion output parity is pinned by cross-engine differential tests,
    structural-snapshot tests, and a seeded fuzz sweep.

@kans kans requested a review from a team July 1, 2026 04:48
Comment thread pkg/dotc1z/engine/pebble/id_index_format.go Outdated
Comment thread pkg/types/grant/grant.go Outdated
@kans kans force-pushed the kans/ggreer/injective-keys branch from 7eb0dda to 9616b70 Compare July 1, 2026 04:56
Comment thread pkg/dotc1z/engine/pebble/id_index_format.go Outdated
Comment thread pkg/c1zsanitize/transform.go Outdated
github-actions Bot commented Jul 1, 2026

General PR Review: injective keys - 75%+ faster expansion, 50% reduction in file size, massive allocation reduction && Soundness

Blocking Issues: 1 | Suggestions: 3 | Threads Resolved: 0
Criteria status: loaded .claude/skills/ci-review.md from trusted base 6cf5a8172058.
Review mode: full

Review Summary

Re-reviewed the full PR diff (115 files, ~13.3k additions) for security and correctness, with focused verification of the wire/proto surfaces, the on-disk id-index migration, and the serialized-state paths. Two candidate blocking issues were evaluated and dismissed as false positives: the build-then-replace id-index migration is crash-safe because each range is swapped atomically via IngestAndExcise and mergeDuplicateGrantValues is fold-order-independent/idempotent, so a mid-migration crash re-runs cleanly from the surviving (already-merged) rows; and the unchanged keyspaceVersion is not a gap because a separate idIndexFormat stamp gates migration (legacy files migrate on writable open and fail loudly on direct read-only open). The previously-flagged GrantsForEntitlementPrincipalSorted concern is now a non-issue — the method still returns true and the change was doc-only. The one remaining blocking issue (SQLite→Pebble conversion re-stamping discovered_at) is confirmed still open and is verifiably introduced by this PR.

Security Issues

None found.

Correctness Issues

  • pkg/dotc1z/to_pebble.go / pkg/dotc1z/engine/pebble/bulk_import.go:303,374: the conversion path deleted its discovered_at propagation (it removed the AddGrantsWithDiscoveredAt/AddResourcesWithDiscoveredAt/AddEntitlementsWithDiscoveredAt/AddResourceTypesWithDiscoveredAt calls that read the source discovered_at column) and now re-stamps timestamppb.Now() for every converted record. Since the Pebble compactor picks fold winners by newest discovered_at, converted rows get an artificially recent stamp that can override genuinely newer data on a later fold, which is exactly what the now-deleted comment "// re-stamping would invert record precedence" warned against. (Blocking, previously flagged, verified as a regression introduced by this PR.)

Suggestions

  • pkg/sdk/version.go: stays v0.16.0 despite a new on-disk Pebble id-index format, an open-time migration, and default-behavior changes; a 0.x minor bump would be the pre-1.0 compatibility signal, plus a migration/rollout note for downstreams. (Previously flagged.)
  • pkg/dotc1z/engine/pebble/adapter.go / bulk_import.go: hard removal of exported AddGrantsWithDiscoveredAt / AddResourcesWithDiscoveredAt / AddEntitlementsWithDiscoveredAt / AddResourceTypesWithDiscoveredAt; no in-repo callers remain, but these were exported SDK symbols and their removal is what dropped the discovered_at propagation above. (Previously flagged.)
  • pkg/dotc1z/engine/pebble/deferred_index.go:364-368: an ok=false from appendGrantByPrincipalKeyFromPrimary silently omits the row from by_principal (still teed into the primary rebuild and counted) with no counter/log — add observability. (Previously flagged, low confidence.)
Prompt for AI agents

```
Verify each finding against the current code and only fix it if needed.

Correctness Issues

In `pkg/dotc1z/to_pebble.go` and `pkg/dotc1z/engine/pebble/bulk_import.go`:

  • The SQLite->Pebble conversion (convertGrants/convertResources/convertEntitlements/
    convertResourceTypes) no longer reads the source `discovered_at` column and no longer
    passes it into the bulk importer, so translateGrantsSerial (bulk_import.go:303) and the
    per-record stamp (bulk_import.go:374,401,440) fall back to timestamppb.Now(). Because the
    Pebble compactor picks fold winners by newest discovered_at, converted rows get a
    conversion-time stamp that can wrongly override genuinely-newer data on a later fold.
    Fix: restore reading the `discovered_at` column in the four convert* scans and pass the
    per-record timestamps into the bulk import (re-add the WithDiscoveredAt-style plumbing, or
    set DiscoveredAt on the translated v3 record before Add) so the source discovery time is
    preserved instead of re-stamped.

Suggestions

In `pkg/sdk/version.go`:

  • Around line 3: Version is still "v0.16.0". This PR changes the on-disk Pebble id-index
    format, adds an open-time migration, and changes default behavior. Bump the 0.x minor
    version as the pre-1.0 compatibility signal and include a migration/rollout note in the PR.

In `pkg/dotc1z/engine/pebble/adapter.go` / `bulk_import.go`:

  • The exported bulk-import methods AddGrantsWithDiscoveredAt / AddResourcesWithDiscoveredAt /
    AddEntitlementsWithDiscoveredAt / AddResourceTypesWithDiscoveredAt were removed. These were
    exported SDK symbols; if any downstream depends on them, restore them (or provide a
    replacement). Restoring their use in to_pebble.go is also the fix for the discovered_at
    regression above.

In `pkg/dotc1z/engine/pebble/deferred_index.go`:

  • Around line 364-368: appendGrantByPrincipalKeyFromPrimary returning ok=false silently omits
    the row from the by_principal family (while it is still teed into the primary rebuild and
    counted) with no counter or log line. Add a dropped-row counter and a warn log so a
    malformed key that vanishes from by_principal is observable.
```

@github-actions github-actions Bot left a comment

No blocking issues found.

@kans kans force-pushed the kans/ggreer/injective-keys branch from 9616b70 to 93646ec Compare July 1, 2026 05:17

@github-actions github-actions Bot left a comment

No blocking issues found.

@kans kans force-pushed the kans/ggreer/injective-keys branch from 93646ec to 84500e6 Compare July 1, 2026 14:58
Comment thread pkg/dotc1z/engine/pebble/id_index_migration.go Outdated
message Grant {
  c1.connector.v2.Entitlement entitlement = 1 [(validate.rules).message = {required: true}];
  c1.connector.v2.Resource principal = 2 [(validate.rules).message = {required: true}];
  // These ids may not map one to one with the grant itself.

🟡 Suggestion: The generated pb/c1/connector/v2/grant.pb.go rawDesc in this PR marks field id = 3 as deprecated=true (the added \x18\x01 option bytes), but this .proto source only adds a comment — it does not declare deprecated = true on the field. This is generated-artifact drift: running make protogen would drop the deprecated flag. If the intent is to deprecate id, add deprecated = true to the field options here and regenerate; otherwise the generated file shouldn't carry it. Confidence: high (mismatch), low blast radius.

@kans kans force-pushed the kans/ggreer/injective-keys branch from 84500e6 to a37841f Compare July 1, 2026 17:30
@kans kans changed the title from "[wip] injective keys" to "injective keys - 60%+ faster expansion, 50% reduction in file size, soundness" Jul 1, 2026
@kans kans changed the title from "injective keys - 60%+ faster expansion, 50% reduction in file size, soundness" to "injective keys - 70%+ faster expansion, 50% reduction in file size, && Soundness" Jul 1, 2026

@github-actions github-actions Bot left a comment

No blocking issues found.

…iously trip compaction thresholds hundreds of times due to data volume alone
// runs) but competes with those phases for CPU and IO, so stop granting
// new compactions. StartNewSync/SetCurrentSync resume the scheduler if
// the store is written to again.
a.engine.PauseCompactions()

🟡 Suggestion (low confidence): Compactions are paused for the whole EndSync-to-close window, but that window is exactly where the deferred by_principal index build ingests its SSTs. Flushes still drain memtables to L0 and ingested SSTs that overlap existing key ranges can also land in L0; with granting paused, L0 can accumulate toward L0StopWritesThreshold and stall the very index build this is meant to speed up. Worth confirming (via the whale run's LogCompactionMetrics) that read-amp/L0 stays bounded through the index build, or resuming compactions if a write phase follows the pause. The code comment acknowledges this tradeoff, so this is a verification ask rather than a confirmed bug.

// isGranting serializes granting from Done and the periodic granter,
// and lets Unregister wait out an in-flight grant loop.
isGranting bool
isGrantingCond *sync.Cond

🟡 Suggestion: pausableCompactionScheduler reimplements pebble's ConcurrencyLimitScheduler with non-trivial concurrency (the isGranting handoff, Unregister waiting out an in-flight grant loop, Done/periodic-granter/TrySchedule all mutating runningCompactions). It has no direct unit test — only exercised indirectly via the env-gated whale benchmark. A focused test covering pause during an in-flight grant, resume-poke pickup, and Register/Unregister ordering would guard this against future pebble-version bumps. Same applies to the new Begin/Add/Finish/Abort layer session (segment cut, worker-error propagation, abort mid-wave), only reached with BATON_PEBBLE_SYNTH_LAYER_SST set.

@github-actions github-actions Bot left a comment

No blocking issues found.

// Tee the raw row into the primary-keyspace rebuild pipeline. The
// iterator yields globally sorted keys, so each SST is sorted and
// the cut files are mutually disjoint.
if err := rebuild.add(iter.Key(), iter.Value()); err != nil {

🟡 Suggestion: A tee write error surfaced here (via add → flushBatch → takeErr) is returned from BuildDeferredGrantIndexes and fails the whole EndSync/sync. That contradicts the design intent stated at line 380 and the excise GUARD (line 414), where post-scan rebuild failures are downgraded to a loud no-op that "leaves the LSM alone rather than failing the sync." Consider capturing a scan-time tee error and continuing the scan (so the by_principal index still builds and the excise is skipped) rather than aborting the sync. Low confidence: an SST-write error mid-scan may also fail the concurrent spill sorter, but the asymmetry with the post-scan guard is real. (incremental)

@github-actions github-actions Bot left a comment

No blocking issues found.

@kans kans mentioned this pull request Jul 2, 2026
2 tasks
Comment on lines +157 to +173
for k := 0; k < len(entitlementID); k++ {
	if entitlementID[k] != ':' {
		continue
	}
	for l := k + 1; l < len(entitlementID); l++ {
		if entitlementID[l] != ':' {
			continue
		}
		cand := entitlementIdentity{
			resourceTypeID: entitlementID[:k],
			resourceID:     entitlementID[k+1 : l],
			stripped:       true,
			tail:           entitlementID[l+1:],
		}
		nonEmpty, err := e.grantPrimaryPrefixNonEmpty(encodeGrantPrimaryEntitlementPrefix(cand))
		if err != nil {
			return entitlementIdentity{}, err

🟡 Suggestion: This direct-split fallback opens a fresh Pebble iterator (grantPrimaryPrefixNonEmpty) for every O(colons²) candidate with no upper bound, unlike resolveGrantIdentityByExternalID which caps enumeration at maxGrantIDCandidates. An entitlement id with many colons and no matching entitlement record would open a large number of iterators on this bare-id edge. Consider applying a similar cap here for consistency and robustness. (low confidence)
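
A sketch of what a capped enumeration could look like. `enumerateSplits`, `splitCandidate`, and the limit of 64 are hypothetical; the real `maxGrantIDCandidates` value and the iterator probe are not shown in this thread, so the sketch only collects candidates.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// maxSplitCandidates is a hypothetical cap for this sketch.
const maxSplitCandidates = 64

var errTooManyCandidates = errors.New("too many split candidates")

// splitCandidate is one (resource type, resource id, tail) reading of a bare id.
type splitCandidate struct{ resourceType, resourceID, tail string }

// enumerateSplits lists every way to cut id at two ':' positions, and fails
// once more than limit candidates would be produced. In real code each
// candidate would cost one iterator probe, so the cap bounds that work.
func enumerateSplits(id string, limit int) ([]splitCandidate, error) {
	var out []splitCandidate
	for k := 0; k < len(id); k++ {
		if id[k] != ':' {
			continue
		}
		for l := k + 1; l < len(id); l++ {
			if id[l] != ':' {
				continue
			}
			if len(out) == limit {
				return nil, errTooManyCandidates
			}
			out = append(out, splitCandidate{
				resourceType: id[:k],
				resourceID:   id[k+1 : l],
				tail:         id[l+1:],
			})
		}
	}
	return out, nil
}

func main() {
	got, err := enumerateSplits("group:g1:member", maxSplitCandidates)
	fmt.Println(len(got), err) // 1 <nil>

	// 40 colons yield 780 candidate pairs, well past the cap.
	_, err = enumerateSplits(strings.Repeat("x:", 40), maxSplitCandidates)
	fmt.Println(err) // too many split candidates
}
```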

@github-actions github-actions Bot left a comment

No blocking issues found.

@kans kans force-pushed the kans/ggreer/injective-keys branch 2 times, most recently from 24fba77 to 687fe2d Compare July 2, 2026 20:10
Comment thread pkg/dotc1z/engine/pebble/adapter.go
@kans kans force-pushed the kans/ggreer/injective-keys branch from 687fe2d to e47abd8 Compare July 2, 2026 20:23
Comment thread pkg/dotc1z/engine/pebble/id_index_migration.go
Comment thread pkg/dotc1z/grants.go

@github-actions github-actions Bot left a comment

No blocking issues found.

@kans kans force-pushed the kans/ggreer/injective-keys branch from e47abd8 to a804d65 Compare July 2, 2026 20:40
Comment thread pkg/dotc1z/engine/pebble/deferred_index.go

@github-actions github-actions Bot left a comment

Blocking issues found — see review comments.

@kans kans changed the title from "injective keys - 70%+ faster expansion, 50% reduction in file size, && Soundness" to "injective keys - 75%+ faster expansion, 50% reduction in file size, massive allocation reduction && Soundness" Jul 2, 2026
@kans kans force-pushed the kans/ggreer/injective-keys branch from a804d65 to 376f9b4 Compare July 2, 2026 22:40
@kans kans force-pushed the kans/ggreer/injective-keys branch from 376f9b4 to a1fa055 Compare July 2, 2026 23:11

@github-actions github-actions Bot left a comment

Blocking issues found — see review comments.

@kans kans force-pushed the kans/ggreer/injective-keys branch from ba5f7da to 104a04c Compare July 2, 2026 23:50

@github-actions github-actions Bot left a comment

Blocking issues found — see review comments.

@kans kans force-pushed the kans/ggreer/injective-keys branch from 104a04c to 407f025 Compare July 3, 2026 00:07
}
pid := principal.GetId()
-grantID := descEntitlement.GetId() + ":" + pid.GetResourceType() + ":" + pid.GetResource()
+grantID := batonGrant.NewGrantID(principal, descEntitlement)

🟡 Suggestion: newExpandedGrantWithSources checks principal == nil but not principal.GetId() == nil. batonGrant.NewGrantID panics ("principal resource must have a valid resource ID") when the principal's Id is nil (grant.go:143-145), whereas the pre-PR manual ":"-join was nil-safe. Reachable only on a malformed principal, but consider guarding principal.GetId() == nil and returning an error instead of panicking. (confidence: low)
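
A sketch of the suggested guard. The `resource` and `resourceID` types are hypothetical stand-ins for the generated proto types (whose getters are nil-safe in the same way), and `checkPrincipal` is an illustrative name, not a function in the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// resourceID and resource mimic the shape of the proto messages. Like
// generated proto getters, GetId is safe to call on a nil receiver.
type resourceID struct{ ResourceType, Resource string }

type resource struct{ Id *resourceID }

func (r *resource) GetId() *resourceID {
	if r == nil {
		return nil
	}
	return r.Id
}

var errInvalidPrincipal = errors.New("principal has no resource ID")

// checkPrincipal rejects both a nil principal and a principal with a nil Id,
// so the caller can return an error before reaching a constructor that panics.
func checkPrincipal(p *resource) error {
	if p == nil || p.GetId() == nil {
		return errInvalidPrincipal
	}
	return nil
}

func main() {
	fmt.Println(checkPrincipal(nil))         // principal has no resource ID
	fmt.Println(checkPrincipal(&resource{})) // principal has no resource ID
	fmt.Println(checkPrincipal(&resource{Id: &resourceID{ResourceType: "user", Resource: "u1"}})) // <nil>
}
```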

Comment thread pkg/sync/expand/topological_merge_projection.go
@kans kans force-pushed the kans/ggreer/injective-keys branch from 407f025 to 133e3a1 Compare July 3, 2026 00:15
Comment thread pkg/dotc1z/engine/pebble/deferred_index.go

@github-actions github-actions Bot left a comment

Blocking issues found — see review comments.

@kans kans force-pushed the kans/ggreer/injective-keys branch from 133e3a1 to ee7896f Compare July 3, 2026 00:29
@kans kans force-pushed the kans/ggreer/injective-keys branch from ee7896f to 159a719 Compare July 3, 2026 00:33

@github-actions github-actions Bot left a comment

Blocking issues found — see review comments.
