fix: prevent second batch store reset wiping postage snapshot on --resync#5499
Open
martinconic wants to merge 3 commits into
Open
fix: prevent second batch store reset wiping postage snapshot on --resync#5499martinconic wants to merge 3 commits into
martinconic wants to merge 3 commits into
Conversation
acud
reviewed
Jun 10, 2026
c26bb91 to
a398078
Compare
…sync When a node is started with --resync together with the postage snapshot, the batch store was reset twice. The snapshot batch service resets and rebuilds the store from the snapshot, but the live-chain batch service was constructed with the same resync flag and reset the store a second time in its Start(), discarding the freshly loaded snapshot. Live sync then began at the snapshot block height against an empty store, so the first top-up or dilute event for any batch created before the snapshot height failed with "get batch <id>: storage: not found" and the node shut down. This is why disabling the snapshot (skip-postage-snapshot) worked around it: the store was reset only once and sync replayed from contract genesis, where every create precedes its top-up. Construct the live batch service after the snapshot path has run and pass resync only when the snapshot did not rebuild the store (o.Resync && !snapshotLoaded). The store is still reset exactly once in every other case (snapshot skipped, failed, not applicable, or a non-resync start). Closes #5495
a398078 to
69df1aa
Compare
acud
approved these changes
Jun 15, 2026
akrem-chabchoub
approved these changes
Jun 16, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Checklist
Description
Fixes #5495: a node started with
--resyncwhile the postage snapshot is active shuts down during sync with:Root cause
The batch store is reset twice on startup in this configuration:
snapshotBatchSvc.Start()resets the store (resync) and rebuilds it from the snapshot, then setspostageSyncStart = maxBlockHeight.batchSvcwas constructed with the sameo.Resyncflag, so itsStart()reset the store again, discarding the snapshot that was just loaded.Live sync then started from the snapshot block height against an empty store, so the first top-up/dilute event referencing a batch created before the snapshot height failed with
storage: not found, and the node shut down.This matches the report exactly: two
"resync requested, resetting batch store"log lines, and the fact thatskip-postage-snapshotworks around it (only one reset, and sync replays from contract genesis where every create precedes its top-up).Note: this is independent of #5343/#5482. The flow originates in #5094 and is still present on master; the reporter's build (
2.8.0-41d6efc6) predates the #5343 revert, but reverting #5343 does not address this.Fix
The two reset sites are not redundant — one rebuilds from the snapshot, the other is the fallback rebuild for the no-snapshot / snapshot-failed / non-mainnet cases — they were just both armed at once.
Construct the live batch service after the snapshot path has run and pass resync only when the snapshot did not rebuild the store:
The reset logic stays in its single existing place (
batchservice.Start()), exactly one service is armed to reset, and there is no runtime toggle. The store is still reset exactly once in every other case (snapshot skipped, failed, not applicable, or a non-resync start).Testing
TestResyncControlsResetinbatchserviceasserts both directions of the contractnode.gorelies on:resync=true⇒ store reset onceresync=false⇒ store not reset (the post-snapshot live-service case)go test -race ./pkg/postage/batchservice/...passesgo build ./pkg/node/...passes;go vet,gofumpt, andgolangci-lintcleanOpen API Spec Version Changes (if applicable)
None.
Related Issue
Closes #5495
AI Disclosure