Skip to content

sync: Ensure initialize leaves block_metadata(tip) populated#480

Closed
Cosmos-Harry wants to merge 1 commit into
zcash:mainfrom
Cosmos-Harry:zd/initialize-scan-tip
Closed

sync: Ensure initialize leaves block_metadata(tip) populated#480
Cosmos-Harry wants to merge 1 commit into
zcash:mainfrom
Cosmos-Harry:zd/initialize-scan-tip

Conversation

@Cosmos-Harry

@Cosmos-Harry Cosmos-Harry commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

Summary

Unblocks the CI hang where shards 0–2 of the integration tests run for the full 6-hour GitHub Actions cap and get cancelled (the pattern that's been hitting #476 dispatches).

This is a targeted workaround in zallet for a deeper API gap in zcash_client_backend. See "Where the proper fix belongs" below — happy to follow up with a librustzcash change if preferred that direction, but I think we want CI unblocked first.

Root cause

The scan-range loop in initialize can legitimately complete without ever routing the tip block through put_blocks — for example when the wallet has no shielded scan work and suggest_scan_ranges returns nothing in the bands the filter accepts (Verify / ≥ Historic). update_chain_tip records the new height in the scan queue, but no BlockMetadata is committed for the tip, so:

  • WalletRead::chain_height returns Some(tip)
  • WalletRead::block_metadata(tip) returns None
  • getwalletstatus.wallet_tip — implemented as chain_height + block_metadata(chain_height) — is omitted from the response

This strands integration-tests' rebuild_cache, which spawns 8 trivially-synced wallets in parallel and waits in a while True: loop with no timeout for all of them to report a wallet_tip matching node_tip. The shards that trigger this path (those with len(test_list) > 1, which makes rpc-tests.py invoke create_cache.py) wait forever.

Empirically on recent #476 dispatches: shards 0, 1, 2 cancelled at exactly 360 min every run; shards 3–9 finished in under 2 min.

Fix in this PR

After the existing scan-range loop in initialize, if block_metadata(current_tip.height) is still None, take a fresh chain snapshot, fetch the tip block, and route it through the existing scan_block helper. ~25 lines, single file.

Behavior for the two real cases:

  • Wallets that had shielded scan work: one extra DB query (block_metadata(tip) returns Some), no scan triggered. Effectively free.
  • Trivially-synced wallets (the failing case): one block through the batch decryptor. A block with no notes belonging to this wallet completes its decryption immediately, then put_blocks commits the metadata.

The new invariant is "initialize always leaves committed metadata for the tip it reported via update_chain_tip," which seems independently sensible — every other code path that touches chain_height already assumes scan state up to that height is available.

Where the proper fix belongs

The semantic root cause is in zcash_client_backend::data_api:

  • WalletRead::chain_height() records "the height I've been told about" — independent of scan progress.
  • WalletRead::block_metadata(h) records "data for a fully scanned block."
  • There is no reader for "(height, hash) the wallet thinks the tip is at" that's symmetric to chain_height, so any caller that wants the wallet's view of the tip in (height, hash) form has to go through block_metadata — which conflates the two concepts.

The clean fix is to extend the trait:

  • Add WalletWrite::update_chain_tip_with_hash(height, hash) with a default impl that forwards to the existing height-only method (non-breaking).
  • Add WalletRead::chain_tip() -> Option<(BlockHeight, BlockHash)> with a default impl returning Ok(None) (also non-breaking).
  • Implement both in zcash_client_sqlite with a one-row chain_tip table populated by the hash-aware update path.
  • Switch zallet's getwalletstatus.wallet_tip to read chain_tip(); switch initialize to call update_chain_tip_with_hash.

This properly separates tip-awareness from scan progress and matches what wallet_tip's docstring already promises ("the wallet's view of the chain tip"). I have a draft of this change locally (~50 lines in librustzcash + ~10 lines in zallet, with a sqlite migration and a roundtrip test); happy to open it if maintainers would rather take that route now and skip this workaround.

Test plan

  • Existing integration tests pass; specifically shards 0–2 finish in <10 min instead of hitting the 6h cap.
  • No regressions on shards 3–9 (they didn't hit this path and shouldn't be affected).

The scan-range loop in `initialize` can complete without ever routing
the tip block through `put_blocks` — e.g. when the wallet has no
shielded scan work and `suggest_scan_ranges` returns nothing in the
bands the filter accepts. `update_chain_tip` then records the new height
but no `BlockMetadata` is committed for it, and any caller asking the
wallet for its view of the tip via `getwalletstatus.wallet_tip` sees
`None` indefinitely.

This strands integration-tests `rebuild_cache`, which spawns several
trivially-synced wallets in parallel and waits for all of them to
report a `wallet_tip` matching `node_tip`; shards exercising this path
hang until the GitHub Actions 6h cap cancels them.

Force a tip scan when no scan range covered it. `scan_block` is cheap
when there are no notes to detect.
@Cosmos-Harry Cosmos-Harry marked this pull request as draft June 17, 2026 03:14
@Cosmos-Harry

Copy link
Copy Markdown
Collaborator Author

Superseded by #483 — same commit (a6ff6b3), opened from the in-repo branch so the integration tests can actually dispatch. CI on this PR couldn't reach zcash/integration-tests because fork PRs don't get access to DISPATCH_APP_ID / DISPATCH_APP_PRIVATE_KEY secrets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant