Skip to content

fix(svm): resolve slot timestamps via getBlocks instead of per-slot walk#1447

Open
droplet-rl wants to merge 4 commits into
masterfrom
droplet/T90K0AL22-C03GHT4RV42-1779870798-393759
Open

fix(svm): resolve slot timestamps via getBlocks instead of per-slot walk#1447
droplet-rl wants to merge 4 commits into
masterfrom
droplet/T90K0AL22-C03GHT4RV42-1779870798-393759

Conversation

@droplet-rl

Copy link
Copy Markdown
Contributor

Summary

  • SVMSpokePoolClient.getTimestampForBlock() and getNearestSlotTime() recover from skipped slots by decrementing _slot one at a time and re-calling getBlockTime. When a target slot lands inside a real skipped-slot run on Solana mainnet, the cost compounds — every BundleDataClient.getBundleBlockTimestamps lookup × NODE_QUORUM providers × propose/validate/execute pass.
  • In a production dataworker run on 2026-05-27, the bundle boundary fell inside slots 422461540–422461547 (a real ~8-slot skip run on mainnet). The SVM executor primary kept calling getBlockTime for 57+ minutes against the same 8 slots before the serverless-orchestration hub's rejectSpokeDelay fired a retry that then raced the primary.
  • Replace the per-slot walk with findNearestProducedSlot(provider, targetSlot): a single getBlocks(lower, upper) call asks the RPC which slots in a 32-slot window produced a block, then iterates the window backwards in fixed steps if the initial range is empty.
  • findNearestProducedSlot is exported from arch/svm/SpokeUtils.ts and used by both SVMSpokePoolClient.getTimestampForBlock and getNearestSlotTime.
  • Also memoise per-block timestamp lookups inside SVMSpokePoolClient: bundle data loading hits the same boundary slots repeatedly across propose/validate/execute passes; storing the Promise (not the resolved value) also dedupes concurrent lookups.

Why this is safe

  • getBlocks is a standard Solana JSON-RPC method (GetBlocksApi in @solana/kit; in the spec since v1.7 / mid-2021 when getConfirmedBlocks was renamed). Verified live against api.mainnet-beta.solana.com and shape-checked against the kit type surface.
  • Benchmarked latency on mainnet-beta is flat ~75 ms up to ~1024 slot windows, +5–10 ms per doubling past that. A 32-slot window is one round-trip ~75 ms, vs the current path that issues 8+ getBlockTime RPCs × 2 providers per single resolution.
  • Iteration is bounded: maxIterations × window (default 32 × 32 = 1024 slots backward). If we still find nothing, the function returns undefined and the caller throws the same Unable to resolve time… error the old code did.
  • Function signatures of getTimestampForBlock and getNearestSlotTime are unchanged; this is a pure implementation swap.

Test plan

  • yarn lint-check clean on touched files.
  • yarn build (tsc) clean.
  • New unit test test/Solana.findNearestProducedSlot.ts covers: produced slot in initial window, iterate backwards when initial window is empty, return undefined when no produced slot inside the iteration bound, stop descending past slot 0.
  • CI to run the full suite under yarn test.

Follow-ups (separate PRs)

  • The relayer-side companion fixes for the SVM duplicate-execution race that this stall triggered are tracked separately in across-protocol/relayer.

🤖 Generated with Claude Code

SVMSpokePoolClient.getTimestampForBlock() and getNearestSlotTime() both
walked backwards one slot at a time via getBlockTime() to recover from
skipped slots. When a bundle boundary lands inside a skipped-slot run
(observed on mainnet at slots 422461540-422461547), every BundleDataClient
lookup for that boundary fans out into N getBlockTime() RPCs per call
times NODE_QUORUM providers, repeated across every propose/validate/execute
pass. In a production dataworker run on 2026-05-27 this kept the SVM
executor primary alive for 57+ minutes making nothing but getBlockTime
calls against the same 8-slot range, ultimately tripping the serverless
hub's rejectSpokeDelay and triggering a retry that raced the primary.

Replace the per-slot walk with findNearestProducedSlot(), which issues a
single getBlocks() call to ask the RPC which slots in a 32-slot window
produced a block, then iterates the window backwards in fixed steps if
nothing is produced in the initial range. getBlocks() latency is flat
(~75 ms) up to ~1024 slots on every provider we benchmarked, so the
recovery cost is bounded regardless of skip-run length. Also memoise
per-block timestamp lookups inside SVMSpokePoolClient so repeated
boundary-slot queries across a single dataworker run resolve once.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread src/arch/svm/SpokeUtils.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 05a4e05372

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/clients/SpokePoolClient/SVMSpokePoolClient.ts
Attach a .catch() on the cached promise so a transient RPC failure in
findNearestProducedSlot/getBlocks (or any other rejection from the
resolver) clears the cache entry instead of poisoning getTimestampForBlock
for the lifetime of the client.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@droplet-rl

Copy link
Copy Markdown
Contributor Author

Addressed both review comments in 73ae3a6:

  • pxrl (SpokeUtils.ts:248, "is it guaranteed that any elements in the response are actually blocks?") — replied inline. By the Solana RPC contract for getBlocks the response is a list of produced-block slots in [lower, upper] (ascending), typed as Slot[] in @solana/kit, so after the blocks.length > 0 gate blocks[blocks.length - 1] is guaranteed to be a produced slot. No code change needed.
  • codex (SVMSpokePoolClient.ts:205, P2, evict failed timestamp lookups) — fixed. Attached cached.catch(() => this.timestampForBlockCache.delete(slot)) when the promise is first stored, so any rejection (RPC error from findNearestProducedSlot/getBlocks, undefined producedSlot, undefined block time) evicts the entry. Dropped the now-redundant explicit delete() from _resolveTimestampForBlock's undefined-timestamp branch — the catch handles it.

yarn build:types and yarn hardhat test test/Solana.findNearestProducedSlot.ts both pass. Commit pushed via checkpoint.

Comment thread src/arch/svm/SpokeUtils.ts Outdated
Comment thread src/arch/svm/SpokeUtils.ts Outdated
Comment thread src/arch/svm/SpokeUtils.ts Outdated
Comment thread test/Solana.findNearestProducedSlot.ts
Comment thread src/clients/SpokePoolClient/SVMSpokePoolClient.ts Outdated
- Trim verbose docstrings on findNearestProducedSlot and its defaults.
- Loop uses ++i and `break` (over `return undefined`) so the function has
  a single trailing exit.
- Invert the timestamp check in _resolveTimestampForBlock for a positive
  early return.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@droplet-rl

Copy link
Copy Markdown
Contributor Author

Addressed all five inline comments from this review in c612662:

  • if (lower === 0n) { break; } — applied.
  • for (let i = 0; i < maxIterations; ++i) — applied.
  • "Comments very verbose" — collapsed the block comments on findNearestProducedSlot and its defaults to one line each, and tightened the new .catch() comment in SVMSpokePoolClient.
  • Test path scoping — replied; there's no test/svm/ today, existing SVM tests use the Solana.* filename prefix at test/ root (Solana.setup.ts, Solana.EventData.test.ts, etc.) and the new file follows that convention. Offered to move them all as a follow-up if the team wants to standardise on subdirectories.
  • Invert isDefined(timestamp) check — applied (positive early return, throw falls through). The explicit timestampForBlockCache.delete(slot) from the suggestion was based on the pre-73ae3a6 code; the eviction is now handled by cached.catch(...) attached in getTimestampForBlock, so I left the redundant delete out and called that out in the reply.

yarn build:types and the findNearestProducedSlot tests still pass. Pushed via checkpoint.

The previous commit unconditionally issued getBlocks() + getBlockTime() for
every slot lookup, regressing the common case (target slot has a block)
from 1 RPC to 2. Restore the 1-RPC happy path in both
SVMSpokePoolClient.getTimestampForBlock() and getNearestSlotTime() by
probing getTimestampForSlot() for the target slot first, only falling
through to findNearestProducedSlot() when the target itself was skipped.

Also evict the per-block timestamp cache on rejection so transient RPC
errors don't poison the cache for the rest of the run.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread src/arch/svm/SpokeUtils.ts
Comment thread src/clients/SpokePoolClient/SVMSpokePoolClient.ts
@droplet-rl

Copy link
Copy Markdown
Contributor Author

⏳ Stopped 1 background subagent(s) at the 30m drain budget. Their results were not gathered:
• yarn install for sdk

✅ 3 background subagent(s) finished after the main reply:
• Run typecheck in sdk dir
• Run findNearestProducedSlot tests
• Background command "Reinstall" completed (exit code 0)

If you need the stopped results, ask again — the agent can re-run those subagents.

@droplet-rl droplet-rl force-pushed the droplet/T90K0AL22-C03GHT4RV42-1779870798-393759 branch from 53c8055 to 204db04 Compare May 27, 2026 10:09
@pxrl

pxrl commented May 27, 2026

Copy link
Copy Markdown
Collaborator

@codex review

@droplet-rl

Copy link
Copy Markdown
Contributor Author

No action needed from me — that comment is a @codex review invocation aimed at the chatgpt-codex-connector bot, which will post its own review pass. I'll wait for any actionable feedback from codex (or a follow-up from you) before making further changes.

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Bravo.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@droplet-rl

Copy link
Copy Markdown
Contributor Author

Codex came back clean ("Didn't find any major issues") on the current HEAD (204db04). No further code changes warranted from this trigger. PR is ready for human review.

@droplet-rl

Copy link
Copy Markdown
Contributor Author

Approval received from pxrl on review 4371702294 — no inline comments, empty body. PR is approved at HEAD 204db04. No code changes needed; ready for merge whenever you'd like.

Comment thread src/clients/SpokePoolClient/SVMSpokePoolClient.ts
@droplet-rl

Copy link
Copy Markdown
Contributor Author

Question, not a change request — replied inline on the PR. Short version: JS is single-threaded so the get/set pair runs atomically inside one microtask; concurrent callers dedupe onto the same in-flight Promise. The only nuance is a small window between rejection settling and the attached .catch() evicting the entry, during which another caller can grab the rejected Promise — that's intentional (caller still observes the correct error, next post-eviction call triggers a fresh resolver). No additional code change.

@droplet-rl droplet-rl force-pushed the droplet/T90K0AL22-C03GHT4RV42-1779870798-393759 branch from 204db04 to 53c8055 Compare May 27, 2026 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants