evm-rpc: trace state diffs per-transaction when a block's whole-block response is too big #507
… response is too big

A very large block's `debug_traceBlockByNumber` `prestateTracer` (`diffMode`) response can exceed the provider's JSON-RPC response size cap (commonly 160 MiB). geth/erigon and managed providers (alchemy, dwellir, uniblock) reject such a response with `-32008` "Response is too big" / "Exceeded max limit". A block batch can't be split below a single block, so the dumper stalled/crash-looped forever on that block, emitting no new data.

Detect the size-cap error and fall back to per-transaction `debug_traceTransaction`, where each response is small, then reassemble the per-block state diff. This mirrors the existing tolerant trace paths instead of treating an oversized response as fatal.
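The detect-and-fall-back flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in `rpc.ts`: the `RpcCall` callback, the `StateDiff` shape, `isResponseTooBig`, and `traceBlockStateDiffs` are hypothetical names introduced here, and the body of `traceStateDiffsPerTransaction` is an assumption about how the reassembly might look. The sketch also assumes the caller already knows the block's transaction hashes.

```typescript
// Hypothetical shapes for illustration; the real types in @subsquid/evm-rpc may differ.
type StateDiff = { pre: Record<string, unknown>; post: Record<string, unknown> }
type DebugStateDiffResult = { txHash: string; result: StateDiff }
type RpcCall = (method: string, params: unknown[]) => Promise<unknown>

// geth-style tracer options for prestateTracer in diffMode.
const TRACER_OPTS = { tracer: 'prestateTracer', tracerConfig: { diffMode: true } }

// True only for the persistent size-cap rejection (-32008 or its known messages).
// A transient -32020 "response too large" is deliberately not matched.
function isResponseTooBig(err: unknown): boolean {
    if (typeof err !== 'object' || err === null) return false
    const { code, message } = err as { code?: unknown; message?: unknown }
    if (code === -32008) return true
    return typeof message === 'string' && /response is too big|exceeded max limit/i.test(message)
}

// Trace each transaction on its own and re-wrap the bare diff with its tx hash,
// since debug_traceTransaction returns no { txHash, result } envelope.
async function traceStateDiffsPerTransaction(
    call: RpcCall,
    txHashes: string[]
): Promise<DebugStateDiffResult[]> {
    const out: DebugStateDiffResult[] = []
    for (const txHash of txHashes) {
        const result = (await call('debug_traceTransaction', [txHash, TRACER_OPTS])) as StateDiff
        out.push({ txHash, result })
    }
    return out
}

// Try the whole-block trace first; fall back per transaction only on the size-cap error.
async function traceBlockStateDiffs(
    call: RpcCall,
    blockNumber: number,
    txHashes: string[]
): Promise<DebugStateDiffResult[]> {
    try {
        const qty = '0x' + blockNumber.toString(16)
        return (await call('debug_traceBlockByNumber', [qty, TRACER_OPTS])) as DebugStateDiffResult[]
    } catch (err) {
        if (!isResponseTooBig(err)) throw err
        return traceStateDiffsPerTransaction(call, txHashes)
    }
}
```

The sketch issues the per-transaction calls sequentially for simplicity; batching or bounded concurrency would be a natural refinement, but nothing in the description says which approach the PR takes.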
Same root cause recurred today — fresh live evidence confirming this fix is still needed (and not yet deployed). Alert: Stuck block 42971579 (
Re-paged again today (2nd recurrence in ~8h). Same root cause, fix is already in this PR — just not merged/deployed yet. Alert: Stuck block 43012893 ( New independent cross-check this run: besides dwellir (active archive endpoint), the official public endpoint Requesting review/merge + an |
Cause (proven)
The `base-sepolia` EVM dumper (`dump-base-sepolia-0`, ns `evm-archive`) stalled: block rate collapsed to 0–2 blocks/sec with a 100h+ ETA, stuck around block 42625944 (0x28a6b98).

That block is huge (4676 txns, ~28 MB callTracer). The dumper's `addDebugStateDiffs` issues a whole-block `debug_traceBlockByNumber` with `prestateTracer` + `diffMode`. Every probed provider rejects that one block's response as oversized.

Confirmed against alchemy, dwellir (the newly-configured base-sepolia archive endpoint), and uniblock. A block batch can't be split below a single block, so the dumper retried/crash-looped on that block forever, emitting no new data.
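For reference, the two request shapes at play can be sketched as below, using the standard geth tracer options (`tracer: 'prestateTracer'` with `tracerConfig: { diffMode: true }`). The helper names `wholeBlockRequest` and `perTransactionRequest` are hypothetical and exist only for this illustration; the dumper builds its requests through its own RPC client.

```typescript
// JSON-RPC 2.0 request envelope.
interface RpcRequest {
    jsonrpc: '2.0'
    id: number
    method: string
    params: unknown[]
}

// geth-style tracer options: prestateTracer in diffMode yields { pre, post } per transaction.
const DIFF_TRACER = { tracer: 'prestateTracer', tracerConfig: { diffMode: true } }

// One request for the whole block: the response carries a diff for every transaction,
// which is what grows past the provider's size cap on very large blocks.
function wholeBlockRequest(id: number, blockNumber: number): RpcRequest {
    return {
        jsonrpc: '2.0',
        id,
        method: 'debug_traceBlockByNumber',
        params: ['0x' + blockNumber.toString(16), DIFF_TRACER],
    }
}

// One request per transaction: each response carries a single bare { pre, post } diff.
function perTransactionRequest(id: number, txHash: string): RpcRequest {
    return {
        jsonrpc: '2.0',
        id,
        method: 'debug_traceTransaction',
        params: [txHash, DIFF_TRACER],
    }
}

// The stuck block from this report, encoded as a hex quantity.
const stuck = wholeBlockRequest(1, 42625944)
console.log(JSON.stringify(stuck.params[0])) // "0x28a6b98"
```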
Probing the per-transaction path (`debug_traceTransaction` with `prestateTracer`/`diffMode`) on txns of the same block returns small responses (~4.7 KB, ~0.11s each), so the data is fetchable, just not as one whole-block response.

Fix
In `evm/evm-rpc/src/rpc.ts` (`@subsquid/evm-rpc`, used by the dumper):
- Detect the size-cap error (`-32008` / "Response is too big" / "Exceeded max limit") in the whole-block state-diff `validateError`, returning a `RESPONSE_TOO_BIG` sentinel instead of throwing.
- Fall back to `traceStateDiffsPerTransaction`: trace each tx individually via `debug_traceTransaction` and reassemble the per-block `DebugStateDiffResult[]` (reattaching tx hashes, since the per-tx call returns a bare diff with no envelope).

This is distinct from #505, which retries the transient `-32020` "response too large"; that handling explicitly excludes the persistent oversized single-block case fixed here.

Test
Added `evm/evm-rpc/test/state-diff-too-big.test.ts` (+ `MockRpcClient` error/`validateError` support):
- `rpc.ts`: throws `RpcError: Response is too big` (reproduces the production crash).
- `@subsquid/evm-rpc` suite: 148 tests pass, `tsc` clean.

Rush change file included.

🤖 Generated with Claude Code