worker-thread: make serializeError structured-clone-safe (transient errors must not crash the stream) #509
Open
elina-chertova wants to merge 1 commit into
A transient upstream error (e.g. an HTTP 429) thrown out of a worker-thread stream is delivered to the host via Server.send, which postMessages it through the structured clone algorithm. serializeError copied the error's own enumerable properties verbatim, so an HttpError (whose response.headers is a Headers instance) — and any error carrying sockets/streams/other host objects — made postMessage throw DataCloneError. Server.handleSerializationFailure then collapsed it into an opaque 'stream failed with unserializable error', terminating the ingestion stream instead of delivering a proper, retryable error. Project each copied property into a clone-safe value (structuredClone, then a JSON projection that honours toJSON, then drop), preserving diagnostic context such as the HTTP status/url/body while guaranteeing the payload can cross the worker boundary. Adds a regression test that fails (DataCloneError) before the change.
Problem
A transient upstream error thrown out of a worker-thread stream can crash the
stream with an opaque error instead of being delivered to the host as a proper,
retryable error.
The data-source of an EVM hotblocks ingester runs in a worker thread. When the
upstream RPC returns an intermittent error (e.g. HTTP 429 Too Many Requests), an
HttpError propagates out of the stream. Server reports it via Server.send, which
postMessages the serialized error across the worker boundary using the
structured clone algorithm.
serializeError copied the error's own enumerable properties verbatim.
@subsquid/http-client's HttpError carries a response whose headers is a Headers
instance. Headers (like sockets and streams) cannot be structured-cloned, so
postMessage throws DataCloneError. Server.handleSerializationFailure then
replaces it with a generic Error("stream failed with unserializable error, …"),
which terminates the ingestion stream.
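The failure mode can be reproduced without a worker, since postMessage and structuredClone share the same algorithm. The sketch below is illustrative only: naiveSerialize and cloneFailureName are hypothetical names, the error shape is assumed, and a function-valued property stands in for a non-cloneable host object (functions are always rejected by structured clone).

```typescript
// Assumed HttpError-like shape, for illustration only.
const httpLikeError = Object.assign(new Error("HTTP 429"), {
  response: {
    status: 429,
    // Stand-in for a host object: a function can never be structured-cloned.
    headers: { get: (_name: string): string | undefined => undefined },
  },
});

// Hypothetical "verbatim copy" serializer: name/message/stack plus every
// own enumerable property, copied as-is.
function naiveSerialize(e: Error): Record<string, unknown> {
  return { name: e.name, message: e.message, stack: e.stack, ...e };
}

// Returns the name of the error thrown by structuredClone, or undefined
// when the value clones successfully.
function cloneFailureName(value: unknown): string | undefined {
  try {
    structuredClone(value);
    return undefined;
  } catch (e) {
    return (e as Error).name;
  }
}

const failure = cloneFailureName(naiveSerialize(httpLikeError));
console.log(failure);
```

The nested non-cloneable member is enough to make the whole payload fail, which is why the original error never reaches the host.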
Observed in production as a tight crash-loop on a rate-limited endpoint: an
intermittent upstream error is turned into a fatal stream crash, and the real
error (a retryable 429) is lost.
Fix
Project each copied error property into a structured-clone-safe value: try
structuredClone, fall back to a JSON projection (which honours any toJSON(), so
an HttpError's status/url/body survive), and finally drop anything that still
can't travel. name/message/stack are unchanged. The payload is now guaranteed to
cross the worker boundary, so the host receives a faithful, classifiable error
instead of an opaque crash.
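A minimal sketch of the three-step projection described above, not the code in this PR: toCloneSafe, serializeErrorSketch and FakeResponse are hypothetical names, and the response shape is assumed. Functions again stand in for host objects.

```typescript
// Hypothetical helper: structuredClone first, then a JSON projection
// (which calls toJSON() when present), otherwise undefined (drop).
function toCloneSafe(value: unknown): unknown {
  try {
    return structuredClone(value);
  } catch {
    // not cloneable as-is; try the JSON projection below
  }
  try {
    const json = JSON.stringify(value); // undefined for e.g. bare functions
    return json === undefined ? undefined : JSON.parse(json);
  } catch {
    return undefined; // e.g. circular structure or BigInt
  }
}

// Hypothetical serializer: name/message/stack are copied unchanged, every
// other own enumerable property goes through toCloneSafe.
function serializeErrorSketch(err: Error): Record<string, unknown> {
  const out: Record<string, unknown> = {
    name: err.name,
    message: err.message,
    stack: err.stack,
  };
  for (const [key, value] of Object.entries(err)) {
    if (key in out) continue;
    const safe = toCloneSafe(value);
    if (safe !== undefined) out[key] = safe;
  }
  return out;
}

// Assumed response shape with a toJSON() that keeps the diagnostic fields.
class FakeResponse {
  status = 429;
  url = "https://rpc.example.invalid/";
  headers = { get: (_name: string): string | undefined => undefined };
  toJSON() {
    return { status: this.status, url: this.url };
  }
}

const transientError = Object.assign(new Error("HTTP 429"), {
  response: new FakeResponse(),
  socket: () => {}, // cannot be cloned or JSON-projected, so it is dropped
});

const payload = serializeErrorSketch(transientError);
const roundTripped = structuredClone(payload) as Record<string, any>;
console.log(roundTripped.response, "socket" in roundTripped);
```

Trying structuredClone first keeps values that JSON would flatten (Date, Map, typed arrays) intact, and the JSON step only runs for properties that would otherwise be lost entirely.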
Test
util/util-internal-worker-thread/src/error.test.ts builds an HttpError-shaped
error whose response.headers is a real Headers instance and asserts the
serializeError output survives structuredClone (and round-trips through
RemoteError preserving the response payload). Before the change, the test fails
with DataCloneError: Cannot clone object of unsupported type. (red)
This is a long-standing latent defect (the serialization path dates to 2025); it
only manifests under sustained transient upstream errors.
Scope / falsification
This fixes the fatality: a single provider hiccup can no longer crash-loop a
worker-thread stream. It is not a fix for the upstream rate-limiting that
triggered the crash-loop in the incident: the offchainlabs RPC serving
robinhood-mainnet hotblocks is returning sustained 429s, which is why
hotblocks_last_finalized_block stopped advancing. That trigger needs an
operator action (a second data source / rate-limit relief for the network) and
is handled separately; a provider swap is a temporary mitigation, not a code
change.
Falsified if: after this change, a worker-thread stream still terminates with an
"unserializable error" on a transient HttpError, or serializeError output still
throws DataCloneError for an error carrying host objects.