worker-thread: make serializeError structured-clone-safe (transient errors must not crash the stream) #509

Open
elina-chertova wants to merge 1 commit into master from alert-fix/B4TUWF-worker-thread-error-serialization

@elina-chertova

Problem

A transient upstream error thrown out of a worker-thread stream can crash the
stream with an opaque error instead of being delivered to the host as a proper,
retryable error.

The data source of an EVM hotblocks ingester runs in a worker thread. When the
upstream RPC returns an intermittent error (e.g. HTTP 429 Too Many Requests),
an HttpError propagates out of the stream. Server reports it via
Server.send, which postMessages the serialized error across the worker
boundary using the structured clone algorithm.

serializeError copied the error's own enumerable properties verbatim:

for (key in err) {
    (ser as any)[key] = err[key]
}

@subsquid/http-client's HttpError carries a response whose headers field is a
Headers instance. Headers (like sockets and streams) cannot be
structured-cloned, so postMessage throws DataCloneError.
Server.handleSerializationFailure then replaces it with a generic
Error("stream failed with unserializable error, …"), which terminates the
ingestion stream.
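
The failure mode can be probed in isolation. The sketch below is illustrative only: cloneOutcome is a hypothetical helper, and the error is a hand-built stand-in with the shape described above, not the real HttpError. It assumes a runtime that provides the structuredClone and Headers globals. On the runtime described in this PR the final line is expected to report DataCloneError.

```typescript
// Hypothetical helper: report whether a value survives the structured clone
// algorithm, returning "ok" or the name of the error that was thrown.
function cloneOutcome(value: unknown): string {
    try {
        structuredClone(value)
        return "ok"
    } catch (e) {
        return (e as { name?: string }).name ?? "unknown"
    }
}

// Stand-in for an HttpError: only the shape matters here (assumption).
const failing = Object.assign(new Error("Got 429"), {
    response: { status: 429, headers: new Headers({ "retry-after": "1" }) },
})

// Mirror the verbatim copy quoted above: enumerable properties are copied as-is,
// so the Headers instance ends up inside the payload.
const payload: Record<string, unknown> = {
    name: failing.name,
    message: failing.message,
    stack: failing.stack,
}
for (const key in failing) {
    payload[key] = (failing as any)[key]
}

console.log(cloneOutcome(payload))
```

The same helper returns "ok" for plain data and "DataCloneError" for a value holding a function, which makes it a convenient check for any payload about to be posted across the worker boundary.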

Observed in production as a tight crash-loop on a rate-limited endpoint:

sqd:evm-data-service/data-source  connection failure  HttpError: Got 429 …  method: eth_getBlockByNumber["finalized"]
sqd:data-service  data ingestion terminated, will restart in 0 seconds
  err.stack: Error: stream failed with unserializable error, stack: HttpError: Got 429 …
             at Server.handleSerializationFailure (util-internal-worker-thread/lib/server.js)

i.e. an intermittent upstream error is turned into a fatal stream crash, and
the real error (a retryable 429) is lost.

Fix

Project each copied error property into a structured-clone-safe value: try
structuredClone, fall back to a JSON projection (which honours any toJSON(),
so an HttpError's status/url/body survive), and finally drop anything that
still can't travel. name/message/stack are unchanged. The payload is now
guaranteed to cross the worker boundary, so the host receives a faithful,
classifiable error instead of an opaque crash.
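
The three-step projection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: toCloneSafe and serializeErrorSketch are hypothetical names, and the sample error is a hand-built stand-in whose onRetry callback plays the role of an uncloneable member.

```typescript
// Hypothetical helper: project one value into something structuredClone accepts.
function toCloneSafe(value: unknown): unknown {
    try {
        // Step 1: keep the value as-is when it is already cloneable.
        return structuredClone(value)
    } catch {
        // not cloneable, try the JSON projection below
    }
    try {
        // Step 2: JSON projection. JSON.stringify honours toJSON() and silently
        // omits functions and other values that have no JSON representation.
        const json = JSON.stringify(value)
        return json === undefined ? undefined : JSON.parse(json)
    } catch {
        // Step 3: drop anything that still cannot travel (e.g. circular data).
        return undefined
    }
}

// Hypothetical counterpart of serializeError: name/message/stack are copied
// unchanged, every other enumerable property goes through toCloneSafe.
function serializeErrorSketch(err: Error): Record<string, unknown> {
    const ser: Record<string, unknown> = {
        name: err.name,
        message: err.message,
        stack: err.stack,
    }
    for (const key in err) {
        const safe = toCloneSafe((err as any)[key])
        if (safe !== undefined) ser[key] = safe
    }
    return ser
}

// Stand-in error (assumption): response mixes plain data with uncloneable members.
const sample = Object.assign(new Error("Got 429"), {
    response: {
        status: 429,
        url: "https://rpc.example/",
        headers: new Headers(),
        onRetry: () => {},
    },
})

const safePayload = serializeErrorSketch(sample)
// The projected payload can now be cloned without throwing.
const delivered = structuredClone(safePayload) as { response: { status: number } }
console.log(delivered.response.status)
```

Projecting per property rather than per error keeps one uncloneable field from discarding the rest of the diagnostic context.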

Test

util/util-internal-worker-thread/src/error.test.ts builds an HttpError-shaped
error whose response.headers is a real Headers instance and asserts the
serializeError output survives structuredClone (and round-trips through
RemoteError preserving the response payload).

  • Before: DataCloneError: Cannot clone object of unsupported type. (red)
  • After: passes (green)

cd util/util-internal-worker-thread && vitest --run src/error.test.ts

This is a long-standing latent defect (the serialization path dates to 2025); it
only manifests under sustained transient upstream errors.

Scope / falsification

This fixes the fatality — a single provider hiccup can no longer crash-loop a
worker-thread stream. It is not a fix for the upstream rate-limiting that
triggered the crash-loop in the incident: the offchainlabs RPC serving
robinhood-mainnet hotblocks is returning sustained 429s, which is why
hotblocks_last_finalized_block stopped advancing. That trigger needs an
operator action (a second data source / rate-limit relief for the network) and
is handled separately — a provider swap is a temporary mitigation, not a code
change.

Falsified if: after this change, a worker-thread stream still terminates with an
"unserializable error" on a transient HttpError, or serializeError output
still throws DataCloneError for an error carrying host objects.

A transient upstream error (e.g. an HTTP 429) thrown out of a
worker-thread stream is delivered to the host via Server.send, which
postMessages it through the structured clone algorithm. serializeError
copied the error's own enumerable properties verbatim, so an HttpError
(whose response.headers is a Headers instance) — and any error carrying
sockets/streams/other host objects — made postMessage throw
DataCloneError. Server.handleSerializationFailure then collapsed it into
an opaque 'stream failed with unserializable error', terminating the
ingestion stream instead of delivering a proper, retryable error.

Project each copied property into a clone-safe value (structuredClone,
then a JSON projection that honours toJSON, then drop), preserving
diagnostic context such as the HTTP status/url/body while guaranteeing
the payload can cross the worker boundary.

Adds a regression test that fails (DataCloneError) before the change.