Skip to content

fix(providers): summarize per-provider errors instead of dumping err.stack#1441

Open
droplet-rl wants to merge 1 commit into
masterfrom
droplet/T90K0AL22-C03GHT4RV42-1779566476-782239
Open

fix(providers): summarize per-provider errors instead of dumping err.stack#1441
droplet-rl wants to merge 1 commit into
masterfrom
droplet/T90K0AL22-C03GHT4RV42-1779566476-782239

Conversation

@droplet-rl

Copy link
Copy Markdown
Contributor

Summary

RetryProvider.send (and the Solana QuorumFallbackSolanaRpcFactory counterpart) accumulate per-provider failures via (err as any)?.stack || err?.toString(). For ethers transport errors (SERVER_ERROR, NETWORK_ERROR, etc.), both err.stack and the stringified form embed the failed RPC URL with its API key in cleartext, alongside a multi-line ethers stack trace. That accumulator then gets concatenated into the aggregate Not enough providers succeeded. Errors: … error message and into the logQuorumMismatchOrFailureDetails warn log — both reach Slack/Datadog. So in addition to being unreadable, this is leaking provider API keys to log destinations.

Real example surfaced from the Across relayer warning logs (URL/key redacted here):

reason: "Not enough providers succeeded. Errors:\nProvider https://arb-mainnet.g.alchemy.com failed with error: Error: processing response error (body=\"{...message:\\\"execution reverted: ERC20: burn amount exceeds balance\\\"...}\", url=\"https://arb-mainnet.g.alchemy.com/v2/<API_KEY>\", code=SERVER_ERROR, version=web/5.7.1)\n    at Logger.makeError (...)\n    at Logger.throwError (...)\n    ... <truncated, ~2KB total>"

After this change, the same failure produces:

Not enough providers succeeded. Errors:
Provider https://arb-mainnet.g.alchemy.com failed with error: execution reverted: ERC20: burn amount exceeds balance

Approach

New summarizeProviderError(err) helper in src/providers/utils.ts that produces a single-line, log-safe summary. Preference order:

  1. Parsed JSON-RPC error message via the existing parseJsonRpcError (drilling into err.error if needed) — this is the actual revert reason that callers want.
  2. err.reason — ethers' short, URL-free reason string (processing response error, execution reverted, etc.).
  3. err.message, only for non-ethers errors (no err.code present) — safe because there's no transport-layer URL to leak.
  4. err.code (SERVER_ERROR, NETWORK_ERROR, …) as a last resort.

err.message is deliberately skipped for ethers-shaped errors because that's where ethers' Logger.makeError splices the failed URL into the message. The original error remains reachable via error.cause on the aggregate thrown by createSendErrorWithMessage, so any forensic detail isn't lost — it just stops being on the user-visible log path.

Wired into both errors.push([provider, …]) sites in retryProvider.ts and both in solana/quorumFallbackRpcFactory.ts. No other behavior change: failImmediate still calls parseJsonRpcError on the raw error; the original err is still thrown/cascaded as before; classification downstream (e.g. relayer's MultiCallerClient#canIgnoreRevertReason) is .includes() based and continues to match the cleaner string.

Test plan

  • 8 new unit tests for summarizeProviderError in test/providers/utils.test.ts covering: null/undefined, string inputs, JSON-RPC body extraction (top-level + nested err.error), .reason fallback, .code fallback, non-ethers .message, and an explicit URL-leak guard.
  • Existing createSendErrorWithMessage + providers.test.ts suites still pass.
  • prettier --check and eslint clean on touched files.
  • tsc -p tsconfig.build.json --noEmit clean.

Context

Found while investigating noisy MultiCallerClient#LogSimulationFailures warnings in the Across relayer. A relayer-side regex sanitizer was considered first (across-protocol/relayer#3411, now closed) but fixing it once at the source — here — is cleaner and benefits every consumer (relayer, dataworker, finalizer, monitor) with a single version bump.

🤖 Generated with Claude Code

…stack

RetryProvider.send and the Solana QuorumFallback factory accumulate
per-provider failures via `err.stack || err.toString()`. For ethers
transport errors (SERVER_ERROR, etc.), `err.stack` and the stringified
form embed the failed RPC URL with its API key in cleartext, plus a
multi-line ethers stack trace. That accumulator then gets concatenated
into the aggregate "Not enough providers succeeded" error message and
into the `logQuorumMismatchOrFailureDetails` warn log — both reach
Slack/Datadog and leak the key.

Introduce `summarizeProviderError(err)` which produces a single-line,
log-safe summary. Preference order:

  1. Parsed JSON-RPC error message (e.g. "execution reverted: ERC20:
     burn amount exceeds balance"), drilling into `err.error` if
     necessary — this is what callers actually want.
  2. `err.reason` — ethers' short, URL-free reason string.
  3. `err.message` for non-ethers errors (no `err.code` present).
  4. `err.code` (e.g. "SERVER_ERROR", "NETWORK_ERROR") as a last resort.

`err.message` is deliberately skipped for ethers-shaped errors because
that's exactly where transport-layer errors splice in the failed URL.
The original error remains reachable via `error.cause` on the aggregate
thrown by `createSendErrorWithMessage`, so forensic detail isn't lost.

Wire it into both error-collection sites in `retryProvider.ts` and both
in `solana/quorumFallbackRpcFactory.ts`. No behavior change beyond the
shape of the recorded error string.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant