fix(inventory): isolate per-bridge and per-chain failures in cross-chain transfer accounting by droplet-rl · Pull Request #3484 · across-protocol/relayer

droplet-rl · 2026-06-11T11:35:24Z

Summary

Companion to #3437 (rebalancer-client isolation). A transient outage in one bridge's or one chain adapter's outstanding-transfer read used to propagate up through InventoryClient.update and crash the relayer/rebalancer process. This PR isolates failures at two independent layers, in two commits with no inter-commit dependencies.

Motivation

zion-across-fast-relayer-rebalancer crashed on:

HttpError: HTTP 503: Service Unavailable
  at BridgeApiClient.getWithRetry
  at BridgeApi.queryL1BridgeInitiationEvents
  at TokenSplitterBridge.queryL1BridgeInitiationEvents
  at BaseChainAdapter (getOutstandingCrossChainTransfers)

The error bottoms out in InventoryClient.update, which has no catch around the cross-chain-transfer-accounting path.

Commits

1. `fix(adapter): isolate per-bridge failures in BaseChainAdapter.getOutstandingCrossChainTransfers`

Wraps each (l1Token, monitoredAddress) block in try/catch and logs at error (at: "<adapterName>#getOutstandingCrossChainTransfers"). The inner Promise.all([queryL1, queryL2]) stays all-or-nothing because partial success there would skew the outstanding = deposited − finalized accounting. Entries for the affected pair are absent from that cycle's result. The test reuses the existing SplitBridgeTracking scaffolding (one new it block plus a new optional queryError arg on the existing MockTrackedBridge).
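A simplified sketch of this two-level structure, under stated assumptions: `Bridge`, `queryL1Deposits`, `queryL2Finalizations`, `getOutstandingTransfers` and the nested string-keyed result are hypothetical stand-ins rather than the BaseChainAdapter API, amounts are plain numbers instead of BigNumber, and `console.error` stands in for the relayer's logger. The inner `Promise.all` keeps each L1/L2 pair all-or-nothing, while the per-pair try/catch confines a failure to one (l1Token, monitoredAddress) entry.

```typescript
// Simplified, hypothetical model of per-bridge failure isolation.
// Amounts are plain numbers for illustration; the relayer itself uses BigNumber.
interface Bridge {
  queryL1Deposits(address: string, l1Token: string): Promise<number>;
  queryL2Finalizations(address: string, l1Token: string): Promise<number>;
}

// monitoredAddress -> l1Token -> outstanding amount
type Outstanding = Record<string, Record<string, number>>;

async function getOutstandingTransfers(
  adapterName: string,
  bridges: Record<string, Bridge>, // keyed by l1Token
  addresses: string[]
): Promise<Outstanding> {
  const result: Outstanding = {};
  const reads: Promise<void>[] = [];
  for (const address of addresses) {
    for (const [l1Token, bridge] of Object.entries(bridges)) {
      reads.push(
        (async () => {
          try {
            // All-or-nothing pair: a partial read would skew deposited - finalized.
            const [deposited, finalized] = await Promise.all([
              bridge.queryL1Deposits(address, l1Token),
              bridge.queryL2Finalizations(address, l1Token),
            ]);
            result[address] = result[address] ?? {};
            result[address][l1Token] = deposited - finalized;
          } catch (error) {
            // Isolated failure: log it and leave this (address, l1Token) entry absent.
            console.error({
              at: `${adapterName}#getOutstandingCrossChainTransfers`,
              message: `Failed to fetch outstanding transfers for ${address} ${l1Token}; skipping for this cycle`,
              error: String(error),
            });
          }
        })()
      );
    }
  }
  // Every read catches its own error, so this Promise.all never rejects.
  await Promise.all(reads);
  return result;
}
```

With a healthy USDC bridge (10 deposited, 4 finalized) and a WETH bridge whose L1 query throws, the call resolves to `{ "0xabc": { USDC: 6 } }` instead of rejecting.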

2. `fix(inventory): isolate per-chain failures in CrossChainTransferClient.update`

Promise.all → Promise.allSettled across chains. Fulfilled chains overwrite their entry with fresh state; rejected chains keep their previously recorded state instead of being blanked. During a single failed cycle, that stale state biases the InventoryClient toward under-rebalancing (safer) rather than duplicate bridging. Logs at error (at: "CrossChainTransferClient"). This is defense in depth for any uncaught path inside the adapter layer.
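A rough sketch of the per-chain behavior, assuming a hypothetical `OutstandingTransferCache` with an injected `fetchChain` function in place of AdapterManager; the log message wording is illustrative only. It already uses the async `chainIds.map` callback that this PR adopted later in response to review.

```typescript
// Hypothetical model of per-chain isolation with stale-state preservation.
// monitoredAddress -> l1Token -> outstanding amount (plain numbers for illustration)
type ChainState = Record<string, Record<string, number>>;

class OutstandingTransferCache {
  // chainId -> last successfully fetched state
  private readonly state: Record<number, ChainState> = {};
  private readonly fetchChain: (chainId: number) => Promise<ChainState>;

  constructor(fetchChain: (chainId: number) => Promise<ChainState>) {
    this.fetchChain = fetchChain;
  }

  async update(chainIds: number[]): Promise<void> {
    // The async callback also turns synchronous throws from fetchChain into rejections.
    const settled = await Promise.allSettled(chainIds.map(async (chainId) => this.fetchChain(chainId)));
    settled.forEach((outcome, i) => {
      const chainId = chainIds[i];
      if (outcome.status === "fulfilled") {
        this.state[chainId] = outcome.value; // fresh read overwrites
      } else {
        // Keep the previous state for this chain rather than blanking it.
        console.error({
          at: "CrossChainTransferClient",
          message: `Failed to update outstanding cross chain transfers for chain ${chainId}; keeping previous state`,
          error: String(outcome.reason),
        });
      }
    });
  }

  get(chainId: number): ChainState | undefined {
    return this.state[chainId];
  }
}
```

In a two-cycle run where chain 10's second read fails, chain 1 picks up the second cycle's values while chain 10 keeps the first cycle's.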

Either commit stands on its own; reverting one does not break the other.

Ops note

Add Datadog alerts on either of:

- at: "<adapterName>#getOutstandingCrossChainTransfers" (match on the substring #getOutstandingCrossChainTransfers) with level: error
- at: "CrossChainTransferClient" with level: error and a message containing "outstanding cross chain transfers"

Test plan

- tsc --noEmit clean
- eslint + prettier clean
- hardhat test test/CrossChainTransferClient.update.ts test/generic-adapters/SplitBridgeTracking.ts: 7/7 passing (5 existing + 2 new)
- hardhat test test/InventoryClient.InventoryRebalance.ts test/InventoryClient.RefundChain.ts test/Monitor.ts: 66/66 passing, no regressions
- Operator: wire Datadog alerts on the two new error-log at: values

🤖 Generated with Claude Code

…tandingCrossChainTransfers

A transient outage in one bridge's read API (e.g. a 503 from the shared BridgeApi via TokenSplitterBridge) used to propagate up through BaseChainAdapter.getOutstandingCrossChainTransfers -> CrossChainTransferClient.update -> InventoryClient.update and crash the relayer/rebalancer process.

Wrap each (l1Token, monitoredAddress) block in try/catch and log at error (at: "<adapterName>#getOutstandingCrossChainTransfers"). The inner Promise.all([queryL1, queryL2]) stays all-or-nothing because partial success there would skew outstanding = deposited - finalized; entries for the affected pair are simply absent from this cycle's result.

Adds an isolation test reusing the existing SplitBridgeTracking scaffolding.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…t.update

A transient outage in one chain adapter's outstanding-transfer read used to crash InventoryClient.update via this aggregator's Promise.all.

Switch to Promise.allSettled across chains: fulfilled chains overwrite with fresh state; rejected chains preserve their previously-recorded state instead of being blanked, biasing the InventoryClient toward under-rebalancing (safer) rather than duplicate-bridging on a single failed cycle. Per-chain failures log at error (at: "CrossChainTransferClient").

This is independent from and complementary to the per-bridge isolation in BaseChainAdapter: defense in depth for any uncaught path in the adapter layer.

Adds focused unit tests for the isolation + stale-preservation behaviors.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e2c202ae16

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-11T11:39:15Z

+    // failed chains we preserve the previously-recorded state instead of blanking it — stale state
+    // biases the InventoryClient toward under-rebalancing (safer) rather than duplicate-bridging.
+    const settled = await Promise.allSettled(
+      chainIds.map((chainId) => this.adapterManager.getOutstandingCrossChainTransfers(chainId, l1Tokens))


Wrap adapter calls so synchronous throws are settled

When AdapterManager.getOutstandingCrossChainTransfers throws before returning a promise, this map aborts before Promise.allSettled can turn the failure into a rejected result. That method does synchronous work first, including looking up the adapter and filtering tokens with getTokenInfo/adapter.supportedTokens, so a missing adapter or token-metadata/config error still makes CrossChainTransferClient.update reject and propagates back through InventoryClient.update instead of logging the failed chain and preserving stale state.


Good catch — AdapterManager.getOutstandingCrossChainTransfers does synchronous work (adapter lookup + getTokenInfo filter) before returning a promise, so a sync throw would abort the chainIds.map callback before Promise.allSettled could settle it. Fixed in 427fa45 by wrapping the callback in async, which turns sync throws into rejected promises that the per-chain failure path then logs and isolates. Added test/CrossChainTransferClient.update.ts coverage for the sync-throw case.
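A minimal, self-contained illustration of the failure mode and the fix. `lookupAndFetch` is a hypothetical stand-in for a method that validates synchronously (e.g. an adapter lookup) before returning a promise; the two wrapper functions differ only in whether the `map` callback is `async`.

```typescript
// Hypothetical stand-in: synchronous validation before a promise is returned.
function lookupAndFetch(chainId: number): Promise<string> {
  if (chainId === 999) {
    throw new Error(`No adapter for chain ${chainId}`); // thrown synchronously
  }
  return Promise.resolve(`transfers for chain ${chainId}`);
}

async function withoutAsyncWrapper(chainIds: number[]) {
  // lookupAndFetch(999) throws inside map(), before allSettled is reached,
  // so the whole call rejects instead of settling per chain.
  return Promise.allSettled(chainIds.map((chainId) => lookupAndFetch(chainId)));
}

async function withAsyncWrapper(chainIds: number[]) {
  // An async callback turns the synchronous throw into a rejected promise,
  // which allSettled records as { status: "rejected" } for that chain only.
  return Promise.allSettled(chainIds.map(async (chainId) => lookupAndFetch(chainId)));
}
```

Called with `[1, 999]`, `withoutAsyncWrapper` rejects as a whole, while `withAsyncWrapper` resolves to one fulfilled and one rejected result.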

chatgpt-codex-connector · 2026-06-11T11:39:15Z

+            message: `Failed to fetch outstanding transfers for ${monitoredAddress.toNative()} ${l1Token.toNative()}; skipping for this cycle`,
+            error: stringifyThrownValue(error),
+          });
+          return;


Preserve stale bridge entries on partial adapter failures

When one bridge/address read fails here, the adapter returns a fulfilled but partial result; CrossChainTransferClient.update then overwrites the whole chain cache with that partial object because the chain-level promise did not reject. In a transient outage for a token that had outstanding transfers in the previous cycle, this removes that token's stale outstanding amount instead of preserving it, so inventory can treat the in-flight bridge as absent and submit a duplicate rebalance.


Agreed — the partial result would clobber the previous (address, l1Token) entry on chain-level overwrite, biasing the InventoryClient toward a duplicate rebalance for the failed bridge. Fixed in 427fa45: CrossChainTransferClient.update now forwards the previously-recorded chain state into the adapter, and BaseChainAdapter.getOutstandingCrossChainTransfers copies the previous (address, l1Token) entry into the result when a bridge read fails. Healthy bridges still produce fresh state. Added a SplitBridgeTracking test asserting the failing bridge inherits the previous cycle entry verbatim while the healthy one surfaces fresh data.
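A sketch of the carry-forward on a bridge-level catch, mirroring the `previousOutstandingTransfers?.[address]?.[l1Token]` lookup in the diff. `readWithFallback`, the `pairs` shape and `setEntry` (a stand-in for the `assign` helper, whose exact signature isn't shown here) are hypothetical; amounts are plain numbers.

```typescript
// Hypothetical sketch: on a bridge-level failure, carry forward the previous
// (address, l1Token) entry instead of leaving it absent.
type Outstanding = Record<string, Record<string, number>>;

// Stand-in for the nested assign used in the diff.
function setEntry(target: Outstanding, address: string, l1Token: string, amount: number): void {
  target[address] = target[address] ?? {};
  target[address][l1Token] = amount;
}

async function readWithFallback(
  pairs: Array<{ address: string; l1Token: string; read: () => Promise<number> }>,
  previous: Outstanding | undefined
): Promise<Outstanding> {
  const result: Outstanding = {};
  for (const { address, l1Token, read } of pairs) {
    try {
      setEntry(result, address, l1Token, await read()); // healthy bridge: fresh state
    } catch {
      const previousEntry = previous?.[address]?.[l1Token];
      if (previousEntry !== undefined) {
        setEntry(result, address, l1Token, previousEntry); // failed bridge: stale entry verbatim
      }
      // No previous entry: the pair stays absent and reads as zero downstream.
    }
  }
  return result;
}
```

A failed read with a previous entry keeps that entry verbatim; a failed read with no previous entry is left absent.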

…ale entries

Addresses Codex review on #3484:

P2: `AdapterManager.getOutstandingCrossChainTransfers` does synchronous work (adapter lookup, getTokenInfo / supportedTokens filtering) before returning a promise. A throw there aborted the `chainIds.map` callback before `Promise.allSettled` could see it. Wrap the callback in `async` so sync throws become rejected promises and stay isolated to the failed chain.

P1: On a per-bridge failure the adapter previously returned a partial result omitting that `(address, l1Token)` pair. The chain-level overwrite in `CrossChainTransferClient.update` would then blank the stale outstanding amount, risking a duplicate rebalance. Pipe the previous chain state through `AdapterManager` → `BaseChainAdapter` and, on bridge-level catch, copy the previous `(address, l1Token)` entry into the new result.

Tests:
- `CrossChainTransferClient.update`: new test for sync-throw isolation.
- `SplitBridgeTracking`: new test asserting the failing bridge inherits the previous cycle's entry while the healthy bridge surfaces fresh state.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

droplet-rl · 2026-06-11T11:47:39Z

Addressed both Codex findings in 427fa45 and posted inline replies on each thread.

P2 (sync throws escape Promise.allSettled) — AdapterManager.getOutstandingCrossChainTransfers does sync work (adapter lookup + getTokenInfo filter) before returning a promise. Wrapped the chainIds.map callback in async so sync throws become rejected promises and route through the per-chain failure path.

P1 (partial result blanks stale entries) — Plumbed the previous chain state through CrossChainTransferClient.update → AdapterManager → BaseChainAdapter.getOutstandingCrossChainTransfers. On a bridge-level catch, the adapter now copies the previous (address, l1Token) entry into the result, so a transient failure on one bridge no longer drops the in-flight rebalance amount for that token. Healthy bridges still surface fresh state.

Added two new tests (sync-throw isolation + per-bridge stale-entry preservation). Full suite: tsc clean, prettier clean, hardhat test test/CrossChainTransferClient.update.ts test/generic-adapters/SplitBridgeTracking.ts 9/9 passing, and hardhat test test/InventoryClient.InventoryRebalance.ts test/InventoryClient.RefundChain.ts test/Monitor.ts 66/66 passing.

@codex please re-review.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 427fa4542e


chatgpt-codex-connector · 2026-06-11T11:52:21Z

+          const previousEntry = previousOutstandingTransfers?.[monitoredAddress.toNative()]?.[l1Token.toNative()];
+          if (previousEntry !== undefined) {
+            assign(outstandingTransfers, [monitoredAddress.toNative(), l1Token.toNative()], previousEntry);
+          }


Fail closed when no stale bridge entry exists

When this catch runs before the client has a prior entry for the token/address (for example after a process restart or the first update cycle), the failing bridge is omitted from a fulfilled chain result. CrossChainTransferClient.update then overwrites the chain with that partial result, and InventoryClient.getBalanceOnChain treats the missing outstanding transfer as zero, so an actual in-flight rebalance can be duplicated during a bridge outage. Please propagate an unknown/failed state for this token or chain instead of returning success with the entry absent when previousEntry is undefined.


I considered this and want to push back. Tracing through the scenarios:

Warm restart (process crashed, all previousOutstandingTransfers = undefined, real in-flight rebalances pre-restart):

Current code: bridge A succeeds with $5, bridge B fails → state = {A: $5, B: absent} → InventoryClient sees A=$5 (correct), B=0 (duplicate risk).

Fail-closed-on-cold-start: A's fresh $5 thrown away → chain state preserved as undefined → InventoryClient sees A=0 and B=0 → duplicate risk for both.

Fail-closed is strictly worse here: it loses the healthy bridge's good read without removing duplicate risk for the failed bridge.
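The warm-restart trace can be checked with a toy model. This is not relayer code: `BridgeRead`, `currentBehaviour` and `failClosedOnColdStart` are hypothetical, amounts are plain dollar figures, and a missing entry is modeled as 0, matching how the thread describes InventoryClient treating an absent outstanding transfer.

```typescript
// Toy model of the warm-restart trace: no previous state, bridge A reads $5,
// bridge B's read fails. Returns what InventoryClient would effectively see.
type BridgeRead = { bridge: string; amount: number | Error };

// Current behaviour: keep healthy reads; a failed bridge is absent, i.e. reads as 0.
function currentBehaviour(reads: BridgeRead[]): Record<string, number> {
  const seen: Record<string, number> = {};
  for (const read of reads) {
    seen[read.bridge] = read.amount instanceof Error ? 0 : read.amount;
  }
  return seen;
}

// Fail closed on cold start: any failure rejects the chain; with no previous
// state to fall back on, every bridge on the chain reads as 0.
function failClosedOnColdStart(reads: BridgeRead[]): Record<string, number> {
  if (reads.some((read) => read.amount instanceof Error)) {
    const seen: Record<string, number> = {};
    for (const read of reads) seen[read.bridge] = 0;
    return seen;
  }
  return currentBehaviour(reads);
}
```

For A = $5 and a failed B, the current behaviour yields A=5, B=0, while failing closed yields A=0, B=0: the failed bridge's risk is unchanged and the healthy read is lost.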

Steady state (previous entries exist for all bridges): bridge failure already preserves the previous entry — no regression either way.

Brand-new bridge added mid-run (one bridge has no previous entry because it's genuinely new): if that new bridge fails on its first cycle, current code reports 0, which is correct because the bridge is new and nothing was in flight on it before this cycle. Fail-closed would discard the other bridges' fresh reads to "protect" against a risk that doesn't exist.

The fundamental issue is that InventoryClient.getOutstandingCrossChainTransferAmount returns a BigNumber with no "unknown" sentinel. Per-(token, address) failure propagation would require an InventoryClient interface change so it can defer the rebalance decision on unknown state — that's a meaningful follow-up, but escalating to chain-level on cold start in the meantime makes warm-restart cases worse, not better.

The duplicate-rebalance risk for a failed-read bridge whose real in-flight transfer was initiated before the process started (and is therefore unknown to it) is inherent and not solvable at this layer without that API change. Happy to track it as a follow-up if useful.
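For the follow-up, one possible shape for the interface change is a result type that carries an explicit "unknown" state. Everything here is hypothetical: `OutstandingAmount`, `decideRebalance` and the deficit arithmetic are illustrative only, since InventoryClient.getOutstandingCrossChainTransferAmount currently returns a BigNumber with no such sentinel.

```typescript
// Hypothetical follow-up shape: an outstanding amount that can be "unknown",
// so the caller can defer a rebalance instead of treating a failed read as 0.
type OutstandingAmount = { kind: "known"; amount: number } | { kind: "unknown" };

type RebalanceDecision =
  | { action: "rebalance"; amount: number }
  | { action: "defer" }
  | { action: "none" };

function decideRebalance(deficit: number, outstanding: OutstandingAmount): RebalanceDecision {
  // Unknown in-flight state: wait for a good read rather than risk a duplicate bridge.
  if (outstanding.kind === "unknown") return { action: "defer" };
  // Known state: only bridge what is not already in flight.
  const remaining = deficit - outstanding.amount;
  return remaining > 0 ? { action: "rebalance", amount: remaining } : { action: "none" };
}
```

The discriminated union forces every caller to handle the unknown case explicitly, which is the property a bare BigNumber cannot express.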

droplet-rl and others added 2 commits June 11, 2026 11:34

droplet-rl requested review from bmzig, dijanin-brat, mrice32, nicholaspai and pxrl as code owners June 11, 2026 11:35

chatgpt-codex-connector Bot reviewed Jun 11, 2026

droplet-rl wants to merge 3 commits into master from droplet/T90K0AL22-C03GHT4RV42-1781175656-993129
