feat(gasless): typed transaction outcomes + stuck-queue submission reporting#3464
feat(gasless): typed transaction outcomes + stuck-queue submission reporting#3464droplet-rl wants to merge 4 commits into
Conversation
Typed-outcome rewrite of `sendAndConfirmTransaction` lets the gasless relayer distinguish simulation reverts from stuck-queue submission failures and emit a single `error`-level summary per run, without relying on error-message prefix matching. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 235641c054
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (!txResponse) { | ||
| return undefined; | ||
| return { status: "skipped" }; |
There was a problem hiding this comment.
Classify dispatcher send failures as submissions
When useDispatcher is true (for gasless deposits with configured deposit signers, and deposit-address dispatcher paths), TransactionClient.dispatch() returns (await submit(...))[0]; if submit() exhausts its retries after simulation passed, it returns an empty array, so this branch maps the falsy response to skipped. That prevents GaslessRelayer._sendTrackedTx() from incrementing failedSubmissions, so logRunSummary() will not page for stuck dispatcher queues even though the send failed after simulation.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — fixed. dispatchTransaction now checks for the falsy (await submit(...))[0] from TransactionClient.dispatch() and throws TransactionSubmissionFailedError, so the dispatcher path (gasless with depositSigners, and deposit-address dispatcher) increments failedSubmissions instead of being treated as skipped.
| if (err instanceof TransactionSubmissionFailedError) { | ||
| return { status: "submission_failed", error: err }; | ||
| } | ||
| throw err; |
There was a problem hiding this comment.
Return wait failures instead of crashing
For transactions that were submitted but never confirmed cleanly, TransactionClient._submit() can give up its internal confirmation loop and still return the TransactionResponse; the extra txResponse.wait() here can then reject with transient provider errors such as SERVER_ERROR/TIMEOUT. Because those errors are now rethrown instead of converted to a failed outcome, gasless and deposit-address runs can abort (and leave in-memory locks uncleared) on the same confirmation failures that previously returned undefined for retry/logging.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Agreed — fixed. txResponse.wait() now sits in its own try/catch and unknown rejections map back to { status: "skipped" }, restoring the legacy "no receipt → caller releases locks / retries" path. Only the typed TransactionSimulationFailedError / TransactionSubmissionFailedError thrown by submitTransaction / dispatchTransaction are converted to their respective outcome variants; truly unexpected errors from those still re-throw. Added a regression test that exhausts the inner confirmation loop and then rejects the outer wait() with ECONNRESET.
| await relayer.waitForDisconnect(); | ||
|
|
||
| // Surface any post-simulation submission failures observed during the run before exit. | ||
| relayer.logRunSummary(); |
There was a problem hiding this comment.
Wait for active polling before summarizing
If handover or lifetime expiry happens while an evaluateApiSignatures() tick is still awaiting a deposit/fill submission, waitForDisconnect() only aborts the interval; it does not await the fire-and-forget task scheduled by scheduleTask. Calling the summary immediately here can therefore read failedSubmissions before the in-flight _sendTrackedTx() records a post-simulation submission failure, so the new stuck-queue page is missed in exactly the shutdown window when the report is emitted.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch. _sendTrackedTx now registers its promise in a pendingTrackedSubmissions set (wrapping the underlying _doSendTrackedTx so the counter update is part of the tracked work), and waitForDisconnect Promise.allSettleds them after aborting the polling interval. The drain is bounded by sendAndConfirmTransaction (which is itself bounded by TransactionClient retries + a single outer wait()), so an in-flight _sendTrackedTx finishes incrementing failedSubmissions before logRunSummary reads it.
…t() rejects map to skipped; drain in-flight on handover
- dispatchTransaction now throws TransactionSubmissionFailedError when dispatcher.dispatch() returns falsy (empty submit() array), so stuck-queue dispatcher failures increment failedSubmissions instead of being treated as skipped.
- sendAndConfirmTransaction isolates txResponse.wait() in its own try/catch and converts unknown rejections back to {status: "skipped"}; transient provider errors after _submit gave up no longer abort the run or leak in-memory locks.
- GaslessRelayer tracks in-flight _sendTrackedTx promises and waitForDisconnect drains them with Promise.allSettled before logRunSummary reads failedSubmissions, so shutdown-window counter updates are not missed.
Codex review: #3464 (review)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
@codex addressed all three P2 comments in 6f96692:
Please re-review. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6f96692be8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ): Promise<TransactionReceipt | undefined> { | ||
| const promise = this._doSendTrackedTx(tx, kind, useDispatcher); | ||
| this.pendingTrackedSubmissions.add(promise); | ||
| void promise.finally(() => this.pendingTrackedSubmissions.delete(promise)); |
There was a problem hiding this comment.
Handle rejection from the cleanup promise
When _doSendTrackedTx rejects with one of the unexpected errors that sendAndConfirmTransaction now rethrows (for example a transient provider/simulation exception), this promise.finally(...) creates a second promise that rejects with the same reason and is never observed. Even if the caller's await this._sendTrackedTx(...) is caught by the scheduled task wrapper, the unhandled rejection from the finally chain can still surface as an unhandled promise rejection; attach a catch or use then(onFulfilled, onRejected) for the cleanup path.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — fixed. void promise.finally(cleanup) was creating a finally-chained promise that re-throws the underlying rejection; with no catch attached it would surface as an unhandled rejection even though the caller awaiting promise separately handled the error. Replaced with void promise.then(cleanup, cleanup), where both handlers return undefined so the cleanup chain settles fulfilled in both branches and the original promise is still what the caller awaits.
| if (this.pendingTrackedSubmissions.size > 0) { | ||
| await Promise.allSettled([...this.pendingTrackedSubmissions]); |
There was a problem hiding this comment.
Loop until all handover submissions drain
This drains only the snapshot of pendingTrackedSubmissions that exists at the moment handover fires. A poll that is already inside processDepositMessage can add another tracked submission after this snapshot—for example after the deposit promise settles and the same state loop proceeds to submit the fill—so waitForDisconnect can return and logRunSummary() can run before that later fill/deposit failure increments failedSubmissions. Please keep draining until the set remains empty, or track the whole evaluateApiSignatures task rather than only currently registered tx promises.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Agreed — fixed both directions. waitForDisconnect now loops Promise.allSettled([...pendingTrackedSubmissions]) until the set is empty, so a tick that registers a fresh fill _sendTrackedTx after the deposit settled is still captured. To keep the loop bounded, processDepositMessage's do-while now checks abortController.signal.aborted at the top of each iteration and breaks on abort — that runs through the existing delete this.fillLock[fillKey] after-loop cleanup, releases the lock, and stops scheduling further tracked submissions so the drain reaches a fixed point.
| return { status: "submission_failed", error: err }; | ||
| } | ||
| return txResponse.wait(); | ||
| throw err; |
There was a problem hiding this comment.
Release caller locks when sends rethrow
With this rethrow, transient provider/simulation errors no longer come back as the legacy undefined/skipped result, but the gasless and deposit-address callers already take in-memory locks before awaiting the send and release them only on the non-confirmed outcome path. In those RPC-error cases the scheduled task swallows the rejection after the state machine exits, leaving entries such as fillLock/messageState or observedExecutedDeposits stuck for the rest of the process; either normalize these send failures back to a typed outcome or add finally cleanup in the affected callers.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Agreed — fixed by adding a fifth outcome variant { status: "unexpected_error", error }. sendAndConfirmTransaction now never throws: typed simulation/submission errors map to their own variants, and anything else (e.g. a throw inside willSucceed or a malformed contract on dispatcher.dispatch) becomes unexpected_error with the error preserved on the variant. Gasless _sendTrackedTx handles it with an error-level log (notificationPath: across-error) but does not increment failedSubmissions — it isn't a stuck-queue signal. DepositAddressHandler was already correct: its outcome.status !== "confirmed" early-return now also catches unexpected_error, and the existing finally block releases observedExecutedWithdraws/observedExecutedDeposits. The existing Re-throws unexpected errors… test was renamed to Returns unexpected_error (typed) for non-typed thrown errors and inverted to assert the typed outcome.
| if (response.length === 0) { | ||
| throw new Error( | ||
| throw new TransactionSubmissionFailedError( |
There was a problem hiding this comment.
Don't page mined reverts as stuck submissions
In cases where the transaction is placed but then txnResponse.wait() throws CALL_EXCEPTION for a mined revert after the pre-flight simulation passed, TransactionClient.submit() catches that _submit error and returns the same empty array as a genuine placement failure. This branch converts both cases to submission_failed, so the gasless relayer increments the stuck-queue summary and pages even though the nonce was not stuck and the tx reached chain; preserve the submit error cause or avoid counting execution reverts as submission failures.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Real concern but punting to a follow-up. The fix requires plumbing the inner _submit rejection cause out through TransactionClient.submit (which currently catches everything and returns an empty array), and submit has multiple non-gasless callers (MultiCallerClient, BaseChainAdapter, dispatch, the existing Throws on CALL_EXCEPTION test) that all rely on the "return empty array" contract — distinguishing CALL_EXCEPTION from genuine stuck-queue here is a broader TransactionClient change than the typed-outcome refactor this PR sets up.
For now: a mined revert after a passing simulation does count as a failedSubmissions entry. That's a false-positive risk on the stuck-queue page, but pre-flight simulation already filters the common deposit-side regressions, and operationally a page-then-discover-it-was-a-revert is much cheaper than missing a real stuck dispatcher. Will track separately to plumb the cause through submit and add a sixth execution_reverted variant.
…ed rejection, unexpected_error outcome
- waitForDisconnect now drains pendingTrackedSubmissions in a while loop so submissions a tick adds after the snapshot are still drained before logRunSummary reads failedSubmissions.
- processDepositMessage do-while bails on abortController.signal.aborted so the drain loop terminates: each tick can submit at most one more tracked tx after handover.
- _sendTrackedTx replaces void promise.finally(cleanup) with promise.then(cleanup, cleanup); the finally chain re-throws the underlying rejection on its returned promise, which would surface as an unhandled rejection.
- sendAndConfirmTransaction maps non-typed throws to {status: "unexpected_error", error}; gasless _sendTrackedTx logs them at error level (not counted as stuck-queue submission failures) and DepositAddressHandler's existing status !== "confirmed" check releases its in-memory locks. Removes the lock-leak risk where a transient throw from willSucceed/dispatch would abort the run mid-await.
Codex review: #3464 (review)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
@codex addressed the second round in 1f6c2a6:
All 48 tests pass (TransactionClient + GaslessRelayer); typecheck + lint clean. Please re-review. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1f6c2a6e50
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| while (this.pendingTrackedSubmissions.size > 0) { | ||
| await Promise.allSettled([...this.pendingTrackedSubmissions]); |
There was a problem hiding this comment.
Drain active evaluators before logging summary
When handover happens while an evaluateApiSignatures task has already passed the abort check but is still awaiting pre-submit work (for example willSucceed before initiateDeposit), pendingTrackedSubmissions can still be empty here. Since pollAndExecute schedules evaluateApiSignatures fire-and-forget, waitForDisconnect can exit this loop and logRunSummary can run before that in-flight evaluator later adds a _sendTrackedTx promise and increments failedSubmissions, so the stuck-queue summary can still miss failures. Track/drain the evaluator promises as well, or gate _sendTrackedTx after abort, rather than only draining the current tracked-submission set.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — fixed. Rather than tracking the entire evaluateApiSignatures promise (the state machine can churn indefinitely in pathological cases), I added a hard gate in _sendTrackedTx: if abortController.signal.aborted, return undefined before registering in pendingTrackedSubmissions. A tick that had passed the do-while abort check while awaiting willSucceed / peripheral lookups now produces a no-op submission and never adds to the set, so the drain in waitForDisconnect sees the final state and logRunSummary reads the final counters.
| while (this.pendingTrackedSubmissions.size > 0) { | ||
| await Promise.allSettled([...this.pendingTrackedSubmissions]); |
There was a problem hiding this comment.
Bound shutdown waits on pending transactions
If handover occurs while a tracked EVM submission is waiting for confirmation, this new drain can block shutdown indefinitely because _sendTrackedTx ultimately calls TransactionResponse.wait() without an abort or timeout, and wait() may not settle for a dropped or never-mined transaction. In that case the process never reaches logRunSummary or Redis cleanup during handover, defeating the bounded shutdown path this code is adding. Add a timeout/abort-aware race around the drain or around tracked confirmation waits so handover can complete even when an in-flight tx remains pending.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Agreed — bounded the drain. Added a class-level SHUTDOWN_DRAIN_TIMEOUT_MS = 60_000 cap; waitForDisconnect races each Promise.allSettled cycle against a remaining-deadline timeout and breaks out with a warn (notificationPath: across-error) if it elapses. So a hung wait() on a dropped / never-mined tx no longer blocks handover indefinitely: the summary may be partial in that pathological case, but Redis cleanup and process exit still proceed.
| if (response.length === 0) { | ||
| throw new Error( | ||
| throw new TransactionSubmissionFailedError( |
There was a problem hiding this comment.
Don’t count mined reverts as submission failures
When a transaction passes simulation but later reverts while mining (for example because state changes between simulation and inclusion), TransactionClient._submit() throws on the CALL_EXCEPTION, submit() catches it and returns an empty response array, and this new wrapper classifies that as submission_failed. The gasless relayer then increments failedSubmissions and pages for a suspected stuck queue even though the transaction was actually submitted and mined; distinguish mined execution failures from empty submit results before emitting the stuck-queue signal.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fair pushback — actually implemented it. TransactionClient.submit and dispatch now accept an optional { throwOnError?: boolean }; existing non-gasless callers (MultiCallerClient, BaseChainAdapter, Disputer) keep the legacy "return empty array" contract (test Throws on CALL_EXCEPTION still passes). submitTransaction / dispatchTransaction opt in and run the thrown error through classifyPostSimulationFailure: a CALL_EXCEPTION becomes TransactionExecutionRevertedError, anything else becomes TransactionSubmissionFailedError. sendAndConfirmTransaction grows a sixth outcome variant { status: "execution_reverted", reason }; gasless _sendTrackedTx logs it at warn level without incrementing failedSubmissions so the stuck-queue counter only paged for actual placement failures. Added test coverage.
…tinguish mined reverts
- _sendTrackedTx returns undefined immediately if abortController.signal.aborted, before registering in pendingTrackedSubmissions. A tick that passed the do-while abort check while awaiting pre-submit work (willSucceed, lookups) can no longer add a late submission after the drain set has emptied.
- waitForDisconnect drains under a 60s timeout (SHUTDOWN_DRAIN_TIMEOUT_MS). If an outer wait() hangs on a never-mined tx, the drain exits with an across-error warn so handover still completes and Redis cleanup runs.
- TransactionClient.submit / dispatch grow optional throwOnError. submitTransaction / dispatchTransaction set it and classify CALL_EXCEPTION as TransactionExecutionRevertedError (mined revert) vs TransactionSubmissionFailedError (stuck queue). Other submit callers (MultiCallerClient, BaseChainAdapter, Disputer) use default {} and keep the legacy return-empty-array contract.
- sendAndConfirmTransaction grows execution_reverted outcome variant; _sendTrackedTx logs it as warn without incrementing failedSubmissions so the paging counter no longer triggers on state-race reverts.
Codex review: #3464 (review)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
@codex addressed the third round in 3f2ff77:
49/49 tests pass (TransactionClient + GaslessRelayer); typecheck + lint clean. Please re-review. |
|
Codex Review: Didn't find any major issues. 🎉 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
Thanks — clean re-review noted. No further changes; this is ready for human review. |
Summary
Slack discussion:
#across-engineeringthread on adding lightweightpost-handover reporting when gasless mainnet transactions get stuck.
Two pieces:
Typed transaction outcomes in
src/utils/TransactionUtils.ts.submitTransaction/dispatchTransactionnow throwTransactionSimulationFailedError/TransactionSubmissionFailedErrorsubclasses (still
Error-shaped — all ~15 direct callers unchanged).sendAndConfirmTransactionreturns a discriminatedTransactionOutcomeunion —
{ confirmed | skipped | simulation_failed | submission_failed }—instead of
TransactionReceipt | undefined. Unexpected errors arere-thrown rather than silently swallowed.
Gasless relayer stuck-queue reporting. A
_sendTrackedTxwrapper onGaslessRelayerconsumes the new outcome through an exhaustive switch(with a
neverarm), increments a per-(chainId, deposit|fill)counteronly on
submission_failed, and falls back to the existingTransactionReceipt | undefinedcontract for the two call sites(
initiateDeposit,initiateFill).index.tscalls a newlogRunSummary()afterwaitForDisconnect()returns, which emits oneerror-level log (withnotificationPath: across-error) when any chainexceeded zero stuck submissions, or no-ops otherwise.
The compiler — not a string prefix — is what keeps detection wired up.
Renaming/removing a
TransactionOutcomevariant breaks every consumer'sswitch; changing the typed-error classes breaks the
instanceofchain insendAndConfirmTransaction; either path lands red in CI rather thansilently dropping counts.
Scope notes
skippedsends are intentionally not counted— matches the spec: "do not include cases where simulation failed".
SvmFillerClient.executeFillImmediately) go through a separatepath and remain untracked. The reported incident was EVM-only; happy to
add SVM tracking in a follow-up if useful.
cross-run persistence.
DepositAddressHandler's threesendAndConfirmTransactioncallers wereupdated to consume the discriminated outcome — semantics unchanged, just
one extra
outcome.status === "confirmed"check per site.Doc updates considered
Per
AGENTS.md, considered targeted updates and decided no docs need tomove:
src/gasless/README.mdexists today; this PR doesn't introduceenough public surface to warrant one.
AGENTS.mdalready namesgaslessat a directory level — noshift in that summary.
src/clients/README.md(if/when written) is the natural home for aTransactionOutcomereference; out of scope for this change.Test plan
yarn typecheck— clean.yarn lint— clean (one prettier nit auto-fixed).yarn hardhat test test/TransactionClient.ts— 14 passing,including 4 new cases under "sendAndConfirmTransaction outcome typing"
covering confirmed / simulation_failed / submission_failed /
unexpected-error-rethrown.
yarn hardhat test test/DepositAddressHandler.ts test/DepositAddressHandler.publish.ts— 10 passing (no regression from the callsite migration).
healthy; surfaces an
errorlog keyedGaslessRelayer#logRunSummarywith
failedByChainwhen a chain accumulates stuck-queue submissions.🤖 Generated with Claude Code