fix(proxy): clear transfer latch after unsafe attempt#24979
Conversation
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
There was a problem hiding this comment.
Pull request overview
This PR fixes a proxy tunnel transfer deadlock where Deliver() could mark a tunnel as inTransfer, but an unsafe async transfer attempt could return early before finishTransfer() cleared the latch—causing subsequent drain polls to skip the tunnel indefinitely and blocking CN draining. It also makes transfer-intent metric updates idempotent and ensures sync transfers respect the same inTransfer latch to prevent overlapping migration attempts.
Changes:
- Make
setTransferIntentidempotent via atomic swap to avoid double-inc/dec of the transfer-intent gauge. - Add explicit helpers to acquire/release the
inTransferlatch for transfer attempts, and ensure unsafe async early-return clears the latch. - Ensure
transferSyncalso acquires/releases the latch and add tests covering unsafe async retryability, sync latch behavior, and metric idempotency.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/proxy/tunnel.go | Clears inTransfer on unsafe async early-return, adds sync transfer latch acquisition, and makes transfer-intent gauge updates idempotent. |
| pkg/proxy/tunnel_test.go | Adds tests for retry behavior after unsafe transfer, sync latch behavior, and idempotent gauge updates. |
What type of PR is this?
Which issue(s) this PR fixes:
issue #24942
What this PR does / why we need it:
When a proxy scaling transfer cannot start safely, for example because the session is still in a transaction, the async transfer attempt returned before
finishTransferwas registered.Deliverhad already marked the tunnel asinTransfer, so later drain polls skipped the tunnel forever and the CN drain could remain blocked until the connection closed.This PR clears the transfer latch on that unsafe early-return path while preserving
transferIntent, so later retries or intent-driven transfer can still happen. It also makes transfer intent metric updates idempotent and has sync transfer acquire the same transfer latch to avoid overlapping async/sync migration attempts.Tests cover:
inTransferand can be re-enqueued;