fix(server): snapshot queue completion results #351
Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>
Greptile Summary
This PR fixes a data race in
Confidence Score: 5/5
Safe to merge: the change is a minimal, well-targeted struct copy that closes a specific data race without altering any observable contract. The fix is two lines under an existing lock, leaving all other behavior unchanged. The shallow copy is sufficient because the timer callback replaces entry.Ret with a new assignment rather than mutating the value it points to. The regression test directly reproduces the race scenario and confirms the snapshot is immutable after the worker completes. No files require special attention.
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Caller
participant Get
participant Mutex
participant Cache
participant TimerCallback
Note over Caller,TimerCallback: Before fix — race window
Caller->>Get: Get(hash)
Get->>Mutex: Lock
Get->>Cache: "lookup *Completion"
Cache-->>Get: live pointer p
Get->>Mutex: Unlock
Note over Get,TimerCallback: lock released, p still shared
TimerCallback->>Mutex: Lock (writes p.Status, p.Ret)
Caller->>Get: read p.Status ← RACE
Note over Caller,TimerCallback: After fix — snapshot returned
Caller->>Get: Get(hash)
Get->>Mutex: Lock
Get->>Cache: "lookup *Completion"
Cache-->>Get: live pointer p
Get->>Get: "copy = *p (struct copy under lock)"
Get->>Mutex: Unlock
Get-->>Caller: "copy (independent snapshot)"
TimerCallback->>Mutex: Lock (writes p.Status, p.Ret)
Note over Caller: copy.Status unaffected ✓
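The two flows in the diagram can be sketched in Go as follows. This is a minimal illustration, not the PR's actual code: the `queue` type, the `getLive`/`getSnapshot`/`complete` method names, the `newQueue` helper, and the field types of `Completion` are all assumptions made for the example. Only the idea of copying the struct under the lock comes from the diagram.

```go
package main

import (
	"fmt"
	"sync"
)

// Completion stands in for the cached result type. The real struct's
// fields are not shown in this PR, so Status and Ret are assumptions.
type Completion struct {
	Status string
	Ret    *string
}

// queue is a hypothetical, stripped-down stand-in for TrailingDelayQueue.
type queue struct {
	mu    sync.Mutex
	cache map[string]*Completion
}

// getLive mirrors the pre-fix flow: the caller receives the cached
// pointer and reads it after the lock has been released.
func (q *queue) getLive(hash string) (*Completion, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	c, ok := q.cache[hash]
	return c, ok
}

// getSnapshot mirrors the post-fix flow: the struct is copied while the
// lock is held, so later writes to the cached entry are not visible.
func (q *queue) getSnapshot(hash string) (*Completion, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	c, ok := q.cache[hash]
	if !ok {
		return nil, false
	}
	snapshot := *c
	return &snapshot, true
}

// complete plays the role of the timer callback: it updates the cached
// entry under the lock and assigns a new Ret value.
func (q *queue) complete(hash, result string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if c, ok := q.cache[hash]; ok {
		c.Status = "done"
		c.Ret = &result
	}
}

// newQueue builds a queue holding one accepted entry under key "h1".
func newQueue() *queue {
	return &queue{cache: map[string]*Completion{"h1": {Status: "accepted"}}}
}

func main() {
	q := newQueue()
	live, _ := q.getLive("h1")
	snap, _ := q.getSnapshot("h1")
	q.complete("h1", "ok")
	fmt.Println(live.Status, snap.Status) // prints "done accepted"
}
```

The example runs sequentially, so it shows only the visibility difference. With a concurrent reader, the `getLive` variant is the one that `go test -race` would be expected to flag.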
Reviews (1): Last reviewed commit: "fix(server): snapshot queue completion r..."
/ok-to-test 765115e
@fallintoplace, thank you for your contribution!
Description
Split out from #350 at maintainer request.
TrailingDelayQueue.Get returned the live *Completion stored in the queue cache. The queue lock protected the lookup itself, but callers read the returned pointer after the lock was released, while the timer callback could still update the same Completion. That showed up as a race between polling /v1/topology and request completion.
This PR returns a shallow snapshot of the cached completion instead. It also adds a focused regression test that proves an accepted-result snapshot does not mutate after the worker completes.
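A shallow snapshot isolates the caller only when the writer assigns new values to the struct's fields and does not write through pointers the copy still shares. The sketch below illustrates that condition under assumed names: `store`, `Result`, `snapshot`, `finishByReplacing`, `finishByMutating`, and `newStore` are hypothetical and not taken from the PR. For brevity the snapshot is returned by value rather than as a pointer.

```go
package main

import (
	"fmt"
	"sync"
)

// Result and Completion are illustrative types. The field layout is an
// assumption, not the project's real definition.
type Result struct{ Body string }

type Completion struct {
	Status string
	Ret    *Result
}

// store is a hypothetical single-entry cache guarded by a mutex.
type store struct {
	mu    sync.Mutex
	entry *Completion
}

// snapshot returns a shallow copy taken while the lock is held.
func (s *store) snapshot() Completion {
	s.mu.Lock()
	defer s.mu.Unlock()
	return *s.entry
}

// finishByReplacing assigns a fresh Ret, so earlier snapshots keep
// pointing at the old Result and do not change.
func (s *store) finishByReplacing(body string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entry.Status = "done"
	s.entry.Ret = &Result{Body: body}
}

// finishByMutating writes through the existing pointer. This is the
// counterexample: a shallow snapshot shares that pointer and sees the write.
func (s *store) finishByMutating(body string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entry.Status = "done"
	s.entry.Ret.Body = body
}

// newStore builds a store with one accepted entry.
func newStore() *store {
	return &store{entry: &Completion{Status: "accepted", Ret: &Result{Body: "pending"}}}
}

func main() {
	a := newStore()
	before := a.snapshot()
	a.finishByReplacing("final")
	fmt.Println(before.Status, before.Ret.Body) // prints "accepted pending"

	b := newStore()
	leaked := b.snapshot()
	b.finishByMutating("final")
	fmt.Println(leaked.Status, leaked.Ret.Body) // prints "accepted final"
}
```

If a writer ever started mutating the pointed-to value in place, as in the second case, a deep copy (or an immutable result type) would be needed instead.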
Validation:
go test -race ./pkg/server
make qualify LINTER_BIN=/Users/hoangvu/go/bin/golangci-lint
Checklist
git commit -s).