WL-0MPFGDDZ3009J2P4: Fix corrupted lock cleanup grace window scaling with exponential backoff #2003

Merged
SorraTheOrc merged 1 commit into main from bug/WL-0MPFGDDZ3009J2P4-corrupted-lock-cleanup on May 22, 2026

@SorraTheOrc

Member

Summary

The diagnostic regression test "should log stale lock cleanup reason when lock is corrupted" was timing out because the grace window for unparseable/corrupted lock files scaled with the exponential backoff delay, which prevented cleanup within the default 5000ms timeout.

Root Cause

In src/file-lock.ts:340, the grace window was calculated as Math.max(currentDelay * 2, 500). Because currentDelay grows exponentially (100ms → 150ms → 225ms → ...), the grace window grew about as fast as the file aged, so the corrupted file almost never became "old enough" to clean up before the acquisition timeout fired.
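The interaction between file age and the scaled grace window can be illustrated with a small model. This is a sketch, not the code in src/file-lock.ts: the function and variable names are hypothetical, the 1.5x multiplier is inferred from the 100ms → 150ms → 225ms sequence, and it assumes the corrupted file appears when the first attempt starts, that the check uses the delay about to be slept, and that the delay is uncapped.

```typescript
// Hypothetical model of the retry loop; none of these names come from the real code.
type GraceFn = (currentDelayMs: number) => number;

// Old behaviour: grace window scales with the current backoff delay.
const scaledGrace: GraceFn = (d) => Math.max(d * 2, 500);
// New behaviour: grace window is a fixed 1000ms.
const fixedGrace: GraceFn = () => 1000;

// Returns the file age (ms) at which cleanup first becomes eligible,
// or null if that never happens before the acquisition timeout.
function firstCleanupAgeMs(grace: GraceFn, timeoutMs = 5000): number | null {
  let delay = 100; // assumed initial backoff delay
  let age = 0;     // assumed: corrupted file created at the first attempt
  while (age <= timeoutMs) {
    if (age >= grace(delay)) return age;
    age += delay;  // sleep for the current delay
    delay *= 1.5;  // assumed backoff multiplier
  }
  return null;
}

console.log(firstCleanupAgeMs(scaledGrace)); // null
console.log(firstCleanupAgeMs(fixedGrace));  // 1318.75
```

Under these assumptions the file age after n retries is 200 × (1.5^n − 1) ms while the scaled grace window is 200 × 1.5^n ms, so the age trails the window by a constant 200ms and never catches up. With the fixed window, cleanup becomes eligible on the first retry after the file is 1000ms old.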

Fix

Changed the grace window to a fixed 1000ms constant. This is:

  • Safe: lock file writes (open, write, fsync, close) complete in <100ms
  • Deterministic: corrupted files are recovered in ~1 second regardless of retry timing
  • Conservative: 1000ms is 10x the typical write duration
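A minimal sketch of what a fixed-grace check could look like follows. The constant name, the function name, and the assumption that lock files contain JSON are all hypothetical; the actual change is the single line at src/file-lock.ts:340.

```typescript
// Hypothetical constant name; the PR only states that the value is a fixed 1000ms.
const CORRUPTED_LOCK_GRACE_MS = 1000;

// Decides whether an unparseable lock file is old enough to remove.
// Assumes (for illustration only) that a valid lock file contains JSON.
function shouldRemoveCorruptedLock(contents: string, ageMs: number): boolean {
  try {
    JSON.parse(contents);
    return false; // parseable: not corrupted, so this check does not apply
  } catch {
    // Unparseable: a writer may still be mid-write, so wait out the grace window.
    return ageMs >= CORRUPTED_LOCK_GRACE_MS;
  }
}

// A half-written file seen 100ms after creation is left alone.
console.log(shouldRemoveCorruptedLock('{"pid":', 100));  // false
// The same file is reclaimed once it is at least 1000ms old.
console.log(shouldRemoveCorruptedLock('{"pid":', 1500)); // true
```

Because the threshold no longer depends on retry timing, the decision is a function of file contents and age alone, which is what makes recovery time predictable.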

Verification

  • The failing test now passes consistently in ~1.5s (down from ~3.5s)
  • All 73 file-lock tests pass
  • Full test suite: 158 test files, 1620 tests, all passing, no regressions

Focus for Review

The single-line change at src/file-lock.ts:340. Verify that the fixed 1000ms grace window still protects against concurrent writer races: a healthy writer finishes in <100ms, well inside the window, while a writer that crashes mid-write has its lock file reclaimed after 1s, which is more than adequate.

…with exponential backoff

The grace window for unparseable/corrupted lock files was calculated as
Math.max(currentDelay * 2, 500), where currentDelay grows with exponential
backoff on each retry. This caused the grace window to keep pace with file
age, preventing corrupted lock cleanup within the default 5000ms timeout.

Fix: use a fixed 1000ms grace window instead of scaling with retry delay.
This is safe because lock file writes (open, write, fsync, close) complete
in <100ms, so 1000ms is more than adequate to guard against concurrent
writers, while ensuring corrupted files are recovered deterministically
within ~1 second.
SorraTheOrc merged commit 818aaf2 into main on May 22, 2026
4 of 5 checks passed