fix(lock): implement Redlock single-instance pattern in LockManagerService#537
fix(lock): implement Redlock single-instance pattern in LockManagerService#537romanetar wants to merge 1 commit into
Conversation
📝 WalkthroughWalkthroughThe PR adds an atomic compare-and-delete cache method to ICacheService, implements it in Redis via a Lua eval, updates Redis conditional set usages, refactors LockManagerService to use per-call random ownership tokens with exponential backoff and ownership-aware releases, and adds tests validating ownership semantics. ChangesOwnership-Based Distributed Locking
Sequence DiagramsequenceDiagram
participant Client
participant LMS as LockManagerService
participant RCS as RedisCacheService
participant Redis
rect rgba(100, 150, 200, 0.5)
Note over Client,Redis: Lock Acquisition (Success Path)
Client->>LMS: acquireLock(name, lifetime)
LMS->>LMS: Generate token = random_bytes(16)
loop Exponential Backoff Retry Loop
LMS->>RCS: addSingleValue(name, token, lifetime)
RCS->>Redis: SET name token NX EX lifetime
Redis-->>RCS: OK (or nil)
alt First attempt succeeds
RCS-->>LMS: true
LMS->>LMS: Store $tokens[name] = token
LMS-->>Client: Acquisition complete
else Retry needed
RCS-->>LMS: false
LMS->>LMS: Sleep with exponential backoff
end
end
end
rect rgba(200, 150, 100, 0.5)
Note over Client,Redis: Lock Release (Ownership Check)
Client->>LMS: releaseLock(name)
LMS->>LMS: Check $tokens[name] exists?
alt Owned by this process
LMS->>RCS: deleteIfValueMatches(name, token)
RCS->>Redis: EVAL Lua script (compare & delete)
Redis->>Redis: GET name == token?
alt Values match
Redis->>Redis: DEL name
Redis-->>RCS: 1 (deleted)
else Mismatch (lock stolen)
Redis-->>RCS: 0 (not deleted)
end
RCS-->>LMS: true/false
LMS->>LMS: Unset $tokens[name]
LMS-->>Client: Released
else Not owned
LMS-->>Client: Return (no-op)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
📘 OpenAPI / Swagger preview ➡️ https://OpenStackweb.github.io/summit-api/openapi/pr-537/ This page is automatically updated on each push to this PR. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
app/Services/Utils/LockManagerService.php (1)
62-70: 💤 Low valueMinor: Unnecessary sleep before throwing on final retry.
When
attempt >= MaxRetries - 1, the code still executesusleepbefore throwing. This adds ~400ms of unnecessary delay on the final failed attempt.Consider moving the retry check before the sleep:
Suggested reorder
- $wait_interval = (int)(self::BackOffBaseInterval * (self::BackOffMultiplier ** $attempt)); - Log::debug(sprintf("LockManagerService::acquireLock name %s retrying in %s µs (attempt %s)", $name, $wait_interval, $attempt)); - usleep($wait_interval); if ($attempt >= (self::MaxRetries - 1)) { Log::error(sprintf("LockManagerService::acquireLock name %s lifetime %s ERROR MAX RETRIES attempt %s", $name, $lifetime, $attempt)); throw new UnacquiredLockException(sprintf("lock name %s", $name)); } + $wait_interval = (int)(self::BackOffBaseInterval * (self::BackOffMultiplier ** $attempt)); + Log::debug(sprintf("LockManagerService::acquireLock name %s retrying in %s µs (attempt %s)", $name, $wait_interval, $attempt)); + usleep($wait_interval); ++$attempt;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/Services/Utils/LockManagerService.php` around lines 62 - 70, The loop in LockManagerService::acquireLock sleeps (usleep) even when $attempt >= (self::MaxRetries - 1), causing an unnecessary delay before throwing UnacquiredLockException; reorder the logic so the check for final retry (if $attempt >= (self::MaxRetries - 1)) occurs before calling usleep and before incrementing $attempt, log and throw immediately on final attempt, otherwise perform the usleep, increment $attempt and continue the loop to preserve backoff behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@app/Services/Utils/LockManagerService.php`:
- Around line 62-70: The loop in LockManagerService::acquireLock sleeps (usleep)
even when $attempt >= (self::MaxRetries - 1), causing an unnecessary delay
before throwing UnacquiredLockException; reorder the logic so the check for
final retry (if $attempt >= (self::MaxRetries - 1)) occurs before calling usleep
and before incrementing $attempt, log and throw immediately on final attempt,
otherwise perform the usleep, increment $attempt and continue the loop to
preserve backoff behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a4fb76b4-55ed-4ec3-a248-fdf0d070fc95
📒 Files selected for processing (4)
Libs/Utils/ICacheService.phpapp/Services/Utils/LockManagerService.phpapp/Services/Utils/RedisCacheService.phptests/Unit/Services/LockManagerServiceOwnershipTest.php
b3dbd7a to
acb8447
Compare
|
📘 OpenAPI / Swagger preview ➡️ https://OpenStackweb.github.io/summit-api/openapi/pr-537/ This page is automatically updated on each push to this PR. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/Unit/Services/LockManagerServiceOwnershipTest.php (1)
139-144: 💤 Low valueConsider asserting the token is non-empty for stronger ownership guarantee coverage.
Mockery::type('string')accepts any string including''. A complementary assertion withMockery::on(fn($v) => strlen($v) >= 16)(or similar) would confirm the service is actually generating a meaningful random token rather than an empty or trivial value.♻️ Tighter token constraint
- ->with('test.lock', Mockery::type('string'), 3600) + ->with('test.lock', Mockery::on(fn(string $v) => strlen($v) >= 16), 3600)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/Unit/Services/LockManagerServiceOwnershipTest.php` around lines 139 - 144, Replace the loose Mockery::type('string') expectation in LockManagerServiceOwnershipTest (the mock of ICacheService used with addSingleValue) with a stricter constraint that asserts the token is non-empty/strong (e.g. Mockery::on(fn($v) => is_string($v) && strlen($v) >= 16)) or add an additional expectation using Mockery::on to verify token length, keeping the same call to addSingleValue and the deleteIfValueMatches expectation; target the mock for addSingleValue on the ICacheService to ensure the generated token is meaningful rather than allowing an empty string.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/Unit/Services/LockManagerServiceOwnershipTest.php`:
- Around line 139-144: Replace the loose Mockery::type('string') expectation in
LockManagerServiceOwnershipTest (the mock of ICacheService used with
addSingleValue) with a stricter constraint that asserts the token is
non-empty/strong (e.g. Mockery::on(fn($v) => is_string($v) && strlen($v) >= 16))
or add an additional expectation using Mockery::on to verify token length,
keeping the same call to addSingleValue and the deleteIfValueMatches
expectation; target the mock for addSingleValue on the ICacheService to ensure
the generated token is meaningful rather than allowing an empty string.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 757f3521-79bb-45c3-a48d-09cc2ce97243
📒 Files selected for processing (4)
Libs/Utils/ICacheService.phpapp/Services/Utils/LockManagerService.phpapp/Services/Utils/RedisCacheService.phptests/Unit/Services/LockManagerServiceOwnershipTest.php
🚧 Files skipped from review as they are similar to previous changes (3)
- app/Services/Utils/RedisCacheService.php
- app/Services/Utils/LockManagerService.php
- Libs/Utils/ICacheService.php
…rvice Signed-off-by: romanetar <roman_ag@hotmail.com>
acb8447 to
c6e6473
Compare
|
📘 OpenAPI / Swagger preview ➡️ https://OpenStackweb.github.io/summit-api/openapi/pr-537/ This page is automatically updated on each push to this PR. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/Unit/Services/LockManagerServiceOwnershipTest.php (1)
139-145: ⚡ Quick winAssert token identity across acquire and release, not just token type.
At Line [142] and Line [144], the test validates string token creation and release call count, but it does not verify
deleteIfValueMatchesreceives the same token captured duringaddSingleValue. A token-mismatch regression could still pass.Proposed test hardening
public function testAddSingleValueCalledOnceWithTokenAndLifetime(): void { $cache = Mockery::mock(ICacheService::class); + $token = null; $cache->shouldReceive('addSingleValue') ->once() - ->with('test.lock', Mockery::type('string'), 3600) + ->with( + 'test.lock', + Mockery::on(function ($value) use (&$token) { + if (!is_string($value) || $value === '') return false; + $token = $value; + return true; + }), + 3600 + ) ->andReturn(true); - $cache->shouldReceive('deleteIfValueMatches')->once()->andReturn(true); + $cache->shouldReceive('deleteIfValueMatches') + ->once() + ->with('test.lock', Mockery::on(fn($value) => $value === $token)) + ->andReturn(true);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/Unit/Services/LockManagerServiceOwnershipTest.php` around lines 139 - 145, The test currently only asserts the token is a string and that deleteIfValueMatches is called, but not that it's the same token; update LockManagerServiceOwnershipTest to capture the token passed to ICacheService::addSingleValue (use Mockery capture or an on/closure) and then assert ICacheService::deleteIfValueMatches is invoked with the same captured token (e.g., expect deleteIfValueMatches('test.lock', <capturedToken>)). Keep addSingleValue and deleteIfValueMatches expectations tied to the captured variable so the test fails on token mismatches.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/Unit/Services/LockManagerServiceOwnershipTest.php`:
- Around line 139-145: The test currently only asserts the token is a string and
that deleteIfValueMatches is called, but not that it's the same token; update
LockManagerServiceOwnershipTest to capture the token passed to
ICacheService::addSingleValue (use Mockery capture or an on/closure) and then
assert ICacheService::deleteIfValueMatches is invoked with the same captured
token (e.g., expect deleteIfValueMatches('test.lock', <capturedToken>)). Keep
addSingleValue and deleteIfValueMatches expectations tied to the captured
variable so the test fails on token mismatches.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b5de5ba1-752b-4d1c-9d24-3a6a5e3e917a
📒 Files selected for processing (4)
Libs/Utils/ICacheService.phpapp/Services/Utils/LockManagerService.phpapp/Services/Utils/RedisCacheService.phptests/Unit/Services/LockManagerServiceOwnershipTest.php
🚧 Files skipped from review as they are similar to previous changes (3)
- Libs/Utils/ICacheService.php
- app/Services/Utils/LockManagerService.php
- app/Services/Utils/RedisCacheService.php
| $prop->setAccessible(true); | ||
| $this->assertEmpty($prop->getValue($service)); | ||
| } | ||
| } |
There was a problem hiding this comment.
Missing test: Redis unavailable during releaseLock
There is no test for the case where deleteIfValueMatches returns false (Redis connection fails). When that happens the lock is not deleted from Redis but unset($this->tokens[$name]) still runs — the local token handle is gone. The lock persists in Redis until TTL expiry with no way to retry the release from application code.
Suggested outline:
public function testReleaseLockWhenRedisDownLeavesKeyUntilTtl(): void
{
$cache = Mockery::mock(ICacheService::class);
$cache->shouldReceive('addSingleValue')->once()->andReturn(true);
// simulate Redis unavailable — deleteIfValueMatches returns false
$cache->shouldReceive('deleteIfValueMatches')->once()->andReturn(false);
$lock = new LockManagerService($cache);
$lock->lock('resource.lock', fn() => null);
// Observable effect: a subsequent acquireLock on the same key will fail
// because the key was NOT deleted — only TTL expiry can free it.
}Documenting this as a test makes the failure mode explicit for future readers.
| return $conn->set($key, $value, 'NX') !== null; | ||
| }, false); | ||
| } | ||
|
|
There was a problem hiding this comment.
Missing integration test: atomic SET NX EX
All tests for this path mock ICacheService, so the actual Redis command added here is never exercised against a real server. Two things that only an integration test can verify:
- Driver compatibility — the variadic form
set($key, $value, 'EX', $ttl, 'NX')works with Predis 2.x. If the driver is ever switched to PhpRedis,set()returnsfalseon an NX-miss (notnull), silently breaking the!== nullcheck. - Atomicity — the fix guarantees the key cannot exist without a TTL; only a real
TTL keycall afteraddSingleValuecan confirm there is no gap.
Recommend a @group integration test that:
- calls
addSingleValue($key, $token, 30)against a real test Redis - reads
TTL $keyand asserts it is between 1 and 30 s - calls
addSingleValueagain on the same key and asserts it returnsfalse(NX semantics)
| @@ -46,22 +50,24 @@ public function __construct(ICacheService $cache_service){ | |||
| */ | |||
| public function acquireLock(string $name, int $lifetime = 3600):LockManagerService | |||
There was a problem hiding this comment.
Default TTL of 3600 s creates hour-long stuck locks in queue workers
This service is resolved inside long-lived queue workers (ProcessPaymentGatewayRefundJob, ProcessSummitOrderPaymentConfirmation, RevokeSummitOrder, etc.). If the worker is killed (SIGKILL, OOM, deployment restart) while holding a lock, the finally block never executes and the Redis key persists for 3600 s — blocking ticket sales on the affected key for up to one hour.
The PR description already flags this: "Consider a tighter explicit lifetime (e.g. 30 s) that matches the realistic worst-case DB write time."
Suggested fix — pass an explicit $lifetime at the call sites rather than relying on the default:
// SummitOrderService.php:951, 968, 1539
$this->lock_service->lock(
'ticket_type.' . $ticket_type->getId() . '.sell.lock',
function () use (...) { ... },
30 // worst-case DB write time in seconds
);| private $cache_service; | ||
|
|
||
| /** @var array<string,string> lock-name → per-call ownership token */ | ||
| private array $tokens = []; |
There was a problem hiding this comment.
Mutable per-acquisition state on a singleton service
LockManagerService is bound as a singleton (BaseServicesProvider.php:143). Before this PR the service was stateless; this line introduces the first mutable instance state. In the current stack (PHP-FPM, synchronous queue workers) this is safe because only one coroutine runs at a time per process. However, if Laravel Octane / Swoole is ever introduced, two concurrent coroutines racing on the same key name would overwrite each other's token here — coroutine A's releaseLock would present coroutine B's token to deleteIfValueMatches, releasing B's live lock.
Suggested mitigation: either change the binding to scoped (Octane-safe per-request singleton), or remove instance state entirely by returning the token from acquireLock and accepting it as a parameter in releaseLock.
| { | ||
| return $this->retryOnConnectionError(function ($conn) use ($counter_name, $ttl) { | ||
| if ($conn->setnx($counter_name, 1)) { | ||
| if ($conn->set($counter_name, 1, ['NX' => true]) !== null) { |
There was a problem hiding this comment.
incCounter partial fix — TTL assignment still non-atomic
addSingleValue was made fully atomic (SET key value EX $ttl NX in one command), but incCounter only swaps setnx for SET NX while keeping the expire as a separate call. If the process crashes between these two commands the counter key will exist indefinitely with no TTL — the same race that addSingleValue was fixed to prevent.
Suggested fix — collapse into a single atomic command, consistent with the approach used in addSingleValue:
if ($conn->set($counter_name, 1, ['EX' => (int)$ttl, 'NX' => true]) !== null) {
return 1;
}|
Malformed lock key — missing dot separator () This line cannot be commented inline since The lock key is missing a // Current — produces e.g. "ticket_type.42promo_code.SUMMER25.sell.lock"
$this->lock_service->lock('ticket_type.' . $type_id . 'promo_code.' . $promo_code_val . '.sell.lock', ...)
// Intended — "ticket_type.42.promo_code.SUMMER25.sell.lock"
$this->lock_service->lock('ticket_type.' . $type_id . '.promo_code.' . $promo_code_val . '.sell.lock', ...)With integer type IDs there is no key collision today, but the key is semantically malformed and will confuse any tooling (monitoring, manual Redis inspection, key-expiry scripts) that parses the key pattern. There is also no test asserting the exact key string passed to The PR description already flags this as a follow-up: "Fix the missing . in the key." Recommend addressing it in a dedicated follow-up ticket before the ownership token changes are deployed, so the key format stabilises. |
smarcet
left a comment
There was a problem hiding this comment.
@romanetar please review comments
ref https://app.clickup.com/t/86b9f3a22
Recommended actions for a follow-up ticket:
Summary by CodeRabbit
New Features
Tests