Fix callback subscriber dying on first message when callback returns None (2.0.4) #29
Merged
Conversation
…None return MsgCallbackSubscriber._task_body uses `if not cont` to decide whether to stop the loop, which fires for both False (the documented "stop" return) and None (the implicit return value of any callback that doesn't end with `return True`). Result: any `async def cb(...)` or `def cb(...)` that just does its work and returns nothing silently kills the subscription after the first delivered message. Three tests added: - async callback returning None must keep the loop alive across 5 messages - sync callback returning None must keep the loop alive across 5 messages - explicit `return False` must still stop the loop after one message (regression check that the fix preserves the documented stop contract) The first two fail on master; the third passes. The fix on feat/checking-log-burst-debug (dcf5aed) makes all three pass.
…turned None MsgCallbackSubscriber._task_body treated None (the implicit return value of a Python function with no explicit return statement) the same as False — so any async def callback that didn't end with `return True` exited the subscription loop after the very first delivered message. Symptom (pms aggregator, 176 subscribers via asyncio.gather): - 32 subjects with no replay-time messages stayed in the wait loop and worked correctly. - 144 subjects that received an initial replay message had their callback fire once, return None implicitly, and break the loop. The subscription task exited silently (DEBUG-level "Exiting sync iteration" log was the only trace). The aggregator's value cache then never updated again, trends stayed null for hours, and the watchdog flagged "144/176 silent > 600s" exactly as observed in production. Fix: stop only on explicit False; None and any truthy value keep the subscription alive. Docstring updated to match the corrected contract.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
MsgCallbackSubscriber._task_body used if not cont to decide whether to stop the loop. not None is True, so any async def cb(...) (or sync def cb(...)) without an explicit return True silently broke the subscription after the first delivered message — the idiomatic Python case where a
▎ callback just does its work and returns nothing.
▎
▎ Production symptom (pms aggregator, 176 subscribers via asyncio.gather):
▎ - 32 subjects with no replay-time messages stayed in the wait loop and worked correctly.
▎ - 144 subjects that received an initial replay message had their callback fire once, return None implicitly, and break the loop. Trends stayed null for hours; watchdog flagged "144/176 silent > 600s".
▎
Fix
▎
▎ Stop the loop only on explicit False. None and any truthy value keep it alive. Docstring updated to match the corrected contract.
▎
Tests (TDD)
▎
▎ Three regression tests added — written and committed before the fix to demonstrate they fail on master:
▎ - test_async_callback_returning_none_keeps_subscription_alive — the production failure mode
▎ - test_sync_callback_returning_none_keeps_subscription_alive — same bug for sync callbacks
▎ - test_async_callback_returning_false_stops_subscription — guards the documented stop contract against future inversions
▎
▎ Pre-fix: 2 of 3 fail with "Received 1/5". Post-fix: all 3 pass. Full suite: 283 passed, 0 failed.