Use NOT EXISTS anti-join in Trigger.clean_unused and added skip locked#68244
Use NOT EXISTS anti-join in Trigger.clean_unused and added skip locked#68244AntonioBergonzi wants to merge 4 commits into
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide
|
c0f8d0c to
725b143
Compare
… between concurrent triggerer pods
There was a problem hiding this comment.
Pull request overview
This PR optimizes Trigger.clean_unused() by switching from an OUTER JOIN + GROUP BY/HAVING count-based approach to a NOT EXISTS-style anti-join when identifying unused triggers, and adds row-level locking with SKIP LOCKED to reduce interference when multiple triggerers are running concurrently.
Changes:
- Replace JOIN/aggregate “no TaskInstance rows” check with
~cls.task_instance.has()(anti-join / NOT EXISTS semantics). - Add
SKIP LOCKEDrow locking to avoid concurrent triggerers contending for the same trigger rows during cleanup.
| .where( | ||
| ~cls.assets.any(), | ||
| ~cls.callback.has(), | ||
| ~cls.task_instance.has(), | ||
| ) | ||
| .with_for_update(skip_locked=True) | ||
| ) |
There was a problem hiding this comment.
I can cherry pick this AntonioBergonzi@95b2489 in this branch. However I noticed we use with_for_update with skip_locked in other parts of the code (in the scheduler, for example)
There was a problem hiding this comment.
@hussein-awala I resolved this. let me know if I should unresolve
There was a problem hiding this comment.
I'm not sure with_for_update here is right -- it is locking the TI table, but you are never updating or touchingthem -- you are instead deleting things in the Trigger table.
There was a problem hiding this comment.
If SELECT ... FOR UPDATE is right, then please use the existing with_row_locks existing helper.
There was a problem hiding this comment.
Thanks for the review!
Addressed in ab47768, I used the helper and scoped the locking to the trigger table.
hussein-awala
left a comment
There was a problem hiding this comment.
LGTM, I went through this carefully since it rewrites the delete predicate, and I'm satisfied it's a safe, well-motivated optimization.
For SKIP LOCKED: Good call for multi-triggerer, a single clean_unused() pass may now skip rows locked by a concurrent triggerer and leave them for the next run instead of always sweeping everything, which is the right trade-off.
Description
Changed the query that checks for triggers that have no more callbacks, assets or task instances associated to them from count + aggregate to anti join on task_instance.
The result is the same (we are checking for non existence) but the query is more efficient. Also added skip locked since we run multiple triggerers and the different queries interfere with each other.
I did not add tests since I'm not introducing a new functionality, so the current one should cover the code change. If you think additional tests are needed, please let me know.
Testing
Plans of the queries
OLD QUERY:yields
NEW QUERY:
yields
Was generative AI tooling used to co-author this PR?
Claude Sonnet 1M
{pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.