[24.04_linux-nvidia-6.17-next] writeback: Avoid contention on wb->list_lock when switching inodes#468
[24.04_linux-nvidia-6.17-next] writeback: Avoid contention on wb->list_lock when switching inodes#468sforshee wants to merge 2 commits into
Conversation
PR Validation ReportPatchscan
|
BaseOS Kernel ReviewSummaryNo issues found across the reviewed commits. Findings: no problems found Latest watcher review: open review Kernel deb build: failed (failure log, build artifacts) Head: This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher publishes a newer review. |
|
It looks like I did miss the fix |
|
|
|
|
|
@sforshee No issues with the backports.
I do have a github hygiene question. Is there a reason the source branch came from the NVIDIA github and not your personal github? |
ltrager
left a comment
There was a problem hiding this comment.
Acked-by: Lee Trager <ltrager@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2156922 There can be multiple inode switch works that are trying to switch inodes to / from the same wb. This can happen in particular if some cgroup exits which owns many (thousands) inodes and we need to switch them all. In this case several inode_switch_wbs_work_fn() instances will be just spinning on the same wb->list_lock while only one of them makes forward progress. This wastes CPU cycles and quickly leads to softlockup reports and unusable system. Instead of running several inode_switch_wbs_work_fn() instances in parallel switching to the same wb and contending on wb->list_lock, run just one work item per wb and manage a queue of isw items switching to this wb. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> (cherry picked from commit e1b849c) Signed-off-by: Seth Forshee <sforshee@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2156922 inode_switch_wbs_work_fn() has a loop like: wb_get(new_wb); while (1) { list = llist_del_all(&new_wb->switch_wbs_ctxs); /* Nothing to do? */ if (!list) break; ... process the items ... } Now adding of items to the list looks like: wb_queue_isw() if (llist_add(&isw->list, &wb->switch_wbs_ctxs)) queue_work(isw_wq, &wb->switch_work); Because inode_switch_wbs_work_fn() loops when processing isw items, it can happen that wb->switch_work is pending while wb->switch_wbs_ctxs is empty. This is a problem because in that case wb can get freed (no isw items -> no wb reference) while the work is still pending causing use-after-free issues. We cannot just fix this by cancelling work when freeing wb because that could still trigger problematic 0 -> 1 transitions on wb refcount due to wb_get() in inode_switch_wbs_work_fn(). It could be all handled with more careful code but that seems unnecessarily complex so let's avoid that until it is proven that the looping actually brings practical benefit. Just remove the loop from inode_switch_wbs_work_fn() instead. That way when wb_queue_isw() queues work, we are guaranteed we have added the first item to wb->switch_wbs_ctxs and nobody is going to remove it (and drop the wb reference it holds) until the queued work runs. Fixes: e1b849c ("writeback: Avoid contention on wb->list_lock when switching inodes") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260413093618.17244-2-jack@suse.cz Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit 6689f01) Signed-off-by: Seth Forshee <sforshee@nvidia.com>
84a43a4 to
296fb8e
Compare
Um, that was a mistake cause by |
|
The last push was to add buglinks to the commit descriptions, no code changes. |
To be clear, my comment in Slack was not about adding buglinks to the actual commits (Brad does that as part of his process), but to add a link in the PR description to the LP. I've gone ahead and added one to this PR as an example. |
|
Merged, closing PR. |
Summary
Fix NVBug 6291984 for
24.04_linux-nvidia-6.17-next.The observed symptoms are soft lockups, degraded performance, and NIC hangs. This is cause by a cgroup which exits while owning a large number of inodes, and all of these inodes need to be switched to another wb. This can lead to multiple workers trying to switch inodes, with many spinning on the same wb->list_lock while only one makes forward progress. The fix from upstream switches to use a single work item per wb, managing a queue of inode switch work items.
Fixes
Clean cherryy-picks of a fix from 6.19, plus a follow-on which fixes a use-after-free in the original fix.
Testing
A test case was used to reproduce this issue on an unpatched kernel. The same reproducer on a patched kernel remains responsive and does not show soft lockups or NIC hangs.
PR: https://bugs.launchpad.net/bugs/2156922