[Deepin-Kernel-SIG] [linux 6.6.y] [Upstream] mm/filemap: remove hugetlb special casing in filemap.c#1888
Conversation
mainline inclusion
from mainline-v6.7-rc1
category: performance
Remove special cased hugetlb handling code within the page cache by
changing the granularity of ->index to the base page size rather than the
huge page size. The motivation of this patch is to reduce complexity
within the filemap code while also increasing performance by removing
branches that are evaluated on every page cache lookup.
To support the change in index, new wrappers for hugetlb page cache
interactions are added. These wrappers perform the conversion to a linear
index which is now expected by the page cache for huge pages.
========================= PERFORMANCE ======================================
Perf was used to check the performance differences after the patch.
Overall the performance is similar to mainline with a very small larger
overhead that occurs in __filemap_add_folio() and
hugetlb_add_to_page_cache(). This is because of the larger overhead that
occurs in xa_load() and xa_store() as the xarray is now using more entries
to store hugetlb folios in the page cache.
Timing
aarch64
2MB Page Size
6.5-rc3 + this patch:
[root@sidhakum-ol9-1 hugepages]# time fallocate -l 700GB test.txt
real 1m49.568s
user 0m0.000s
sys 1m49.461s
6.5-rc3:
[root]# time fallocate -l 700GB test.txt
real 1m47.495s
user 0m0.000s
sys 1m47.370s
1GB Page Size
6.5-rc3 + this patch:
[root@sidhakum-ol9-1 hugepages1G]# time fallocate -l 700GB test.txt
real 1m47.024s
user 0m0.000s
sys 1m46.921s
6.5-rc3:
[root@sidhakum-ol9-1 hugepages1G]# time fallocate -l 700GB test.txt
real 1m44.551s
user 0m0.000s
sys 1m44.438s
x86
2MB Page Size
6.5-rc3 + this patch:
[root@sidhakum-ol9-2 hugepages]# time fallocate -l 100GB test.txt
real 0m22.383s
user 0m0.000s
sys 0m22.255s
6.5-rc3:
[opc@sidhakum-ol9-2 hugepages]$ time sudo fallocate -l 100GB /dev/hugepages/test.txt
real 0m22.735s
user 0m0.038s
sys 0m22.567s
1GB Page Size
6.5-rc3 + this patch:
[root@sidhakum-ol9-2 hugepages1GB]# time fallocate -l 100GB test.txt
real 0m25.786s
user 0m0.001s
sys 0m25.589s
6.5-rc3:
[root@sidhakum-ol9-2 hugepages1G]# time fallocate -l 100GB test.txt
real 0m33.454s
user 0m0.001s
sys 0m33.193s
aarch64:
workload - fallocate a 700GB file backed by huge pages
6.5-rc3 + this patch:
2MB Page Size:
--100.00%--__arm64_sys_fallocate
ksys_fallocate
vfs_fallocate
hugetlbfs_fallocate
|
|--95.04%--__pi_clear_page
|
|--3.57%--clear_huge_page
| |
| |--2.63%--rcu_all_qs
| |
| --0.91%--__cond_resched
|
--0.67%--__cond_resched
0.17% 0.00% 0 fallocate [kernel.vmlinux] [k] hugetlb_add_to_page_cache
0.14% 0.10% 11 fallocate [kernel.vmlinux] [k] __filemap_add_folio
6.5-rc3
2MB Page Size:
--100.00%--__arm64_sys_fallocate
ksys_fallocate
vfs_fallocate
hugetlbfs_fallocate
|
|--94.91%--__pi_clear_page
|
|--4.11%--clear_huge_page
| |
| |--3.00%--rcu_all_qs
| |
| --1.10%--__cond_resched
|
--0.59%--__cond_resched
0.08% 0.01% 1 fallocate [kernel.kallsyms] [k] hugetlb_add_to_page_cache
0.05% 0.03% 3 fallocate [kernel.kallsyms] [k] __filemap_add_folio
x86
workload - fallocate a 100GB file backed by huge pages
6.5-rc3 + this patch:
2MB Page Size:
hugetlbfs_fallocate
|
--99.57%--clear_huge_page
|
--98.47%--clear_page_erms
|
--0.53%--asm_sysvec_apic_timer_interrupt
0.04% 0.04% 1 fallocate [kernel.kallsyms] [k] xa_load
0.04% 0.00% 0 fallocate [kernel.kallsyms] [k] hugetlb_add_to_page_cache
0.04% 0.00% 0 fallocate [kernel.kallsyms] [k] __filemap_add_folio
0.04% 0.00% 0 fallocate [kernel.kallsyms] [k] xas_store
6.5-rc3
2MB Page Size:
--99.93%--__x64_sys_fallocate
vfs_fallocate
hugetlbfs_fallocate
|
--99.38%--clear_huge_page
|
|--98.40%--clear_page_erms
|
--0.59%--__cond_resched
0.03% 0.03% 1 fallocate [kernel.kallsyms] [k] __filemap_add_folio
========================= TESTING ======================================
This patch passes libhugetlbfs tests and LTP hugetlb tests
********** TEST SUMMARY
* 2M
* 32-bit 64-bit
* Total testcases: 110 113
* Skipped: 0 0
* PASS: 107 113
* FAIL: 0 0
* Killed by signal: 3 0
* Bad configuration: 0 0
* Expected FAIL: 0 0
* Unexpected PASS: 0 0
* Test not present: 0 0
* Strange test result: 0 0
**********
Done executing testcases.
LTP Version: 20220527-178-g2761a81c4
page migration was also tested using Mike Kravetz's test program.[8]
[dan.carpenter@linaro.org: fix an NULL vs IS_ERR() bug]
Link: https://lkml.kernel.org/r/1772c296-1417-486f-8eef-171af2192681@moroto.mountain
Link: https://lkml.kernel.org/r/20230926192017.98183-1-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reported-and-tested-by: syzbot+c225dea486da4d5592bd@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c225dea486da4d5592bd
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit a08c719)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion from mainline-v6.7-rc5 category: bugfix After commit a08c719 "mm/filemap: remove hugetlb special casing in filemap.c", hugetlb pages are stored in the page cache in base page sized indexes. This leads to multi index stores in the xarray which is only supporting through CONFIG_XARRAY_MULTI. The other page cache user of multi index stores ,THP, selects XARRAY_MULTI. Have CONFIG_HUGETLB_PAGE follow this behavior as well to avoid the BUG() with a CONFIG_HUGETLB_PAGE && !CONFIG_XARRAY_MULTI config. Link: https://lkml.kernel.org/r/20231204183234.348697-1-sidhartha.kumar@oracle.com Fixes: a08c719 ("mm/filemap: remove hugetlb special casing in filemap.c") Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reported-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 4a3ef6b) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
…dling mainline inclusion from mainline-v6.8-rc3 category: bugfix has_extra_refcount() makes the assumption that the page cache adds a ref count of 1 and subtracts this in the extra_pins case. Commit a08c719 (mm/filemap: remove hugetlb special casing in filemap.c) modifies __filemap_add_folio() by calling folio_ref_add(folio, nr); for all cases (including hugtetlb) where nr is the number of pages in the folio. We should adjust the number of references coming from the page cache by subtracing the number of pages rather than 1. In hugetlbfs_read_iter(), folio_test_has_hwpoisoned() is testing the wrong flag as, in the hugetlb case, memory-failure code calls folio_test_set_hwpoison() to indicate poison. folio_test_hwpoison() is the correct function to test for that flag. After these fixes, the hugetlb hwpoison read selftest passes all cases. Link: https://lkml.kernel.org/r/20240112180840.367006-1-sidhartha.kumar@oracle.com Fixes: a08c719 ("mm/filemap: remove hugetlb special casing in filemap.c") Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Closes: https://lore.kernel.org/linux-mm/20230713001833.3778937-1-jiaqiyan@google.com/T/#m8e1469119e5b831bbd05d495f96b842e4a1c5519 Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Muchun Song <muchun.song@linux.dev> Cc: James Houghton <jthoughton@google.com> Cc: Jiaqi Yan <jiaqiyan@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: <stable@vger.kernel.org> [6.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 19d3e22) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion from mainline-v7.1-rc1 category: bugfix In mfill_atomic_hugetlb(), linear_page_index() is used to calculate the page index for hugetlb_fault_mutex_hash(). However, linear_page_index() returns the index in PAGE_SIZE units, while hugetlb_fault_mutex_hash() expects the index in huge page units. This mismatch means that different addresses within the same huge page can produce different hash values, leading to the use of different mutexes for the same huge page. This can cause races between faulting threads, which can corrupt the reservation map and trigger the BUG_ON in resv_map_release(). Fix this by introducing hugetlb_linear_page_index(), which returns the page index in huge page granularity, and using it in place of linear_page_index(). Link: https://lkml.kernel.org/r/20260310110526.335749-1-jianhuizzzzz@gmail.com Fixes: a08c719 ("mm/filemap: remove hugetlb special casing in filemap.c") Signed-off-by: Jianhui Zhou <jianhuizzzzz@gmail.com> Reported-by: syzbot+f525fd79634858f478e7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f525fd79634858f478e7 Acked-by: SeongJae Park <sj@kernel.org> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Jane Chu <jane.chu@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: JonasZhou <JonasZhou@zhaoxin.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: include/linux/hugetlb.h (cherry picked from commit 0217c7fb4de4a40cee667eb21901f3204effe5ac) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Reviewer's GuideThis PR removes hugetlb-specific indexing and refcount handling from the core page cache (filemap) code by standardizing folio->index to PAGE_SIZE units, introducing hugetlb-specific helpers/wrappers to translate hugepage indices, and updating hugetlbfs, migration, memory-failure, and userfaultfd paths to use the new conventions while fixing related refcount and lookup bugs. Sequence diagram for hugetlb page-cache lookup using filemap_lock_hugetlb_foliosequenceDiagram
participant HFS as hugetlbfs_read_iter
participant HLPF as filemap_lock_hugetlb_folio
participant FLF as filemap_lock_folio
participant XA as xarray_pagecache
HFS->>HLPF: filemap_lock_hugetlb_folio(h, mapping, idx)
note over HLPF: idx << huge_page_order(h)
HLPF->>FLF: filemap_lock_folio(mapping, idx_shifted)
FLF->>XA: xa_load(mapping->i_pages, idx_shifted)
XA-->>FLF: folio or error
FLF-->>HLPF: folio (locked) or ERR_PTR
HLPF-->>HFS: folio (locked) or ERR_PTR
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- In hugetlbfs_read_iter(), any IS_ERR(folio) from filemap_lock_hugetlb_folio() is now treated as a hole and zero-filled, which can silently mask real lookup/allocation errors; consider distinguishing between "not present" (e.g., -ENOENT) and genuine failures (e.g., -ENOMEM) and propagating/handling those appropriately.
- In hugetlbfs_fallocate(), the folio lookup uses index << huge_page_order(h) while hugetlb_fault_mutex_hash() is still hashed on the unshifted hugepage index; please double-check that this intentional mismatch between mutex granularity (hugepage index) and pagecache index (base-page index) is safe and remains consistent with other hugetlb fault paths.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In hugetlbfs_read_iter(), any IS_ERR(folio) from filemap_lock_hugetlb_folio() is now treated as a hole and zero-filled, which can silently mask real lookup/allocation errors; consider distinguishing between "not present" (e.g., -ENOENT) and genuine failures (e.g., -ENOMEM) and propagating/handling those appropriately.
- In hugetlbfs_fallocate(), the folio lookup uses index << huge_page_order(h) while hugetlb_fault_mutex_hash() is still hashed on the unshifted hugepage index; please double-check that this intentional mismatch between mutex granularity (hugepage index) and pagecache index (base-page index) is safe and remains consistent with other hugetlb fault paths.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Link: https://lore.kernel.org/all/20230926192017.98183-1-sidhartha.kumar@oracle.com/T/#u
Remove special cased hugetlb handling code within the page cache by
changing the granularity of ->index to the base page size rather than the
huge page size. The motivation of this patch is to reduce complexity
within the filemap code while also increasing performance by removing
branches that are evaluated on every page cache lookup.
To support the change in index, new wrappers for hugetlb page cache
interactions are added. These wrappers perform the conversion to a linear
index which is now expected by the page cache for huge pages.
========================= PERFORMANCE ======================================
Perf was used to check the performance differences after the patch. Overall
the performance is similar to mainline with a very small larger overhead that
occurs in __filemap_add_folio() and hugetlb_add_to_page_cache(). This is because
of the larger overhead that occurs in xa_load() and xa_store() as the
xarray is now using more entries to store hugetlb folios in the page cache.
Timing
aarch64
2MB Page Size
6.5-rc3 + this patch:
[root@sidhakum-ol9-1 hugepages]# time fallocate -l 700GB test.txt
real 1m49.568s
user 0m0.000s
sys 1m49.461s
x86
2MB Page Size
6.5-rc3 + this patch:
[root@sidhakum-ol9-2 hugepages]# time fallocate -l 100GB test.txt
real 0m22.383s
user 0m0.000s
sys 0m22.255s
aarch64:
workload - fallocate a 700GB file backed by huge pages
x86
workload - fallocate a 100GB file backed by huge pages
========================= TESTING ======================================
This patch passes libhugetlbfs tests and LTP hugetlb tests
********** TEST SUMMARY
page migration was also tested using Mike Kravetz's test program.8
Signed-off-by: Sidhartha Kumar sidhartha.kumar@oracle.com
Reported-and-tested-by: syzbot+c225dea486da4d5592bd@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c225dea486da4d5592bd
rebased on mm-unstable 09/26/23
RFC v21 -> v12
-change direction of series to maintain both huge and base page size index
rather than try to get rid of all references to a huge page sized index.
v1 -> v23
- squash seperate filemap and hugetlb changes into one patch to allow
for bisection per Matthew
- get rid of page_to_index()
- fix errors in hugetlb_fallocate() and remove_inode_hugepages()
v2 -> v34
- gather performance data per Mike Kravetz
- remove start variable in remove_inode_hugepages() per Mike Kravetz
- remove hugetlb special case within folio_file_page()
v3 -> v45
- rebase to current mm-unstable
- include time data per Mike Kravetz
v4 -> v56
- fix build issue by removing hugetlb_basepage_index() definition
per intel test robot
v5 -> v67
- remove folio_more_pages() from result of incorrect rebase
v6 -> v79:
- clarify commit message to demonstrate the benefits of this patch.
- fix error in hugepage migration by using folio_expected_refs() rather
than 2 in migrate_huge_page_move_mapping() and increase the dst folio
ref count by folio_nr_pages().
- this error is due to __filemap_add_folio() increasing the ref
count of the huge folio by folio_nr_pages() which
migrate_huge_page_move_mapping() did not handle
- remove linear_hugepage_index() as it is no longer used per Matthew Wilcox.
v7 -> v8:
- fix syzbot reported error in hugetlbfs_read_iter() by using
filemap_lock_hugetlb_folio() for page cache lookup.
fs/hugetlbfs/inode.c | 37 +++++++++++++++++++------------------
include/linux/hugetlb.h | 12 ++++++++++++
include/linux/pagemap.h | 32 ++------------------------------
mm/filemap.c | 34 ++++++++++------------------------
mm/hugetlb.c | 32 ++++++--------------------------
mm/migrate.c | 6 +++---
6 files changed, 52 insertions(+), 101 deletions(-)
Summary by Sourcery
Simplify hugetlb page cache handling by indexing huge pages at base-page granularity and relying on common filemap paths, while updating hugetlb-specific users accordingly.
Bug Fixes:
Enhancements: