aboutsummaryrefslogtreecommitdiff
path: root/mm/debug_vm_pgtable.c
diff options
context:
space:
mode:
authorGravatar David Hildenbrand <david@redhat.com> 2023-01-13 18:10:01 +0100
committerGravatar Andrew Morton <akpm@linux-foundation.org> 2023-02-02 22:33:05 -0800
commit2321ba3e3733f513e46e29b9c70512ecddbf1085 (patch)
tree2896bac7550dd008e674c8ffc733c97307ee1f38 /mm/debug_vm_pgtable.c
parentmm/khugepaged: convert release_pte_pages() to use folios (diff)
downloadlinux-2321ba3e3733f513e46e29b9c70512ecddbf1085.tar.gz
linux-2321ba3e3733f513e46e29b9c70512ecddbf1085.tar.bz2
linux-2321ba3e3733f513e46e29b9c70512ecddbf1085.zip
mm/debug_vm_pgtable: more pte_swp_exclusive() sanity checks
Patch series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". This is the follow-up on [1]: [PATCH v2 0/8] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of anonymous pages After we implemented __HAVE_ARCH_PTE_SWP_EXCLUSIVE on most prominent enterprise architectures, implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all remaining architectures that support swap PTEs. This makes sure that exclusive anonymous pages will stay exclusive, even after they were swapped out -- for example, making GUP R/W FOLL_GET of anonymous pages reliable. Details can be found in [1]. This primarily fixes remaining known O_DIRECT memory corruptions that can happen on concurrent swapout, whereby we can lose DMA reads to a page (modifying the user page by writing to it). To verify, there are two test cases (requiring swap space, obviously): (1) The O_DIRECT+swapout test case [2] from Andrea. This test case tries triggering a race condition. (2) My vmsplice() test case [3] that tries to detect if the exclusive marker was lost during swapout, not relying on a race condition. For example, on 32bit x86 (with and without PAE), my test case fails without these patches: $ ./test_swp_exclusive FAIL: page was replaced during COW But succeeds with these patches: $ ./test_swp_exclusive PASS: page was not replaced during COW Why implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE for all architectures, even the ones where swap support might be in a questionable state? This is the first step towards removing "readable_exclusive" migration entries, and instead using pte_swp_exclusive() also with (readable) migration entries instead (as suggested by Peter). The only missing piece for that is supporting pmd_swp_exclusive() on relevant architectures with THP migration support. As all relevant architectures now implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE,, we can drop __HAVE_ARCH_PTE_SWP_EXCLUSIVE in the last patch. I tried cross-compiling all relevant setups and tested on x86 and sparc64 so far. CCing arch maintainers only on this cover letter and on the respective patch(es). [1] https://lkml.kernel.org/r/20220329164329.208407-1-david@redhat.com [2] https://gitlab.com/aarcange/kernel-testcases-for-v5.11/-/blob/main/page_count_do_wp_page-swap.c [3] https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/test_swp_exclusive.c This patch (of 26): We want to implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures. Let's extend our sanity checks, especially testing that our PTE bit does not affect: * is_swap_pte() -> pte_present() and pte_none() * the swap entry + type * pte_swp_soft_dirty() Especially, the pfn_pte() is dodgy when the swap PTE layout differs heavily from ordinary PTEs. Let's properly construct a swap PTE from swap type+offset. [david@redhat.com: fix build] Link: https://lkml.kernel.org/r/6aaad548-cf48-77fa-9d6c-db83d724b2eb@redhat.com Link: https://lkml.kernel.org/r/20230113171026.582290-1-david@redhat.com Link: https://lkml.kernel.org/r/20230113171026.582290-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: <aou@eecs.berkeley.edu> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: H. Peter Anvin (Intel) <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Nadav Amit <namit@vmware.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Xu <peterx@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuerui Wang <kernel@xen0n.name> Cc: Yang Shi <shy828301@gmail.com> Cc: Yoshinori Sato <ysato@users.osdn.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'mm/debug_vm_pgtable.c')
-rw-r--r--mm/debug_vm_pgtable.c25
1 files changed, 24 insertions, 1 deletions
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index bb3328f46126..ff8d6f6af896 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -811,13 +811,36 @@ static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args) {
static void __init pte_swap_exclusive_tests(struct pgtable_debug_args *args)
{
#ifdef __HAVE_ARCH_PTE_SWP_EXCLUSIVE
- pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot);
+ unsigned long max_swap_offset;
+ swp_entry_t entry, entry2;
+ pte_t pte;
pr_debug("Validating PTE swap exclusive\n");
+
+ /* See generic_max_swapfile_size(): probe the maximum offset */
+ max_swap_offset = swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0, ~0UL))));
+
+ /* Create a swp entry with all possible bits set */
+ entry = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset);
+
+ pte = swp_entry_to_pte(entry);
+ WARN_ON(pte_swp_exclusive(pte));
+ WARN_ON(!is_swap_pte(pte));
+ entry2 = pte_to_swp_entry(pte);
+ WARN_ON(memcmp(&entry, &entry2, sizeof(entry)));
+
pte = pte_swp_mkexclusive(pte);
WARN_ON(!pte_swp_exclusive(pte));
+ WARN_ON(!is_swap_pte(pte));
+ WARN_ON(pte_swp_soft_dirty(pte));
+ entry2 = pte_to_swp_entry(pte);
+ WARN_ON(memcmp(&entry, &entry2, sizeof(entry)));
+
pte = pte_swp_clear_exclusive(pte);
WARN_ON(pte_swp_exclusive(pte));
+ WARN_ON(!is_swap_pte(pte));
+ entry2 = pte_to_swp_entry(pte);
+ WARN_ON(memcmp(&entry, &entry2, sizeof(entry)));
#endif /* __HAVE_ARCH_PTE_SWP_EXCLUSIVE */
}