Age  Commit message  Author  Files  Lines
2024-02-23  writeback: simplify the loops in write_cache_pages()  Matthew Wilcox (Oracle)  1 file, -39/+36
Collapse the two nested loops into one. This is needed as a step towards turning this into an iterator. Note that this drops the "index <= end" check in the previous outer loop and just relies on filemap_get_folios_tag() to return 0 entries when index > end. This actually has a subtle implication when end == -1 because then the returned index will be -1 as well and thus if there is a page present at index -1, we could be looping indefinitely. But as the comment in filemap_get_folios_tag documents this as already broken anyway we should not worry about it here either. The fix for that would probably be a change to the filemap_get_folios_tag() calling convention. [hch@lst.de: update the commit log per Jan] Link: https://lkml.kernel.org/r/20240215063649.2164017-10-hch@lst.de Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
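Editor's note: the loop shape this commit moves to can be sketched in plain user-space C. This is only an illustration of the structure, not the kernel code; toy_get_batch() and BATCH are invented names, and the refill-on-demand single loop stands in for write_cache_pages() driving filemap_get_folios_tag().

    /* Collapsing "fetch a batch, then walk it" nested loops into one loop. */
    #include <stdio.h>

    #define BATCH 4

    /* Pretend "mapping": dirty items index..end, handed out BATCH at a time. */
    static int toy_get_batch(int *index, int end, int *batch)
    {
        int n = 0;

        while (n < BATCH && *index <= end)
            batch[n++] = (*index)++;
        return n;   /* 0 means nothing left, the single termination signal */
    }

    int main(void)
    {
        int batch[BATCH];
        int index = 0, end = 9;
        int i = 0, n = 0;

        for (;;) {
            if (i == n) {                   /* batch exhausted: refill */
                n = toy_get_batch(&index, end, batch);
                if (n == 0)
                    break;                  /* one exit condition, no outer loop */
                i = 0;
            }
            printf("write folio %d\n", batch[i++]);
        }
        return 0;
    }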
2024-02-23  writeback: factor writeback_get_batch() out of write_cache_pages()  Matthew Wilcox (Oracle)  2 files, -22/+44
This simple helper will be the basis of the writeback iterator. To make this work, we need to remember the current index and end positions in writeback_control. [hch@lst.de: heavily rebased, add helpers to get the tag and end index, don't keep the end index in struct writeback_control] Link: https://lkml.kernel.org/r/20240215063649.2164017-9-hch@lst.de Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  writeback: factor folio_prepare_writeback() out of write_cache_pages()  Matthew Wilcox (Oracle)  1 file, -27/+34
Reduce write_cache_pages() by about 30 lines; much of it is commentary, but it all bundles nicely into an obvious function. [hch@lst.de: rename should_writeback_folio to folio_prepare_writeback per Jan] Link: https://lkml.kernel.org/r/20240215063649.2164017-8-hch@lst.de Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  writeback: rework the loop termination condition in write_cache_pages  Christoph Hellwig  1 file, -51/+33
Rework the way we deal with the cleanup after the writepage call. First handle the magic AOP_WRITEPAGE_ACTIVATE separately from real error returns to get it out of the way of the actual error handling path. Then split the handling into integrity vs non-integrity branches, and return early using a goto for the non-integrity early loop termination, to remove the need for the done and done_index local variables and for assigning the error to ret when we can just return the error directly. Link: https://lkml.kernel.org/r/20240215063649.2164017-7-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Christian Brauner <brauner@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
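Editor's note: a minimal user-space sketch of the non-integrity shape described above, not the kernel code. TOY_ACTIVATE and toy_writepage() are invented; the point is that the magic "activate" value is handled before real errors, and a real error leaves the loop through a single goto rather than done/done_index bookkeeping (integrity writeback would instead record the first error and keep writing).

    #include <stdio.h>

    #define TOY_ACTIVATE 1      /* stand-in for AOP_WRITEPAGE_ACTIVATE */

    static int toy_writepage(int i)
    {
        if (i == 2)
            return TOY_ACTIVATE;    /* "success, but caller must unlock" */
        if (i == 4)
            return -5;              /* a real error */
        return 0;
    }

    static int toy_write_cache_pages(void)
    {
        int error = 0;

        for (int i = 0; i < 6; i++) {
            error = toy_writepage(i);
            if (error == TOY_ACTIVATE) {    /* not a real error */
                printf("unlock folio %d\n", i);
                error = 0;
                continue;
            }
            if (error)
                goto done;          /* non-integrity: stop at the first error */
            printf("wrote folio %d\n", i);
        }
    done:
        return error;
    }

    int main(void)
    {
        printf("result: %d\n", toy_write_cache_pages());
        return 0;
    }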
2024-02-23  writeback: only update ->writeback_index for range_cyclic writeback  Christoph Hellwig  1 file, -10/+14
mapping->writeback_index is only [1] used as the starting point for range_cyclic writeback, so there is no point in updating it for other types of writeback. [1] except for btrfs_defrag_file which does really odd things with mapping->writeback_index. But btrfs doesn't use write_cache_pages at all, so this isn't relevant here. Link: https://lkml.kernel.org/r/20240215063649.2164017-6-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  writeback: also update wbc->nr_to_write on writeback failure  Christoph Hellwig  1 file, -1/+1
When exiting write_cache_pages early due to a non-integrity write failure, wbc->nr_to_write currently doesn't account for the folio we just failed to write. This doesn't matter because the callers always ignore the value on a failure, but moving the update to common code will allow us to simplify the code, so do it. Link: https://lkml.kernel.org/r/20240215063649.2164017-5-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  writeback: fix done_index when hitting the wbc->nr_to_write  Christoph Hellwig  1 file, -0/+1
When write_cache_pages finishes writing out a folio, it fails to update done_index to account for the number of pages in the folio just written. That means when range_cyclic writeback is restarted, it will be restarted at this folio instead of after it as it should. Fix that by updating done_index before breaking out of the loop. Link: https://lkml.kernel.org/r/20240215063649.2164017-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  writeback: remove a duplicate prototype for tag_pages_for_writeback  Matthew Wilcox (Oracle)  1 file, -2/+0
[hch@lst.de: split from a larger patch] Link: https://lkml.kernel.org/r/20240215063649.2164017-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  writeback: don't call mapping_set_error on AOP_WRITEPAGE_ACTIVATE  Christoph Hellwig  1 file, -1/+3
Patch series "convert write_cache_pages() to an iterator", v8. This is an evolution of the series Matthew Wilcox originally sent in June 2023, which has changed quite a bit since and now has a while-based iterator. This patch (of 14): mapping_set_error should only be called on 0 returns (which it ignores) or a negative error code. writepage_cb ends up calling mapping_set_error on the magic AOP_WRITEPAGE_ACTIVATE return value from ->writepage, which means success but the caller needs to unlock the page. Ignore that and just call mapping_set_error on negative errors. (no fixes tag as this goes back more than 20 years over various renames and refactors so I've given up chasing down the original introduction) Link: https://lkml.kernel.org/r/20240215063649.2164017-1-hch@lst.de Link: https://lkml.kernel.org/r/20240215063649.2164017-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Brian Foster <bfoster@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/page_alloc: make bad_range() return bool  Hao Ge  1 file, -6/+6
bad_range() can return bool, so let us change it. Link: https://lkml.kernel.org/r/20240221073227.276234-1-gehao@kylinos.cn Signed-off-by: Hao Ge <gehao@kylinos.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
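Editor's note: the kind of cleanup described here can be shown with a tiny user-space sketch; the toy in_range() helper is invented for the example and is not the kernel's bad_range(). The change is purely switching a 0/1 int predicate to bool.

    #include <stdbool.h>
    #include <stdio.h>

    /* previously "static int in_range(...)" returning 0 or 1 */
    static bool in_range(long start, long end, long x)
    {
        if (x < start)
            return false;
        if (x >= end)
            return false;
        return true;
    }

    int main(void)
    {
        printf("%d\n", in_range(0, 10, 5));   /* prints 1 */
        return 0;
    }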
2024-02-23  madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0  Barry Song  1 file, -1/+1
The purpose is to stop splitting large folios whose mapcount is 2 or above. Folios whose estimated_sharers = 0 should still be perfect, and even better candidates than estimated_sharers = 1. Consider a pte-mapped large folio with 16 subpages: if we unmap subpages 1-15, the current code will split the folio and reclaim it while madvise works on this folio; but if we unmap subpage 0, we will keep this folio and break. This is weird. For pmd-mapped large folios, we can still use "= 1" as the condition, as we have the entire map for it anyway. So this patch doesn't change the condition for pmd-mapped large folios. This also explains why we had been using "= 1" for both pmd-mapped and pte-mapped large folios before commit 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios"), because in the past, we used the mapcount of the specific subpage; since the subpage had a pte present, its mapcount wouldn't be 0. The problem can be quite easily reproduced by writing a small program and unmapping the first subpage of a pte-mapped large folio vs. unmapping any subpage other than the first one. Link: https://lkml.kernel.org/r/20240221085036.105621-1-21cnbao@gmail.com Fixes: 2f406263e3e9 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check") Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/swapfile:__swap_duplicate: drop redundant WRITE_ONCE on swap_map for err cases  Barry Song  1 file, -1/+2
The code is quite hard to read: we are still writing swap_map after errors happen, even though the written value is the same as before, has_cache = count & SWAP_HAS_CACHE; count &= ~SWAP_HAS_CACHE; [snipped] WRITE_ONCE(p->swap_map[offset], count | has_cache); It would be better to entirely drop the WRITE_ONCE for the error cases, for both performance and readability. [akpm@linux-foundation.org: avoid using goto] Link: https://lkml.kernel.org/r/20240221091028.123122-1-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
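Editor's note: a simplified user-space sketch of the resulting shape, not the swapfile code. The point is that the store into the map happens only once all error checks have passed, so failure paths never rewrite the entry with the value it already holds. All names (toy_map, TOY_HAS_CACHE, toy_dup) are invented for the example.

    #include <errno.h>
    #include <stdio.h>

    #define TOY_HAS_CACHE 0x80
    #define TOY_COUNT_MAX 0x7f

    static unsigned char toy_map[8];

    static int toy_dup(int offset, unsigned char usage)
    {
        unsigned char count = toy_map[offset] & ~TOY_HAS_CACHE;
        unsigned char has_cache = toy_map[offset] & TOY_HAS_CACHE;

        if (usage == TOY_HAS_CACHE && has_cache)
            return -EEXIST;             /* error path: no store */
        if (count == TOY_COUNT_MAX)
            return -EINVAL;             /* error path: no store */

        if (usage == TOY_HAS_CACHE)
            has_cache = TOY_HAS_CACHE;
        else
            count++;
        toy_map[offset] = count | has_cache;    /* single store, success only */
        return 0;
    }

    int main(void)
    {
        printf("%d %d\n", toy_dup(0, 1), toy_map[0]);
        return 0;
    }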
2024-02-23  shmem: properly report quota mount options  Jan Kara  1 file, -0/+18
Report quota options among the set of mount options. This allows proper user visibility into whether quotas are enabled or not. Link: https://lkml.kernel.org/r/20240129120131.21145-1-jack@suse.cz Fixes: e09764cff44b ("shmem: quota support") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/compaction: optimize >0 order folio compaction with free page split.  Zi Yan  1 file, -5/+30
During migration in a memory compaction, free pages are placed in an array of page lists based on their order. But the desired free page order (i.e., the order of a source page) might not always be present, thus leading to migration failures and premature compaction termination. Split a high order free page when the source migration page has a lower order to increase the migration success rate. Note: merging free pages when a migration fails and a lower order free page is returned via compaction_free() is possible, but it is too much work. Since the free pages are not buddy pages, it is hard to identify these free pages using the existing PFN-based page merging algorithm. Link: https://lkml.kernel.org/r/20240220183220.1451315-5-zi.yan@sent.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Adam Manzanares <a.manzanares@samsung.com> Cc: David Hildenbrand <david@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yin Fengwei <fengwei.yin@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
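Editor's note: a user-space sketch of the buddy-style split described above, not the compaction code. A high-order free block is halved until it reaches the order the migration target needs, and each split-off upper half goes back on the free list of its new, smaller order. The names and the printf "free lists" are invented for the example.

    #include <stdio.h>

    /* split a free block of 2^high pages at pfn down to a 2^low block */
    static unsigned long toy_split_free_page(unsigned long pfn, int high, int low)
    {
        while (high > low) {
            unsigned long half = 1UL << --high;

            /* upper buddy goes back on the free list of order 'high' */
            printf("put pfn %lu on order-%d free list\n", pfn + half, high);
        }
        return pfn;     /* 2^low pages at pfn are handed to migration */
    }

    int main(void)
    {
        /* migration source is order 1, but only an order-3 free block exists */
        unsigned long pfn = toy_split_free_page(1024, 3, 1);

        printf("use pfn %lu as an order-1 migration target\n", pfn);
        return 0;
    }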
2024-02-23  mm/compaction: add support for >0 order folio memory compaction.  Zi Yan  3 files, -63/+83
Before the last commit, memory compaction only migrates order-0 folios and skips >0 order folios. The last commit splits all >0 order folios during compaction. This commit migrates >0 order folios during compaction by keeping isolated free pages at their original size, without splitting them into order-0 pages, and using them directly during the migration process. What is different from the prior implementation: 1. All isolated free pages are kept in a NR_PAGE_ORDERS array of page lists, where each page list stores free pages of the same order. 2. The free pages are not post_alloc_hook() processed and are not buddy pages, although their orders are stored in the first page's private like buddy pages. 3. During migration, at new page allocation time (i.e., in compaction_alloc()), free pages are then processed by post_alloc_hook(). When migration fails and a new page is returned (i.e., in compaction_free()), free pages are restored by reversing the post_alloc_hook() operations using the newly added free_pages_prepare_fpi_none(). Step 3 is done for a later optimization, so that splitting and/or merging free pages during compaction becomes easier. Note: without splitting free pages, compaction can end prematurely because migration will return -ENOMEM even if there are free pages. This happens when no order-0 free page exists and compaction_alloc() returns NULL. Link: https://lkml.kernel.org/r/20240220183220.1451315-4-zi.yan@sent.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Adam Manzanares <a.manzanares@samsung.com> Cc: David Hildenbrand <david@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yin Fengwei <fengwei.yin@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/compaction: enable compacting >0 order folios.  Zi Yan  1 file, -25/+76
migrate_pages() supports >0 order folio migration and during compaction, even if compaction_alloc() cannot provide >0 order free pages, migrate_pages() can split the source page and try to migrate the base pages from the split. It can be a baseline and start point for adding support for compacting >0 order folios. Link: https://lkml.kernel.org/r/20240220183220.1451315-3-zi.yan@sent.com Signed-off-by: Zi Yan <ziy@nvidia.com> Suggested-by: Huang Ying <ying.huang@intel.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Adam Manzanares <a.manzanares@samsung.com> Cc: David Hildenbrand <david@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yin Fengwei <fengwei.yin@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/page_alloc: remove unused fpi_flags in free_pages_prepare()  Zi Yan  1 file, -5/+5
Patch series "Enable >0 order folio memory compaction", v7. This patchset enables >0 order folio memory compaction, which is one of the prerequisites for large folio support[1]. I am aware that splitting free pages is necessary for folio migration in compaction, since if >0 order free pages are never split and no order-0 free page is scanned, compaction will end prematurely because migration returns -ENOMEM. Free page split becomes a must instead of an optimization. lkp ncompare results (on an 8-CPU (Intel Xeon E5-2650 v4 @2.20GHz) 16G VM) for default LRU (-no-mglru) and CONFIG_LRU_GEN are shown at the bottom, copied from V3[4]. In sum, most of the vm-scalability applications do not see a performance change, and the others see ~4% to ~26% performance boost under default LRU and ~2% to ~6% performance boost under CONFIG_LRU_GEN. Overview === To support >0 order folio compaction, the patchset changes how free pages used for migration are kept during compaction. Free pages used to be split into order-0 pages that are post allocation processed (i.e., PageBuddy flag cleared, page order stored in page->private is zeroed, and page reference is set to 1). Now all free pages are kept in a NR_PAGE_ORDERS array of page lists based on their order, without post allocation processing. When migrate_pages() asks for a new page, one of the free pages, based on the requested page order, is then processed and given out. And THP <2MB would need this feature. [1] https://lore.kernel.org/linux-mm/f8d47176-03a8-99bf-a813-b5942830fd73@arm.com/ [2] https://lore.kernel.org/linux-mm/20231113170157.280181-1-zi.yan@sent.com/ [3] https://lore.kernel.org/linux-mm/20240123034636.1095672-1-zi.yan@sent.com/ [4] https://lore.kernel.org/linux-mm/20240202161554.565023-1-zi.yan@sent.com/ [5] https://lore.kernel.org/linux-mm/20240212163510.859822-1-zi.yan@sent.com/ [6] https://lore.kernel.org/linux-mm/20240214220420.1229173-1-zi.yan@sent.com/ [7] https://lore.kernel.org/linux-mm/20240216170432.1268753-1-zi.yan@sent.com/ This patch (of 4): Commit 0a54864f8dfb ("kasan: remove PG_skip_kasan_poison flag") removes the use of fpi_flags in should_skip_kasan_poison() and fpi_flags is only passed to should_skip_kasan_poison() in free_pages_prepare(). Remove the unused parameter. Link: https://lkml.kernel.org/r/20240220183220.1451315-1-zi.yan@sent.com Link: https://lkml.kernel.org/r/20240220183220.1451315-2-zi.yan@sent.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Adam Manzanares <a.manzanares@samsung.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  MAINTAINERS: add Chengming Zhou as a zswap reviewer  Chengming Zhou  1 file, -0/+1
I have been actively contributing to zswap and reviewing zswap patches for a while, and I am already getting CC'd on most of them. So add myself as a reviewer, will continue to work on it and help with the review process. Link: https://lkml.kernel.org/r/20240220073851.865113-1-chengming.zhou@linux.dev Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/zsmalloc: remove get_zspage_mapping()  Chengming Zhou  1 file, -24/+4
Actually we seldom use the class_idx returned from get_zspage_mapping(), only the zspage->fullness is useful, just use zspage->fullness to remove this helper. Note zspage->fullness is not stable outside pool->lock, remove redundant "VM_BUG_ON(fullness != ZS_INUSE_RATIO_0)" in async_free_zspage() since we already have the same VM_BUG_ON() in __free_zspage(), which is safe to access zspage->fullness with pool->lock held. Link: https://lkml.kernel.org/r/20240220-b4-zsmalloc-cleanup-v1-3-5c5ee4ccdd87@bytedance.com Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/zsmalloc: remove_zspage() don't need fullness parameter  Chengming Zhou  1 file, -7/+7
We must remove_zspage() from its current fullness list, then use insert_zspage() to update its fullness and insert it into the new fullness list. Obviously, remove_zspage() doesn't need the fullness parameter. Link: https://lkml.kernel.org/r/20240220-b4-zsmalloc-cleanup-v1-2-5c5ee4ccdd87@bytedance.com Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/zsmalloc: remove set_zspage_mapping()  Chengming Zhou  1 file, -11/+2
Patch series "mm/zsmalloc: some cleanup for get/set_zspage_mapping()". The discussion[1] with Sergey shows there is some cleanup work to do in get/set_zspage_mapping(): - the fullness returned from get_zspage_mapping() is not stable outside pool->lock; this usage pattern is confusing, but should be ok in this free_zspage path. - we seldom use the class_idx returned from get_zspage_mapping(); only the free_zspage path uses it to get its class. - set_zspage_mapping() always sets zspage->class, but it is never changed after the zspage is allocated. [1] https://lore.kernel.org/all/a6c22e30-cf10-4122-91bc-ceb9fb57a5d6@bytedance.com/ This patch (of 3): We only need to update zspage->fullness in insert_zspage(), since zspage->class never changes after allocation. Link: https://lkml.kernel.org/r/20240220-b4-zsmalloc-cleanup-v1-0-5c5ee4ccdd87@bytedance.com Link: https://lkml.kernel.org/r/20240220-b4-zsmalloc-cleanup-v1-1-5c5ee4ccdd87@bytedance.com Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm: compaction: early termination in compact_nodes()  Kefeng Wang  1 file, -7/+17
No need to continue trying to compact memory if a fatal signal is pending; allow earlier loop termination in compact_nodes(). The existing fatal_signal_pending() check does make compact_zone() break out of the while loop, but it still enters the next zone/next nid, and some unnecessary functions (e.g., lru_add_drain) are called. There was no observable benefit from the new test; it was just found from code inspection when refactoring compact_node(). Link: https://lkml.kernel.org/r/20240208022508.1771534-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm: zswap: increase reject_compress_poor but not reject_compress_fail if compression returns ENOSPC  Barry Song  1 file, -14/+13
We used to rely on the returned -ENOSPC of zpool_malloc() to increase reject_compress_poor. But the code wouldn't get there after commit 744e1885922a ("crypto: scomp - fix req->dst buffer overflow") as the new code will goto out immediately after the special compression case happens. So there may no longer be a chance to execute zpool_malloc now. We are incorrectly increasing zswap_reject_compress_fail instead. Thus, we need to fix the counters handling right after compression returns ENOSPC. This patch also centralizes the counters handling for all of compress_poor, compress_fail and alloc_fail. Link: https://lkml.kernel.org/r/20240219211935.72394-1-21cnbao@gmail.com Fixes: 744e1885922a ("crypto: scomp - fix req->dst buffer overflow") Signed-off-by: Barry Song <v-songbaohua@oppo.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
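Editor's note: a simplified user-space sketch of what "centralizing the counters handling" means, not the zswap code. One classification point decides which rejection counter gets bumped, so an ENOSPC from the compression step counts as "compressed poorly" rather than "compression failed". The error values checked and counter names here are invented for the example.

    #include <errno.h>
    #include <stdio.h>

    static unsigned long reject_compress_poor;
    static unsigned long reject_compress_fail;
    static unsigned long reject_alloc_fail;

    /* single spot that classifies the outcome of a store attempt */
    static void toy_count_reject(int comp_ret, int alloc_ret)
    {
        if (comp_ret == -ENOSPC || alloc_ret == -ENOSPC)
            reject_compress_poor++;   /* data compressed poorly, not a crypto failure */
        else if (comp_ret)
            reject_compress_fail++;
        else if (alloc_ret)
            reject_alloc_fail++;
    }

    int main(void)
    {
        toy_count_reject(-ENOSPC, 0);   /* ENOSPC from compression: poor, not fail */
        toy_count_reject(-EIO, 0);      /* real compression failure */
        toy_count_reject(0, -ENOMEM);   /* allocation failure */
        printf("poor=%lu fail=%lu alloc=%lu\n", reject_compress_poor,
               reject_compress_fail, reject_alloc_fail);
        return 0;
    }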
2024-02-23  mm/z3fold: fix the comment for __encode_handle()  Zhongkun He  1 file, -2/+3
The comment above __encode_handle(), "Pool lock should be held as this function accesses first_num", is confusing, because first_num is an element of z3fold_header, which is protected by z3fold_header->page_lock. I found the same comment for encode_handle() in zbud.c by accident; there, "Pool lock should be held as this function accesses first|last_chunks" makes sense, because first|last_chunks are elements of zbud_header, which does not have any lock of its own, so the pool lock should be held. Z3fold is based on zbud, so the comment probably came from zbud, but in z3fold it is wrong, so fix it. Link: https://lkml.kernel.org/r/20240219024453.2240147-1-hezhongkun.hzk@bytedance.com Signed-off-by: Zhongkun He <hezhongkun.hzk@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/zsmalloc: remove unused zspage->isolated  Chengming Zhou  1 file, -32/+0
zspage->isolated is not used anywhere, so we don't need to maintain it, which required holding the heavy pool lock just to update it. Just remove it. Link: https://lkml.kernel.org/r/20240219-b4-szmalloc-migrate-v1-3-34cd49c6545b@bytedance.com Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/zsmalloc: remove migrate_write_lock_nested()  Chengming Zhou  1 file, -17/+5
The migrate write lock is there to protect against the race between zspage migration and zspage objects' map users. We only need to lock out the map users of the src zspage, not the dst zspage, which is safe for users to map concurrently, since we only need to do obj_malloc() from the dst zspage. So we can remove the migrate_write_lock_nested() use case. While we are here, clean up __zs_compact() by moving putback_zspage() outside of migrate_write_unlock: since we hold the pool lock, no malloc or free users can come in. Link: https://lkml.kernel.org/r/20240219-b4-szmalloc-migrate-v1-2-34cd49c6545b@bytedance.com Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/zsmalloc: fix migrate_write_lock() when !CONFIG_COMPACTION  Chengming Zhou  1 file, -6/+3
Patch series "mm/zsmalloc: fix and optimize objects/page migration". This series is to fix and optimize the zsmalloc objects/page migration. This patch (of 3): migrate_write_lock() is an empty function when !CONFIG_COMPACTION, in which case zs_compact() can be triggered from shrinker reclaim context. (Maybe it's better to rename it to zs_shrink()?) And zspage map object users rely on this migrate_read_lock() so the object won't be migrated elsewhere. Fix it by always implementing the migrate_write_lock() related functions. Link: https://lkml.kernel.org/r/20240219-b4-szmalloc-migrate-v1-0-34cd49c6545b@bytedance.com Link: https://lkml.kernel.org/r/20240219-b4-szmalloc-migrate-v1-1-34cd49c6545b@bytedance.com Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  Docs/admin-guide/mm/damon/reclaim: document auto-tuning parameters  SeongJae Park  1 file, -0/+27
Update DAMON_RECLAIM usage document for the user/self feedback based auto-tuning of the quota. Link: https://lkml.kernel.org/r/20240219194431.159606-21-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/damon/reclaim: implement memory PSI-driven quota self-tuning  SeongJae Park  1 file, -0/+25
Support the PSI-driven quota self-tuning from DAMON_RECLAIM by introducing yet another parameter, 'quota_mem_pressure_us'. Users can set the desired amount of memory pressure stall time per quota reset interval using the parameter. Then DAMON_RECLAIM monitors the memory pressure stall time, specifically the system-wide memory 'some' PSI value that increased during the given time interval, and self-tunes the quota using the DAMOS core logic. Link: https://lkml.kernel.org/r/20240219194431.159606-20-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/damon/reclaim: implement user-feedback driven quota auto-tuning  SeongJae Park  1 file, -0/+28
DAMOS supports user-feedback driven quota auto-tuning, but only DAMON sysfs interface is using it. Add support of the feature on DAMON_RECLAIM by adding one more input parameter, namely 'quota_autotune_feedback', for providing the user feedback to DAMON_RECLAIM. It assumes the target value of the feedback is 10,000. Link: https://lkml.kernel.org/r/20240219194431.159606-19-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  Docs/admin-guide/mm/damon/usage: document quota goal metric file  SeongJae Park  1 file, -6/+6
Update DAMON usage document for the quota goal target_metric file. [sj@kernel.org: fix a typo on the auto-tuning design reference link] Link: https://lkml.kernel.org/r/20240221170852.55529-3-sj@kernel.org Link: https://lkml.kernel.org/r/20240219194431.159606-18-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  Docs/ABI/damon: document quota goal metric file  SeongJae Park  1 file, -0/+6
Update DAMON ABI document for the quota goal target_metric file. Link: https://lkml.kernel.org/r/20240219194431.159606-17-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  Docs/mm/damon/design: document quota goal self-tuning  SeongJae Park  1 file, -2/+18
Update DAMON design doc to explain the quota goal self-tuning, which can be used by setting the goal's metric to metrics that kernel can self-retrieve. Link: https://lkml.kernel.org/r/20240219194431.159606-16-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/damon/sysfs-schemes: support PSI-based quota auto-tune  SeongJae Park  1 file, -2/+40
Extend DAMON sysfs interface to support the PSI-based quota auto-tuning by adding a new file, 'target_metric' under the quota goal directory. Old users don't get any behavioral changes since the default value of the metric is 'user input'. Link: https://lkml.kernel.org/r/20240219194431.159606-15-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/damon/core: implement PSI metric DAMOS quota goal  SeongJae Park  2 files, -0/+32
Extend the DAMOS quota goal metric with system-wide memory pressure stall time. Specifically, the system level 'some' PSI for memory is used. The target value can be set in microseconds. DAMOS measures the increased amount of the PSI metric in the last quota_reset_interval and uses the ratio of it versus the user-specified target PSI value as the score for the auto-tuning feedback loop. Link: https://lkml.kernel.org/r/20240219194431.159606-14-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
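Editor's note: a simplified user-space sketch of the score computation described above, not the DAMOS implementation. The increase of the memory 'some' PSI stall time over the last quota reset interval is compared against the user-set target (both in microseconds), and the ratio is the feedback score. The 10000 scale and all names are assumptions made for the example.

    #include <stdio.h>

    #define TOY_SCORE_SCALE 10000UL   /* 10000 == exactly on target */

    static unsigned long toy_psi_score(unsigned long psi_before_us,
                                       unsigned long psi_now_us,
                                       unsigned long target_us)
    {
        unsigned long increase_us = psi_now_us - psi_before_us;

        if (!target_us)
            return TOY_SCORE_SCALE;
        return increase_us * TOY_SCORE_SCALE / target_us;
    }

    int main(void)
    {
        /* stall grew by 150us against a 100us target: score > 10000, so the
         * feedback loop would shrink the quota for the next interval */
        printf("score = %lu\n", toy_psi_score(1000, 1150, 100));
        return 0;
    }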
2024-02-23  mm/damon/core: support multiple metrics for quota goal  SeongJae Park  3 files, -8/+45
DAMOS quota auto-tuning asks users to assess the current tuned quota and provide the feedback in a manual and repeated way. It allows users to generate the feedback from a source that the kernel cannot access, and writing a script or a function for doing the manual and repeated feeding is not a big deal. However, additional work is additional work, and it could be more efficient if DAMOS could do the fetch itself, especially in the DAMON sysfs interface use case, since it can avoid the context switches between user-space and kernel-space, though the overhead would be only trivial in most cases. Also, in many cases, the feedback could be made from kernel-accessible sources, such as PSI, CPU usage, etc. Make the quota goal support multiple types of metrics, including such ones. Link: https://lkml.kernel.org/r/20240219194431.159606-13-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/damon/core: let goal specified with only target and current values  SeongJae Park  3 files, -22/+16
The DAMOS quota auto-tuning feature lets users set the goal by providing a function for getting the current score of the tuned quota. It allows a flexible goal setup, but only a simple user-set quota is currently being used. As a result, the only user of the DAMOS quota auto-tuning is using a silly void pointer casting based score value passing function. Simplify the interface and the user code by letting the user directly set the target and the current value. Link: https://lkml.kernel.org/r/20240219194431.159606-12-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/damon/core: remove ->goal field of damos_quota  SeongJae Park  2 files, -20/+9
The DAMOS quota auto-tuning feature supports a static single goal and dynamic multiple goals via the DAMON kernel API, specifically via the ->goal and ->goals fields of struct damos_quota, respectively. All in-tree DAMOS kernel API users are using only the dynamic multiple goals now. Remove the unused static single goal interface. Link: https://lkml.kernel.org/r/20240219194431.159606-11-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/damon/sysfs: use only quota->goals  SeongJae Park  3 files, -19/+35
The DAMON sysfs interface implements multiple quota auto-tuning goals on its own level, since the DAMOS core logic was supporting only a single goal. Now the core logic supports multiple goals on its level. Update the DAMON sysfs interface to reuse the core logic and drop the unnecessary duplicated multiple goals implementation. Link: https://lkml.kernel.org/r/20240219194431.159606-10-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/damon/core: add multiple goals per damos_quota and helpers for those  SeongJae Park  2 files, -7/+88
The feedback-driven DAMOS quota auto-tuning feature allows only a single goal to the DAMON kernel API users. The API users could implement multiple goals for the end-users on their level, and that's what the DAMON sysfs interface is doing. More DAMON kernel API users such as DAMON_RECLAIM would need to do similar work. To reduce unnecessary future duplicated effort, support multiple goals from the DAMOS core layer. To keep the support a minimal, non-destructive change, keep the old single goal setup interface and add a multiple goals setup. The single goal will be treated as one of the multiple goals, so old API users are not required to make any change. Link: https://lkml.kernel.org/r/20240219194431.159606-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/damon/core: split out quota goal related fields to a struct  SeongJae Park  3 files, -25/+34
'struct damos_quota' is not small now. Split out fields for quota goal to a separate struct for easier reading. Link: https://lkml.kernel.org/r/20240219194431.159606-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/damon: move comments and fields for damos-quota-prioritization to the end  SeongJae Park  1 file, -16/+14
The comments and definition of 'struct damos_quota' list a few fields for effective quota generation first, then fields for regions prioritization under the quota, and then the remaining fields for effective quota generation. Readers have to unnecessarily switch context in the middle. List all the fields for the effective quota first, and then the fields for the prioritization, to make it easier to read. Link: https://lkml.kernel.org/r/20240219194431.159606-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  Docs/admin-guide/mm/damon/usage: document effective_bytes file  SeongJae Park  1 file, -3/+16
Update DAMON usage document for the effective quota file of the DAMON sysfs interface. Link: https://lkml.kernel.org/r/20240219194431.159606-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  Docs/ABI/damon: document effective_bytes sysfs file  SeongJae Park  1 file, -1/+9
Update the DAMON ABI doc for the effective_bytes sysfs file and the kdamond state file input command for updating the content of the file. Link: https://lkml.kernel.org/r/20240219194431.159606-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/damon/sysfs: implement a kdamond command for updating schemes' effective quotas  SeongJae Park  3 files, -0/+56
Implement yet another kdamond 'state' file input command, namely 'update_schemes_effective_quotas'. If it is written, the 'effective_bytes' files of the kdamond will be updated to provide the current effective size quota of each scheme in bytes. Link: https://lkml.kernel.org/r/20240219194431.159606-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/damon/sysfs-schemes: implement quota effective_bytes file  SeongJae Park  1 file, -0/+14
The DAMON sysfs interface allows users to set two types of quotas, namely a time quota and a size quota. DAMOS converts the time quota to a size quota and uses the smaller of the two resulting size quotas. The resulting effective size quota can be helpful for debugging and analysis, but it is not exposed to the user. The recently added feedback-driven quota auto-tuning is making it even more mysterious. Implement a DAMON sysfs interface read-only empty file, namely 'effective_bytes', under the quota goal DAMON sysfs directory. It will be extended to expose the effective quota to the end user. Link: https://lkml.kernel.org/r/20240219194431.159606-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
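Editor's note: a simplified user-space sketch of the idea described above, not the DAMOS code. The time quota is first converted into bytes and the effective quota is the smaller of that and the user-set size quota; converting via an observed bytes-per-millisecond throughput is an assumption made just for this example.

    #include <stdio.h>

    static unsigned long toy_effective_quota(unsigned long size_quota_bytes,
                                             unsigned long time_quota_ms,
                                             unsigned long observed_bytes_per_ms)
    {
        unsigned long time_as_bytes = time_quota_ms * observed_bytes_per_ms;

        if (!size_quota_bytes)
            return time_as_bytes;       /* 0 means "no size quota" in this toy */
        if (!time_quota_ms)
            return size_quota_bytes;    /* 0 means "no time quota" in this toy */
        return size_quota_bytes < time_as_bytes ? size_quota_bytes : time_as_bytes;
    }

    int main(void)
    {
        /* 1 MiB size quota vs. 100 ms that is worth ~400 KiB of work */
        printf("effective = %lu bytes\n",
               toy_effective_quota(1UL << 20, 100, 4096));
        return 0;
    }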
2024-02-23  mm/damon/core: set damos_quota->esz as public field and document  SeongJae Park  2 files, -6/+8
Patch series "mm/damon: let DAMOS feeds and tame/auto-tune itself". The Aim-oriented Feedback-driven DAMOS Aggressiveness Auto-tuning patchset[1] which has merged since commit 9294a037c015 ("mm/damon/core: implement goal-oriented feedback-driven quota auto-tuning") made the mechanism and the policy separated. That is, users can set a part of DAMOS control policies without a deep understanding of the mechanism but just their demands such as SLA. However, users are still required to do some additional work of manually collecting their target metric and feeding it to DAMOS. In the case of end-users who use DAMON sysfs interface, the context switches between user-space and kernel-space could also make it inefficient. The overhead is supposed to be only trivial in common cases, though. Meanwhile, in simple use cases, the target metric could be common system metrics that the kernel can efficiently self-retrieve, such as memory pressure stall time (PSI). Extend DAMOS quota auto-tuning to support multiple types of metrics including the DAMOS self-retrievable ones, and add support for memory pressure stall time metric. Different types of metrics can be supported in future. The auto-tuning capability is currently supported for only users of DAMOS kernel API and DAMON sysfs interface. Extend the support to DAMON_RECLAIM. Patches Sequence ================ First five patches are for helping debugging and fine-tuning existing quota control features. The first one (patch 1) exposes the effective quota that is made with given user inputs to DAMOS kernel API users and kernel-doc documents. Following four patches implement (patches 1, 2 and 3) and document (patches 4 and 5) a new DAMON sysfs file that exposes the value. Following six patches cleanup and simplify the existing DAMOS quota auto-tuning code by improving layout of comments and data structures (patches 6 and 7), supporting common use cases, namely multiple goals (patches 8, 9 and 10), and simplifying the interface (patch 11). Then six patches for the main purpose of this patchset follow. The first three changes extend the core logic for various target metrics (patch 12), implement memory pressure stall time-based target metric support (patch 13), and update DAMON sysfs interface to support the new target metric (patch 14). Then, documentation updates for the features on design (patch 15), ABI (patch 16), and usage (patch 17) follow. Last three patches add auto-tuning support on DAMON_RECLAIM. The patches implement DAMON_RECLAIM parameters for user-feedback driven quota auto-tuning (patch 18), memory pressure stall time-driven quota self-tuning (patch 19), and finally update the DAMON_RECLAIM usage document for the new parameters (patch 20). [1] https://lore.kernel.org/all/20231130023652.50284-1-sj@kernel.org/ This patch (of 20): DAMOS allow users to specify the quota as they want in multiple ways including time quota, size quota, and feedback-based auto-tuning. DAMOS makes one effective quota out of the inputs and use it at the end. Knowing the current effective quota helps understanding DAMOS' internal mechanism and fine-tuning quotas. DAMON kernel API users can get the information from ->esz field of damos_quota struct, but the field is marked as private purpose, and not kernel-doc documented. Make it public and document. 
Link: https://lkml.kernel.org/r/20240219194431.159606-1-sj@kernel.org Link: https://lkml.kernel.org/r/20240219194431.159606-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  mm/khugepaged: bypassing unnecessary scans with MMF_DISABLE_THP check  Lance Yang  1 file, -6/+12
khugepaged scans the entire address space in the background for each given mm, looking for opportunities to merge sequences of basic pages into huge pages. However, when an mm is inserted into the mm_slots list, and the MMF_DISABLE_THP flag is set later, this scanning process becomes unnecessary for that mm and can be skipped to avoid redundant operations, especially in scenarios with a large address space. On an Intel Core i5 CPU, the time taken by khugepaged to scan the address space of a process, which has been set with the MMF_DISABLE_THP flag after being added to the mm_slots list, is as follows (shorter is better):
VMA Count | Old  | New | Change
-------------------------------
       50 | 23us | 9us | -60.9%
      100 | 32us | 9us | -71.9%
      200 | 44us | 9us | -79.5%
      400 | 75us | 9us | -88.0%
      800 | 98us | 9us | -90.8%
Once the count of VMAs for the process exceeds pages_to_scan, khugepaged needs to wait for scan_sleep_millisecs ms before scanning the next process. IMO, unnecessary scans could actually be skipped with a very inexpensive mm->flags check in this case. This commit introduces a check before each scanning process to test the MMF_DISABLE_THP flag for the given mm; if the flag is set, the scanning process is bypassed, thereby improving the efficiency of khugepaged. This optimization is not a correctness issue but rather an enhancement to save expensive checks on each VMA when userspace cannot prctl itself before spawning into the new process. On some servers within our company, we deploy a daemon responsible for monitoring and updating local applications. Some applications prefer not to use THP, so the daemon calls prctl to disable THP before fork/exec. Conversely, for other applications, the daemon calls prctl to enable THP before fork/exec. Ideally, the daemon should invoke prctl after the fork, but its current implementation follows the described approach. In the Go standard library, there is no direct encapsulation of the fork system call; instead, fork and execve are combined into one through syscall.ForkExec. Link: https://lkml.kernel.org/r/20240129054551.57728-1-ioworker0@gmail.com Signed-off-by: Lance Yang <ioworker0@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Peter Xu <peterx@redhat.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
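Editor's note: a minimal user-space sketch of the optimization described above, not the khugepaged code. A cheap per-mm flag test sits in front of the expensive per-VMA scan, so a process that disabled THP is skipped outright; the flag bit value and the toy structures are invented for the example.

    #include <stdbool.h>
    #include <stdio.h>

    #define TOY_MMF_DISABLE_THP (1UL << 0)

    struct toy_mm {
        unsigned long flags;
        int nr_vmas;
    };

    static bool toy_scan_allowed(const struct toy_mm *mm)
    {
        /* one bit test instead of walking every VMA of the mm */
        return !(mm->flags & TOY_MMF_DISABLE_THP);
    }

    static void toy_khugepaged_scan(const struct toy_mm *mm)
    {
        if (!toy_scan_allowed(mm)) {
            printf("skip mm (THP disabled), %d VMAs not walked\n", mm->nr_vmas);
            return;
        }
        printf("scan %d VMAs for collapse candidates\n", mm->nr_vmas);
    }

    int main(void)
    {
        struct toy_mm a = { .flags = 0, .nr_vmas = 800 };
        struct toy_mm b = { .flags = TOY_MMF_DISABLE_THP, .nr_vmas = 800 };

        toy_khugepaged_scan(&a);
        toy_khugepaged_scan(&b);
        return 0;
    }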
2024-02-23  MAINTAINERS: update mm and memcg entries  Mike Rapoport (IBM)  1 file, -0/+10
Add F: lines for memory management and memory cgroup include files. Link: https://lkml.kernel.org/r/20240208055727.142387-1-rppt@kernel.org Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23  arch, crash: move arch_crash_save_vmcoreinfo() out to file vmcore_info.c  Baoquan He  12 files, -61/+82
Nathan reported the build error below: =====
$ curl -LSso .config https://git.alpinelinux.org/aports/plain/community/linux-edge/config-edge.armv7
$ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- olddefconfig all
..
arm-linux-gnueabi-ld: arch/arm/kernel/machine_kexec.o: in function `arch_crash_save_vmcoreinfo': machine_kexec.c:(.text+0x488): undefined reference to `vmcoreinfo_append_str'
====
On architectures like arm, s390, ppc and sh, the function arch_crash_save_vmcoreinfo() is located in machine_kexec.c and it can only be compiled in when CONFIG_KEXEC_CORE=y. That's not right, because arch_crash_save_vmcoreinfo() is used to export arch specific vmcoreinfo; CONFIG_VMCORE_INFO is supposed to control its compiling in. However, CONFIG_VMCORE_INFO could be independent of CONFIG_KEXEC_CORE, e.g. CONFIG_PROC_KCORE=y will select CONFIG_VMCORE_INFO. Or, if CONFIG_KEXEC/CONFIG_KEXEC_FILE is set while CONFIG_CRASH_DUMP is not, it will report a linking error. So, on arm, s390, ppc and sh, move arch_crash_save_vmcoreinfo out to a new file vmcore_info.c. Let CONFIG_VMCORE_INFO decide whether to compile in arch_crash_save_vmcoreinfo(). [akpm@linux-foundation.org: remove stray newlines at eof] Link: https://lkml.kernel.org/r/20240129135033.157195-3-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/all/20240126045551.GA126645@dev-arch.thelio-3990X/T/#u Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Michael Kelley <mhklinux@outlook.com> Cc: Pingfan Liu <piliu@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>