aboutsummaryrefslogtreecommitdiff
path: root/Documentation/admin-guide
AgeCommit message (Collapse)AuthorFilesLines
2022-05-09Documentation: drop more IDE boot options and ide-cd.rstGravatar Randy Dunlap 1-20/+0
Drop ide-* command line options. Drop cdrom/ide-cd.rst documentation. Fixes: 898ee22c32be ("Drop Documentation/ide/") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Phillip Potter <phil@philpotter.co.uk> Link: https://lore.kernel.org/r/20220424033701.7988-1-rdunlap@infradead.org [jc: also deleted reference from cdrom/index.rst] Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-05-09doc: admin-guide: Update libata kernel parametersGravatar Damien Le Moal 1-16/+55
Cleanup the text text describing the libata.force boot parameter and update the list of the values to include all supported horkage and link flag that can be forced. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de>
2022-05-07docs: kdump: Update the crashkernel description for arm64Gravatar Zhen Lei 1-2/+7
Now arm64 has added support for "crashkernel=X,high" and "crashkernel=Y,low". Unlike x86, crash low memory is not allocated if "crashkernel=Y,low" is not specified. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Baoquan He <bhe@redhat.com> Link: https://lore.kernel.org/r/20220506114402.365-7-thunder.leizhen@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-05ima: define a new template field named 'd-ngv2' and templatesGravatar Mimi Zohar 1-1/+2
In preparation to differentiate between unsigned regular IMA file hashes and fs-verity's file digests in the IMA measurement list, define a new template field named 'd-ngv2'. Also define two new templates named 'ima-ngv2' and 'ima-sigv2', which include the new 'd-ngv2' field. Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-05-03Merge branches 'docs.2022.04.20a', 'fixes.2022.04.20a', 'nocb.2022.04.11b', ↵Gravatar Paul E. McKenney 1-3/+70
'rcu-tasks.2022.04.11b', 'srcu.2022.05.03a', 'torture.2022.04.11b', 'torture-tasks.2022.04.20a' and 'torturescript.2022.04.20a' into HEAD docs.2022.04.20a: Documentation updates. fixes.2022.04.20a: Miscellaneous fixes. nocb.2022.04.11b: Callback-offloading updates. rcu-tasks.2022.04.11b: RCU-tasks updates. srcu.2022.05.03a: Put SRCU on a memory diet. torture.2022.04.11b: Torture-test updates. torture-tasks.2022.04.20a: Avoid torture testing changing RCU configuration. torturescript.2022.04.20a: Torture-test scripting updates.
2022-05-03srcu: Automatically determine size-transition strategy at bootGravatar Paul E. McKenney 1-0/+10
This commit adds a srcutree.convert_to_big option of zero that causes SRCU to decide at boot whether to wait for contention (small systems) or immediately expand to large (large systems). A new srcutree.big_cpu_lim (defaulting to 128) defines how many CPUs constitute a large system. Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-01Documentation/sysctl: document max_rcu_stall_to_panicGravatar Joel Savitz 1-0/+7
commit dfe564045c65 ("rcu: Panic after fixed number of stalls") introduced a new systctl but no accompanying documentation. Add a simple entry to the documentation. Signed-off-by: Joel Savitz <jsavitz@redhat.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-04-29memcg: introduce per-memcg reclaim interfaceGravatar Shakeel Butt 1-0/+21
This patch series adds a memory.reclaim proactive reclaim interface. The rationale behind the interface and how it works are in the first patch. This patch (of 4): Introduce a memcg interface to trigger memory reclaim on a memory cgroup. Use case: Proactive Reclaim --------------------------- A userspace proactive reclaimer can continuously probe the memcg to reclaim a small amount of memory. This gives more accurate and up-to-date workingset estimation as the LRUs are continuously sorted and can potentially provide more deterministic memory overcommit behavior. The memory overcommit controller can provide more proactive response to the changing behavior of the running applications instead of being reactive. A userspace reclaimer's purpose in this case is not a complete replacement for kswapd or direct reclaim, it is to proactively identify memory savings opportunities and reclaim some amount of cold pages set by the policy to free up the memory for more demanding jobs or scheduling new jobs. A user space proactive reclaimer is used in Google data centers. Additionally, Meta's TMO paper recently referenced a very similar interface used for user space proactive reclaim: https://dl.acm.org/doi/pdf/10.1145/3503222.3507731 Benefits of a user space reclaimer: ----------------------------------- 1) More flexible on who should be charged for the cpu of the memory reclaim. For proactive reclaim, it makes more sense to be centralized. 2) More flexible on dedicating the resources (like cpu). The memory overcommit controller can balance the cost between the cpu usage and the memory reclaimed. 3) Provides a way to the applications to keep their LRUs sorted, so, under memory pressure better reclaim candidates are selected. This also gives more accurate and uptodate notion of working set for an application. Why memory.high is not enough? ------------------------------ - memory.high can be used to trigger reclaim in a memcg and can potentially be used for proactive reclaim. However there is a big downside in using memory.high. It can potentially introduce high reclaim stalls in the target application as the allocations from the processes or the threads of the application can hit the temporary memory.high limit. - Userspace proactive reclaimers usually use feedback loops to decide how much memory to proactively reclaim from a workload. The metrics used for this are usually either refaults or PSI, and these metrics will become messy if the application gets throttled by hitting the high limit. - memory.high is a stateful interface, if the userspace proactive reclaimer crashes for any reason while triggering reclaim it can leave the application in a bad state. - If a workload is rapidly expanding, setting memory.high to proactively reclaim memory can result in actually reclaiming more memory than intended. The benefits of such interface and shortcomings of existing interface were further discussed in this RFC thread: https://lore.kernel.org/linux-mm/5df21376-7dd1-bf81-8414-32a73cea45dd@google.com/ Interface: ---------- Introducing a very simple memcg interface 'echo 10M > memory.reclaim' to trigger reclaim in the target memory cgroup. The interface is introduced as a nested-keyed file to allow for future optional arguments to be easily added to configure the behavior of reclaim. Possible Extensions: -------------------- - This interface can be extended with an additional parameter or flags to allow specifying one or more types of memory to reclaim from (e.g. file, anon, ..). - The interface can also be extended with a node mask to reclaim from specific nodes. This has use cases for reclaim-based demotion in memory tiering systens. - A similar per-node interface can also be added to support proactive reclaim and reclaim-based demotion in systems without memcg. - Add a timeout parameter to make it easier for user space to call the interface without worrying about being blocked for an undefined amount of time. For now, let's keep things simple by adding the basic functionality. [yosryahmed@google.com: worked on versions v2 onwards, refreshed to current master, updated commit message based on recent discussions and use cases] Link: https://lkml.kernel.org/r/20220425190040.2475377-1-yosryahmed@google.com Link: https://lkml.kernel.org/r/20220425190040.2475377-2-yosryahmed@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Co-developed-by: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Wei Xu <weixugc@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: David Rientjes <rientjes@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Thelen <gthelen@google.com> Cc: Chen Wandun <chenwandun@huawei.com> Cc: Vaibhav Jain <vaibhav@linux.ibm.com> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29zram: add a huge_idle writeback modeGravatar Brian Geffon 1-0/+5
Today it's only possible to write back as a page, idle, or huge. A user might want to writeback pages which are huge and idle first as these idle pages do not require decompression and make a good first pass for writeback. Idle writeback specifically has the advantage that a refault is unlikely given that the page has been swapped for some amount of time without being refaulted. Huge writeback has the advantage that you're guaranteed to get the maximum benefit from a single page writeback, that is, you're reclaiming one full page of memory. Pages which are compressed in zram being written back result in some benefit which is always less than a page size because of the fact that it was compressed. The primary use of this is for minimizing refaults in situations where the device has to be sensitive to storage endurance. On ChromeOS we have devices with slow eMMC and repeated writes and refaults can negatively affect performance and endurance. Link: https://lkml.kernel.org/r/20220322215821.1196994-1-bgeffon@google.com Signed-off-by: Brian Geffon <bgeffon@google.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28mm/vmstat: add events for ksm cowGravatar Yang Yang 1-0/+18
Users may use ksm by calling madvise(, , MADV_MERGEABLE) when they want to save memory, it's a tradeoff by suffering delay on ksm cow. Users can get to know how much memory ksm saved by reading /sys/kernel/mm/ksm/pages_sharing, but they don't know what's the costs of ksm cow, and this is important of some delay sensitive tasks. So add ksm cow events to help users evaluate whether or how to use ksm. Also update Documentation/admin-guide/mm/ksm.rst with new added events. Link: https://lkml.kernel.org/r/20220331035616.2390805-1-yang.yang29@zte.com.cn Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: xu xin <xu.xin16@zte.com.cn> Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Saravanan D <saravanand@fb.com> Cc: Minchan Kim <minchan@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP*Gravatar Muchun Song 2-3/+3
The word of "free" is not expressive enough to express the feature of optimizing vmemmap pages associated with each HugeTLB, rename this keywork to "optimize". In this patch , cheanup configs to make code more expressive. Link: https://lkml.kernel.org/r/20220404074652.68024-4-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28Documentation/sysctl: document page_lock_unfairnessGravatar Joel Savitz 1-0/+9
commit 5ef64cc8987a ("mm: allow a controlled amount of unfairness in the page lock") introduced a new systctl but no accompanying documentation. Add a simple entry to the documentation. Link: https://lkml.kernel.org/r/20220325164437.120246-1-jsavitz@redhat.com Signed-off-by: Joel Savitz <jsavitz@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: "zhangyi (F)" <yi.zhang@huawei.com> Cc: Charan Teja Reddy <charante@codeaurora.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28Documentation: add missing angle bracket in cgroup-v2 docGravatar Joel Savitz 1-1/+1
Trivial addition of missing closing angle backet. Signed-off-by: Joel Savitz <jsavitz@redhat.com> Link: https://lore.kernel.org/r/20220422164526.3464306-1-jsavitz@redhat.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-04-27net: wan: remove support for COSA and SRP synchronous serial boardsGravatar Jakub Kicinski 1-1/+1
Looks like all the changes to this driver had been automated churn since git era begun. The driver is using virt_to_bus() so it should be updated to a proper DMA API or removed. Given the latest "news" entry on the website is from 1999 I'm opting for the latter. I'm marking the allocated char device major number as [REMOVED], I reckon we can't reuse it in case some SW out there assumes its COSA? Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-26docs: bootconfig: Add how to embed the bootconfig into kernelGravatar Masami Hiramatsu 1-3/+28
Add a description how to embed the bootconfig file into kernel. Link: https://lkml.kernel.org/r/164921228987.1090670.16843569536974147213.stgit@devnote2 Cc: Padmanabha Srinivasaiah <treasure4paddy@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Linux Kbuild mailing list <linux-kbuild@vger.kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-24media: docs: Fix vimc default pipeline graphGravatar Kwang Son 1-7/+7
RGB/YUV Input is sensor type and it should be sub-dev node. To generate this dot graph sudo modprobe vimc media-ctl --print-dot Signed-off-by: Kwang Son <dev.kwang.son@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-04-20kernel/smp: Provide boot-time timeout for CSD lock diagnosticsGravatar Paul E. McKenney 1-0/+11
Debugging of problems involving insanely long-running SMI handlers proceeds better if the CSD-lock timeout can be adjusted. This commit therefore provides a new smp.csd_lock_timeout kernel boot parameter that specifies the timeout in milliseconds. The default remains at the previously hard-coded value of five seconds. [ paulmck: Apply feedback from Juergen Gross. ] Cc: Rik van Riel <riel@surriel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-16docs/admin: alphabetize parts of kernel-parameters.txt (part 2)Gravatar Randy Dunlap 1-83/+83
Alphabetize several of the kernel boot parameters in kernel-parameters.txt. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-04-16Docs/admin: alphabetize some kernel-parameters (part 1)Gravatar Randy Dunlap 2-63/+61
Move some out-of-place kernel parameters into their correct locations. Move one out-of-order keyword/legend in kernel-parameters.rst. Add some missing keyword legends in kernel-parameters.rst: HIBERNATION HYPER_V and drop some obsolete/removed keyword legends: EIDE IOSCHED OSS TS XT Correct the location of the setup.h file. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-04-16Docs: admin/kernel-parameters: edit a few boot optionsGravatar Randy Dunlap 1-11/+36
Clean up some of admin-guide/kernel-parameters.txt: a. "smt" should be "smt=" (S390) b. (dropped) c. Sparc supports the vdso= boot option d. make the tp_printk options (2) formatting similar to other options by adding spacing e. add "trace_clock=" with a reference to Documentation/trace/ftrace.rst f. use [IA-64] as documented instead of [ia64] g. fix formatting and text for test_suspend= h. fix formatting for swapaccount= i. fix formatting and grammar for video.brightness_switch_enabled= Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: linux-s390@vger.kernel.org Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linux-ia64@vger.kernel.org Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <lenb@kernel.org> Cc: linux-pm@vger.kernel.org Cc: linux-acpi@vger.kernel.org Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-04-16x86/efi: Remove references of EFI earlyprintk from documentationGravatar Akihiko Odaki 1-2/+2
x86 EFI earlyprink was removed with commit 69c1f396f25b ("efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation"). Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-04-11rcu-tasks: Print pre-stall-warning informational messagesGravatar Paul E. McKenney 1-3/+27
RCU-tasks stall-warning messages are printed after the grace period is ten minutes old. Unfortunately, most of us will have rebooted the system in response to an apparently-hung command long before the ten minutes is up, and will thus see what looks to be a silent hang. This commit therefore adds pr_info() messages that are printed earlier. These should avoid being classified as errors, but should give impatient users a hint. These are controlled by new rcupdate.rcu_task_stall_info and rcupdate.rcu_task_stall_info_mult kernel-boot parameters. The former defines the initial delay in jiffies (defaulting to 10 seconds) and the latter defines the multiplier (defaulting to 3). Thus, by default, the first message will appear 10 seconds into the RCU-tasks grace period, the second 40 seconds in, and the third 160 seconds in. There would be a fourth at 640 seconds in, but the stall warning message appears 600 seconds in, and once a stall warning is printed for a given grace period, no further informational messages are printed. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11srcu: Add contention-triggered addition of srcu_node treeGravatar Paul E. McKenney 1-0/+9
This commit instruments the acquisitions of the srcu_struct structure's ->lock, enabling the initiation of a transition from SRCU_SIZE_SMALL to SRCU_SIZE_BIG when sufficient contention is experienced. The instrumentation counts the number of trylock failures within the confines of a single jiffy. If that number exceeds the value specified by the srcutree.small_contention_lim kernel boot parameter (which defaults to 100), and if the value specified by the srcutree.convert_to_big kernel boot parameter has the 0x10 bit set (defaults to 0), then a transition will be automatically initiated. By default, there will never be any transitions, so that none of the srcu_struct structures ever gains an srcu_node array. The useful values for srcutree.convert_to_big are: 0x00: Never convert. 0x01: Always convert at init_srcu_struct() time. 0x02: Convert when rcutorture prints its first round of statistics. 0x03: Decide conversion approach at boot given system size. 0x10: Convert if contention is encountered. 0x12: Convert if contention is encountered or when rcutorture prints its first round of statistics, whichever comes first. The value 0x11 acts the same as 0x01 because the conversion happens before there is any chance of contention. [ paulmck: Apply "static" feedback from kernel test robot. ] Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11srcu: Add boot-time control over srcu_node array allocationGravatar Paul E. McKenney 1-0/+13
This commit adds an srcu_tree.convert_to_big kernel parameter that either refuses to convert at all (0), converts immediately at init_srcu_struct() time (1), or lets rcutorture convert it (2). An addition contention-based dynamic conversion choice will be added, along with documentation. [ paulmck: Apply callback-scanning feedback from Neeraj Upadhyay. ] Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-07x86/sev: Add a sev= cmdline optionGravatar Michael Roth 1-0/+2
For debugging purposes it is very useful to have a way to see the full contents of the SNP CPUID table provided to a guest. Add an sev=debug kernel command-line option to do so. Also introduce some infrastructure so that additional options can be specified via sev=option1[,option2] over time in a consistent manner. [ bp: Massage, simplify string parsing. ] Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-41-brijesh.singh@amd.com
2022-04-04x86/cpu: Remove "noclflush"Gravatar Borislav Petkov 1-2/+0
Not really needed anymore and there's clearcpuid=. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220127115626.14179-7-bp@alien8.de
2022-04-04x86/cpu: Remove "noexec"Gravatar Borislav Petkov 1-5/+0
It doesn't make any sense to disable non-executable mappings - security-wise or else. So rip out that switch and move the remaining code into setup.c and delete setup_nx.c Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220127115626.14179-6-bp@alien8.de
2022-04-04x86/cpu: Remove "nosmep"Gravatar Borislav Petkov 1-1/+1
There should be no need to disable SMEP anymore. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220127115626.14179-5-bp@alien8.de
2022-04-04x86/cpu: Remove CONFIG_X86_SMAP and "nosmap"Gravatar Borislav Petkov 1-1/+1
Those were added as part of the SMAP enablement but SMAP is currently an integral part of kernel proper and there's no need to disable it anymore. Rip out that functionality. Leave --uaccess default on for objtool as this is what objtool should do by default anyway. If still needed - clearcpuid=smap. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220127115626.14179-4-bp@alien8.de
2022-04-04x86/cpu: Remove "nosep"Gravatar Borislav Petkov 1-2/+0
That chicken bit was added by 4f88651125e2 ("[PATCH] i386: allow disabling X86_FEATURE_SEP at boot") but measuring int80 vsyscall performance on 32-bit doesn't matter anymore. If still needed, one can boot with clearcpuid=sep to disable that feature for testing. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220127115626.14179-3-bp@alien8.de
2022-04-04x86/cpu: Allow feature bit names from /proc/cpuinfo in clearcpuid=Gravatar Borislav Petkov 1-3/+8
Having to give the X86_FEATURE array indices in order to disable a feature bit for testing is not really user-friendly. So accept the feature bit names too. Some feature bits don't have names so there the array indices are still accepted, of course. Clearing CPUID flags is not something which should be done in production so taint the kernel too. An exemplary cmdline would then be something like: clearcpuid=de,440,smca,succory,bmi1,3dnow ("succory" is wrong on purpose). And it says: [ ... ] Clearing CPUID bits: de 13:24 smca (unknown: succory) bmi1 3dnow [ Fix CONFIG_X86_FEATURE_NAMES=n build error as reported by the 0day robot: https://lore.kernel.org/r/202203292206.ICsY2RKX-lkp@intel.com ] Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220127115626.14179-2-bp@alien8.de
2022-03-31Merge tag 'random-5.18-rc1-for-linus' of ↵Gravatar Linus Torvalds 1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator fixes from Jason Donenfeld: - If a hardware random number generator passes a sufficiently large chunk of entropy to random.c during early boot, we now skip the "fast_init" business and let it initialize the RNG. This makes CONFIG_RANDOM_TRUST_BOOTLOADER=y actually useful. - We already have the command line `random.trust_cpu=0/1` option for RDRAND, which let distros enable CONFIG_RANDOM_TRUST_CPU=y while placating concerns of more paranoid users. Now we add `random.trust_bootloader=0/1` so that distros can similarly enable CONFIG_RANDOM_TRUST_BOOTLOADER=y. - Re-add a comment that got removed by accident in the recent revert. - Add the spec-compliant ACPI CID for vmgenid, which Microsoft added to the vmgenid spec at Ard's request during earlier review. - Restore build-time randomness via the latent entropy plugin, which was lost when we transitioned to using a hash function. * tag 'random-5.18-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: random: mix build-time latent entropy into pool at init virt: vmgenid: recognize new CID added by Hyper-V random: re-add removed comment about get_random_{u32,u64} reseeding random: treat bootloader trust toggle the same way as cpu trust toggle random: skip fast_init if hwrng provides large chunk of entropy
2022-03-29Merge tag 'pm-5.18-rc1-2' of ↵Gravatar Linus Torvalds 1-68/+67
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These update ARM cpufreq drivers, the OPP (Operating Performance Points) library and the power management documentation. Specifics: - Add per core DVFS support for QCom SoC (Bjorn Andersson), convert to yaml binding (Manivannan Sadhasivam) and various other fixes to the QCom drivers (Luca Weiss). - Add OPP table for imx7s SoC (Denys Drozdov) and minor fixes (Stefan Agner). - Fix CPPC driver's freq/performance conversions (Pierre Gondois). - Minor generic cleanups (Yury Norov). - Introduce opp-microwatt property to the OPP core, bindings, etc (Lukasz Luba). - Convert DT bindings to schema format and various related fixes (Yassine Oudjana). - Expose OPP's OF node in debugfs (Viresh Kumar). - Add Intel uncore frequency scaling documentation file to its MAINTAINERS entry (Srinivas Pandruvada). - Clean up the AMD P-state driver documentation (Jan Engelhardt)" * tag 'pm-5.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (24 commits) Documentation: amd-pstate: grammar and sentence structure updates dt-bindings: cpufreq: cpufreq-qcom-hw: Convert to YAML bindings dt-bindings: dvfs: Use MediaTek CPUFREQ HW as an example Documentation: EM: Describe new registration method using DT OPP: Add support of "opp-microwatt" for EM registration PM: EM: add macro to set .active_power() callback conditionally OPP: Add "opp-microwatt" supporting code dt-bindings: opp: Add "opp-microwatt" entry in the OPP MAINTAINERS: Add additional file to uncore frequency control cpufreq: blocklist Qualcomm sc8280xp and sa8540p in cpufreq-dt-platdev cpufreq: qcom-hw: Add support for per-core-dcvs dt-bindings: power: avs: qcom,cpr: Convert to DT schema arm64: dts: qcom: qcs404: Rename CPU and CPR OPP tables arm64: dts: qcom: msm8996: Rename cluster OPP tables dt-bindings: opp: Convert qcom-nvmem-cpufreq to DT schema dt-bindings: opp: qcom-opp: Convert to DT schema arm64: dts: qcom: msm8996-mtp: Add msm8996 compatible dt-bindings: arm: qcom: Add msm8996 and apq8096 compatibles opp: Expose of-node's name in debugfs cpufreq: CPPC: Fix performance/frequency conversion ...
2022-03-29Merge branch 'pm-docs'Gravatar Rafael J. Wysocki 1-68/+67
Merge additional power management documentation udates for 5.18-rc1: - Add Intel uncore frequency scaling documentation file to its MAINTAINERS entry (Srinivas Pandruvada). - Clean up the AMD P-state driver documentation (Jan Engelhardt). * pm-docs: Documentation: amd-pstate: grammar and sentence structure updates MAINTAINERS: Add additional file to uncore frequency control
2022-03-25Merge tag 'platform-drivers-x86-v5.18-1' of ↵Gravatar Linus Torvalds 1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Hans de Goede: "New drivers: - AMD Host System Management Port (HSMP) - Intel Software Defined Silicon Removed drivers (functionality folded into other drivers): - intel_cht_int33fe_microb - surface3_button amd-pmc: - s2idle bug-fixes - Support for AMD Spill to DRAM STB feature hp-wmi: - Fix SW_TABLET_MODE detection method (and other fixes) - Support omen thermal profile policy v1 serial-multi-instantiate: - Add SPI device support - Add support for CS35L41 amplifiers used in new laptops think-lmi: - syfs-class-firmware-attributes Certificate authentication support thinkpad_acpi: - Fixes + quirks - Add platform_profile support on AMD based ThinkPads x86-android-tablets: - Improve Asus ME176C / TF103C support - Support Nextbook Ares 8, Lenovo Tab 2 830 and 1050 tablets Lots of various other small fixes and hardware-id additions" * tag 'platform-drivers-x86-v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (60 commits) platform/x86: think-lmi: Certificate authentication support Documentation: syfs-class-firmware-attributes: Lenovo Certificate support platform/x86: amd-pmc: Only report STB errors when STB enabled platform/x86: amd-pmc: Drop CPU QoS workaround platform/x86: amd-pmc: Output error codes in messages platform/x86: amd-pmc: Move to later in the suspend process ACPI / x86: Add support for LPS0 callback handler platform/x86: thinkpad_acpi: consistently check fan_get_status return. platform/x86: hp-wmi: support omen thermal profile policy v1 platform/x86: hp-wmi: Changing bios_args.data to be dynamically allocated platform/x86: hp-wmi: Fix 0x05 error code reported by several WMI calls platform/x86: hp-wmi: Fix SW_TABLET_MODE detection method platform/x86: hp-wmi: Fix hp_wmi_read_int() reporting error (0x05) platform/x86: amd-pmc: Validate entry into the deepest state on resume platform/x86: thinkpad_acpi: Don't use test_bit on an integer platform/x86: thinkpad_acpi: Fix compiler warning about uninitialized err variable platform/x86: thinkpad_acpi: clean up dytc profile convert platform/x86: x86-android-tablets: Depend on EFI and SPI platform/x86: amd-pmc: uninitialized variable in amd_pmc_s2d_init() platform/x86: intel-uncore-freq: fix uncore_freq_common_init() error codes ...
2022-03-25Documentation: amd-pstate: grammar and sentence structure updatesGravatar Jan Engelhardt 1-68/+67
Signed-off-by: Jan Engelhardt <jengelh@inai.de> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-25random: treat bootloader trust toggle the same way as cpu trust toggleGravatar Jason A. Donenfeld 1-0/+6
If CONFIG_RANDOM_TRUST_CPU is set, the RNG initializes using RDRAND. But, the user can disable (or enable) this behavior by setting `random.trust_cpu=0/1` on the kernel command line. This allows system builders to do reasonable things while avoiding howls from tinfoil hatters. (Or vice versa.) CONFIG_RANDOM_TRUST_BOOTLOADER is basically the same thing, but regards the seed passed via EFI or device tree, which might come from RDRAND or a TPM or somewhere else. In order to allow distros to more easily enable this while avoiding those same howls (or vice versa), this commit adds the corresponding `random.trust_bootloader=0/1` toggle. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Graham Christensen <graham@grahamc.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Link: https://github.com/NixOS/nixpkgs/pull/165355 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-03-24Merge branch 'akpm' (patches from Andrew)Gravatar Linus Torvalds 3-3/+14
Merge more updates from Andrew Morton: "Various misc subsystems, before getting into the post-linux-next material. 41 patches. Subsystems affected by this patch series: procfs, misc, core-kernel, lib, checkpatch, init, pipe, minix, fat, cgroups, kexec, kdump, taskstats, panic, kcov, resource, and ubsan" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits) Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang" kernel/resource: fix kfree() of bootmem memory again kcov: properly handle subsequent mmap calls kcov: split ioctl handling into locked and unlocked parts panic: move panic_print before kmsg dumpers panic: add option to dump all CPUs backtraces in panic_print docs: sysctl/kernel: add missing bit to panic_print taskstats: remove unneeded dead assignment kasan: no need to unset panic_on_warn in end_report() ubsan: no need to unset panic_on_warn in ubsan_epilogue() panic: unset panic_on_warn inside panic() docs: kdump: add scp example to write out the dump file docs: kdump: update description about sysfs file system support arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible cgroup: use irqsave in cgroup_rstat_flush_locked(). fat: use pointer to simple type in put_user() minix: fix bug when opening a file with O_DIRECT ...
2022-03-24Merge tag 'net-next-5.18' of ↵Gravatar Linus Torvalds 1-0/+9
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "The sprinkling of SPI drivers is because we added a new one and Mark sent us a SPI driver interface conversion pull request. Core ---- - Introduce XDP multi-buffer support, allowing the use of XDP with jumbo frame MTUs and combination with Rx coalescing offloads (LRO). - Speed up netns dismantling (5x) and lower the memory cost a little. Remove unnecessary per-netns sockets. Scope some lists to a netns. Cut down RCU syncing. Use batch methods. Allow netdev registration to complete out of order. - Support distinguishing timestamp types (ingress vs egress) and maintaining them across packet scrubbing points (e.g. redirect). - Continue the work of annotating packet drop reasons throughout the stack. - Switch netdev error counters from an atomic to dynamically allocated per-CPU counters. - Rework a few preempt_disable(), local_irq_save() and busy waiting sections problematic on PREEMPT_RT. - Extend the ref_tracker to allow catching use-after-free bugs. BPF --- - Introduce "packing allocator" for BPF JIT images. JITed code is marked read only, and used to be allocated at page granularity. Custom allocator allows for more efficient memory use, lower iTLB pressure and prevents identity mapping huge pages from getting split. - Make use of BTF type annotations (e.g. __user, __percpu) to enforce the correct probe read access method, add appropriate helpers. - Convert the BPF preload to use light skeleton and drop the user-mode-driver dependency. - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling its use as a packet generator. - Allow local storage memory to be allocated with GFP_KERNEL if called from a hook allowed to sleep. - Introduce fprobe (multi kprobe) to speed up mass attachment (arch bits to come later). - Add unstable conntrack lookup helpers for BPF by using the BPF kfunc infra. - Allow cgroup BPF progs to return custom errors to user space. - Add support for AF_UNIX iterator batching. - Allow iterator programs to use sleepable helpers. - Support JIT of add, and, or, xor and xchg atomic ops on arm64. - Add BTFGen support to bpftool which allows to use CO-RE in kernels without BTF info. - Large number of libbpf API improvements, cleanups and deprecations. Protocols --------- - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev. - Adjust TSO packet sizes based on min_rtt, allowing very low latency links (data centers) to always send full-sized TSO super-frames. - Make IPv6 flow label changes (AKA hash rethink) more configurable, via sysctl and setsockopt. Distinguish between server and client behavior. - VxLAN support to "collect metadata" devices to terminate only configured VNIs. This is similar to VLAN filtering in the bridge. - Support inserting IPv6 IOAM information to a fraction of frames. - Add protocol attribute to IP addresses to allow identifying where given address comes from (kernel-generated, DHCP etc.) - Support setting socket and IPv6 options via cmsg on ping6 sockets. - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS. Define dscp_t and stop taking ECN bits into account in fib-rules. - Add support for locked bridge ports (for 802.1X). - tun: support NAPI for packets received from batched XDP buffs, doubling the performance in some scenarios. - IPv6 extension header handling in Open vSwitch. - Support IPv6 control message load balancing in bonding, prevent neighbor solicitation and advertisement from using the wrong port. Support NS/NA monitor selection similar to existing ARP monitor. - SMC - improve performance with TCP_CORK and sendfile() - support auto-corking - support TCP_NODELAY - MCTP (Management Component Transport Protocol) - add user space tag control interface - I2C binding driver (as specified by DMTF DSP0237) - Multi-BSSID beacon handling in AP mode for WiFi. - Bluetooth: - handle MSFT Monitor Device Event - add MGMT Adv Monitor Device Found/Lost events - Multi-Path TCP: - add support for the SO_SNDTIMEO socket option - lots of selftest cleanups and improvements - Increase the max PDU size in CAN ISOTP to 64 kB. Driver API ---------- - Add HW counters for SW netdevs, a mechanism for devices which offload packet forwarding to report packet statistics back to software interfaces such as tunnels. - Select the default NIC queue count as a fraction of number of physical CPU cores, instead of hard-coding to 8. - Expose devlink instance locks to drivers. Allow device layer of drivers to use that lock directly instead of creating their own which always runs into ordering issues in devlink callbacks. - Add header/data split indication to guide user space enabling of TCP zero-copy Rx. - Allow configuring completion queue event size. - Refactor page_pool to enable fragmenting after allocation. - Add allocation and page reuse statistics to page_pool. - Improve Multiple Spanning Trees support in the bridge to allow reuse of topologies across VLANs, saving HW resources in switches. - DSA (Distributed Switch Architecture): - replay and offload of host VLAN entries - offload of static and local FDB entries on LAG interfaces - FDB isolation and unicast filtering New hardware / drivers ---------------------- - Ethernet: - LAN937x T1 PHYs - Davicom DM9051 SPI NIC driver - Realtek RTL8367S, RTL8367RB-VB switch and MDIO - Microchip ksz8563 switches - Netronome NFP3800 SmartNICs - Fungible SmartNICs - MediaTek MT8195 switches - WiFi: - mt76: MediaTek mt7916 - mt76: MediaTek mt7921u USB adapters - brcmfmac: Broadcom BCM43454/6 - Mobile: - iosm: Intel M.2 7360 WWAN card Drivers ------- - Convert many drivers to the new phylink API built for split PCS designs but also simplifying other cases. - Intel Ethernet NICs: - add TTY for GNSS module for E810T device - improve AF_XDP performance - GTP-C and GTP-U filter offload - QinQ VLAN support - Mellanox Ethernet NICs (mlx5): - support xdp->data_meta - multi-buffer XDP - offload tc push_eth and pop_eth actions - Netronome Ethernet NICs (nfp): - flow-independent tc action hardware offload (police / meter) - AF_XDP - Other Ethernet NICs: - at803x: fiber and SFP support - xgmac: mdio: preamble suppression and custom MDC frequencies - r8169: enable ASPM L1.2 if system vendor flags it as safe - macb/gem: ZynqMP SGMII - hns3: add TX push mode - dpaa2-eth: software TSO - lan743x: multi-queue, mdio, SGMII, PTP - axienet: NAPI and GRO support - Mellanox Ethernet switches (mlxsw): - source and dest IP address rewrites - RJ45 ports - Marvell Ethernet switches (prestera): - basic routing offload - multi-chain TC ACL offload - NXP embedded Ethernet switches (ocelot & felix): - PTP over UDP with the ocelot-8021q DSA tagging protocol - basic QoS classification on Felix DSA switch using dcbnl - port mirroring for ocelot switches - Microchip high-speed industrial Ethernet (sparx5): - offloading of bridge port flooding flags - PTP Hardware Clock - Other embedded switches: - lan966x: PTP Hardward Clock - qca8k: mdio read/write operations via crafted Ethernet packets - Qualcomm 802.11ax WiFi (ath11k): - add LDPC FEC type and 802.11ax High Efficiency data in radiotap - enable RX PPDU stats in monitor co-exist mode - Intel WiFi (iwlwifi): - UHB TAS enablement via BIOS - band disablement via BIOS - channel switch offload - 32 Rx AMPDU sessions in newer devices - MediaTek WiFi (mt76): - background radar detection - thermal management improvements on mt7915 - SAR support for more mt76 platforms - MBSSID and 6 GHz band on mt7915 - RealTek WiFi: - rtw89: AP mode - rtw89: 160 MHz channels and 6 GHz band - rtw89: hardware scan - Bluetooth: - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS) - Microchip CAN (mcp251xfd): - multiple RX-FIFOs and runtime configurable RX/TX rings - internal PLL, runtime PM handling simplification - improve chip detection and error handling after wakeup" * tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits) llc: fix netdevice reference leaks in llc_ui_bind() drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool ice: don't allow to run ice_send_event_to_aux() in atomic ctx ice: fix 'scheduling while atomic' on aux critical err interrupt net/sched: fix incorrect vlan_push_eth dest field net: bridge: mst: Restrict info size queries to bridge ports net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init() drivers: net: xgene: Fix regression in CRC stripping net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT net: dsa: fix missing host-filtered multicast addresses net/mlx5e: Fix build warning, detected write beyond size of field iwlwifi: mvm: Don't fail if PPAG isn't supported selftests/bpf: Fix kprobe_multi test. Revert "rethook: x86: Add rethook x86 implementation" Revert "arm64: rethook: Add arm64 rethook implementation" Revert "powerpc: Add rethook support" Revert "ARM: rethook: Add rethook arm implementation" netdevice: add missing dm_private kdoc net: bridge: mst: prevent NULL deref in br_mst_info_size() selftests: forwarding: Use same VRF for port and VLAN upper ...
2022-03-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmGravatar Linus Torvalds 1-4/+26
Pull kvm updates from Paolo Bonzini: "ARM: - Proper emulation of the OSLock feature of the debug architecture - Scalibility improvements for the MMU lock when dirty logging is on - New VMID allocator, which will eventually help with SVA in VMs - Better support for PMUs in heterogenous systems - PSCI 1.1 support, enabling support for SYSTEM_RESET2 - Implement CONFIG_DEBUG_LIST at EL2 - Make CONFIG_ARM64_ERRATUM_2077057 default y - Reduce the overhead of VM exit when no interrupt is pending - Remove traces of 32bit ARM host support from the documentation - Updated vgic selftests - Various cleanups, doc updates and spelling fixes RISC-V: - Prevent KVM_COMPAT from being selected - Optimize __kvm_riscv_switch_to() implementation - RISC-V SBI v0.3 support s390: - memop selftest - fix SCK locking - adapter interruptions virtualization for secure guests - add Claudio Imbrenda as maintainer - first step to do proper storage key checking x86: - Continue switching kvm_x86_ops to static_call(); introduce static_call_cond() and __static_call_ret0 when applicable. - Cleanup unused arguments in several functions - Synthesize AMD 0x80000021 leaf - Fixes and optimization for Hyper-V sparse-bank hypercalls - Implement Hyper-V's enlightened MSR bitmap for nested SVM - Remove MMU auditing - Eager splitting of page tables (new aka "TDP" MMU only) when dirty page tracking is enabled - Cleanup the implementation of the guest PGD cache - Preparation for the implementation of Intel IPI virtualization - Fix some segment descriptor checks in the emulator - Allow AMD AVIC support on systems with physical APIC ID above 255 - Better API to disable virtualization quirks - Fixes and optimizations for the zapping of page tables: - Zap roots in two passes, avoiding RCU read-side critical sections that last too long for very large guests backed by 4 KiB SPTEs. - Zap invalid and defunct roots asynchronously via concurrency-managed work queue. - Allowing yielding when zapping TDP MMU roots in response to the root's last reference being put. - Batch more TLB flushes with an RCU trick. Whoever frees the paging structure now holds RCU as a proxy for all vCPUs running in the guest, i.e. to prolongs the grace period on their behalf. It then kicks the the vCPUs out of guest mode before doing rcu_read_unlock(). Generic: - Introduce __vcalloc and use it for very large allocations that need memcg accounting" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits) KVM: use kvcalloc for array allocations KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2 kvm: x86: Require const tsc for RT KVM: x86: synthesize CPUID leaf 0x80000021h if useful KVM: x86: add support for CPUID leaf 0x80000021 KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()" kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU KVM: arm64: fix typos in comments KVM: arm64: Generalise VM features into a set of flags KVM: s390: selftests: Add error memop tests KVM: s390: selftests: Add more copy memop tests KVM: s390: selftests: Add named stages for memop test KVM: s390: selftests: Add macro as abstraction for MEM_OP KVM: s390: selftests: Split memop tests KVM: s390x: fix SCK locking RISC-V: KVM: Implement SBI HSM suspend call RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function RISC-V: Add SBI HSM suspend related defines RISC-V: KVM: Implement SBI v0.3 SRST extension ...
2022-03-23panic: move panic_print before kmsg dumpersGravatar Guilherme G. Piccoli 1-0/+4
The panic_print setting allows users to collect more information in a panic event, like memory stats, tasks, CPUs backtraces, etc. This is an interesting debug mechanism, but currently the print event happens *after* kmsg_dump(), meaning that pstore, for example, cannot collect a dmesg with the panic_print extra information. This patch changes that in 2 steps: (a) The panic_print setting allows to replay the existing kernel log buffer to the console (bit 5), besides the extra information dump. This functionality makes sense only at the end of the panic() function. So, we hereby allow to distinguish the two situations by a new boolean parameter in the function panic_print_sys_info(). (b) With the above change, we can safely call panic_print_sys_info() before kmsg_dump(), allowing to dump the extra information when using pstore or other kmsg dumpers. The additional messages from panic_print could overwrite the oldest messages when the buffer is full. The only reasonable solution is to use a large enough log buffer, hence we added an advice into the kernel parameters documentation about that. Link: https://lkml.kernel.org/r/20220214141308.841525-1-gpiccoli@igalia.com Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Acked-by: Baoquan He <bhe@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Feng Tang <feng.tang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23panic: add option to dump all CPUs backtraces in panic_printGravatar Guilherme G. Piccoli 2-0/+2
Currently the "panic_print" parameter/sysctl allows some interesting debug information to be printed during a panic event. This is useful for example in cases the user cannot kdump due to resource limits, or if the user collects panic logs in a serial output (or pstore) and prefers a fast reboot instead of a kdump. Happens that currently there's no way to see all CPUs backtraces in a panic using "panic_print" on architectures that support that. We do have "oops_all_cpu_backtrace" sysctl, but although partially overlapping in the functionality, they are orthogonal in nature: "panic_print" is a panic tuning (and we have panics without oopses, like direct calls to panic() or maybe other paths that don't go through oops_enter() function), and the original purpose of "oops_all_cpu_backtrace" is to provide more information on oopses for cases in which the users desire to continue running the kernel even after an oops, i.e., used in non-panic scenarios. So, we hereby introduce an additional bit for "panic_print" to allow dumping the CPUs backtraces during a panic event. Link: https://lkml.kernel.org/r/20211109202848.610874-3-gpiccoli@igalia.com Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Reviewed-by: Feng Tang <feng.tang@intel.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23docs: sysctl/kernel: add missing bit to panic_printGravatar Guilherme G. Piccoli 1-0/+1
Patch series "Some improvements on panic_print". This is a mix of a documentation fix with some additions to the "panic_print" syscall / parameter. The goal here is being able to collect all CPUs backtraces during a panic event and also to enable "panic_print" in a kdump event - details of the reasoning and design choices in the patches. This patch (of 3): Commit de6da1e8bcf0 ("panic: add an option to replay all the printk message in buffer") added a new bit to the sysctl/kernel parameter "panic_print", but the documentation was added only in kernel-parameters.txt, not in the sysctl guide. Fix it here by adding bit 5 to sysctl admin-guide documentation. [rdunlap@infradead.org: fix table format warning] Link: https://lkml.kernel.org/r/20220109055635.6999-1-rdunlap@infradead.org Link: https://lkml.kernel.org/r/20211109202848.610874-1-gpiccoli@igalia.com Link: https://lkml.kernel.org/r/20211109202848.610874-2-gpiccoli@igalia.com Fixes: de6da1e8bcf0 ("panic: add an option to replay all the printk message in buffer") Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Reviewed-by: Feng Tang <feng.tang@intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23docs: kdump: add scp example to write out the dump fileGravatar Tiezhu Yang 1-0/+4
Except cp and makedumpfile, add scp example to write out the dump file. Link: https://lkml.kernel.org/r/1644324666-15947-3-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Baoquan He <bhe@redhat.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Marco Elver <elver@google.com> Cc: Xuefeng Li <lixuefeng@loongson.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23docs: kdump: update description about sysfs file system supportGravatar Tiezhu Yang 1-3/+3
Patch series "Update doc and fix some issues about kdump", v2. This patch (of 5): After commit 6a108a14fa35 ("kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT"), "Configure standard kernel features (for small systems)" is not exist, we should use "Configure standard kernel features (expert users)" now. Link: https://lkml.kernel.org/r/1644324666-15947-1-git-send-email-yangtiezhu@loongson.cn Link: https://lkml.kernel.org/r/1644324666-15947-2-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Baoquan He <bhe@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Marco Elver <elver@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Xuefeng Li <lixuefeng@loongson.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23Merge tag 'media/v5.18-1' of ↵Gravatar Linus Torvalds 6-5/+18
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - a major reorg at platform Kconfig/Makefile files, organizing them per vendor. The other media Kconfig/Makefile files also sorted - New sensor drivers: hi847, isl7998x, ov08d10 - New Amphion vpu decoder stateful driver - New Atmel microchip csi2dc driver - tegra-vde driver promoted from staging - atomisp: some fixes for it to work on BYT - imx7-mipi-csis driver promoted from staging and renamed - camss driver got initial support for VFE hardware version Titan 480 - mtk-vcodec has gained support for MT8192 - lots of driver changes, fixes and improvements * tag 'media/v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (417 commits) media: nxp: Restrict VIDEO_IMX_MIPI_CSIS to ARCH_MXC or COMPILE_TEST media: amphion: cleanup media device if register it fail media: amphion: fix some issues to improve robust media: amphion: fix some error related with undefined reference to __divdi3 media: amphion: fix an issue that using pm_runtime_get_sync incorrectly media: vidtv: use vfree() for memory allocated with vzalloc() media: m5mols/m5mols.h: document new reset field media: pixfmt-yuv-planar.rst: fix PIX_FMT labels media: platform: Remove unnecessary print function dev_err() media: amphion: Add missing of_node_put() in vpu_core_parse_dt() media: mtk-vcodec: Add missing of_node_put() in mtk_vdec_hw_prob_done() media: platform: amphion: Fix build error without MAILBOX media: spi: Kconfig: Place SPI drivers on a single menu media: i2c: Kconfig: move camera drivers to the top media: atomisp: fix bad usage at error handling logic media: platform: rename mediatek/mtk-jpeg/ to mediatek/jpeg/ media: media/*/Kconfig: sort entries media: Kconfig: cleanup VIDEO_DEV dependencies media: platform/*/Kconfig: make manufacturer menus more uniform media: platform: Create vendor/{Makefile,Kconfig} files ...
2022-03-23Merge tag 'trace-v5.18' of ↵Gravatar Linus Torvalds 1-0/+8
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - New user_events interface. User space can register an event with the kernel describing the format of the event. Then it will receive a byte in a page mapping that it can check against. A privileged task can then enable that event like any other event, which will change the mapped byte to true, telling the user space application to start writing the event to the tracing buffer. - Add new "ftrace_boot_snapshot" kernel command line parameter. When set, the tracing buffer will be saved in the snapshot buffer at boot up when the kernel hands things over to user space. This will keep the traces that happened at boot up available even if user space boot up has tracing as well. - Have TRACE_EVENT_ENUM() also update trace event field type descriptions. Thus if a static array defines its size with an enum, the user space trace event parsers can still know how to parse that array. - Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the TRACE_EVENT() macro, but will attach to an existing tracepoint. This will make one tracepoint be able to trace different content and not be stuck at only what the original TRACE_EVENT() macro exports. - Fixes to tracing error logging. - Better saving of cmdlines to PIDs when tracing (use the wakeup events for mapping). * tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (30 commits) tracing: Have type enum modifications copy the strings user_events: Add trace event call as root for low permission cases tracing/user_events: Use alloc_pages instead of kzalloc() for register pages tracing: Add snapshot at end of kernel boot up tracing: Have TRACE_DEFINE_ENUM affect trace event types as well tracing: Fix strncpy warning in trace_events_synth.c user_events: Prevent dyn_event delete racing with ioctl add/delete tracing: Add TRACE_CUSTOM_EVENT() macro tracing: Move the defines to create TRACE_EVENTS into their own files tracing: Add sample code for custom trace events tracing: Allow custom events to be added to the tracefs directory tracing: Fix last_cmd_set() string management in histogram code user_events: Fix potential uninitialized pointer while parsing field tracing: Fix allocation of last_cmd in last_cmd_set() user_events: Add documentation file user_events: Add sample code for typical usage user_events: Add self-test for validator boundaries user_events: Add self-test for perf_event integration user_events: Add self-test for dynamic_events integration user_events: Add self-test for ftrace integration ...
2022-03-23Merge tag 'printk-for-5.18' of ↵Gravatar Linus Torvalds 1-2/+7
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Make %pK behave the same as %p for kptr_restrict == 0 also with no_hash_pointers parameter - Ignore the default console in the device tree also when console=null or console="" is used on the command line - Document console=null and console="" behavior - Prevent a deadlock and a livelock caused by console_lock in panic() - Make console_lock available for panicking CPU - Fast query for the next to-be-used sequence number - Use the expected return values in printk.devkmsg __setup handler - Use the correct atomic operations in wake_up_klogd() irq_work handler - Avoid possible unaligned access when handling %4cc printing format * tag 'printk-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: fix return value of printk.devkmsg __setup handler vsprintf: Fix %pK with kptr_restrict == 0 printk: make suppress_panic_printk static printk: Set console_set_on_cmdline=1 when __add_preferred_console() is called with user_specified == true Docs: printk: add 'console=null|""' to admin/kernel-parameters printk: use atomic updates for klogd work printk: Drop console_sem during panic printk: Avoid livelock with heavy printk during panic printk: disable optimistic spin during panic printk: Add panic_in_progress helper vsprintf: Move space out of string literals in fourcc_string() vsprintf: Fix potential unaligned access printk: ringbuffer: Improve prb_next_seq() performance
2022-03-22Merge branch 'akpm' (patches from Andrew)Gravatar Linus Torvalds 6-32/+409
Merge updates from Andrew Morton: - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs - Most the MM patches which precede the patches in Willy's tree: kasan, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb, userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp, cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap, zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits) mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release() Docs/ABI/testing: add DAMON sysfs interface ABI document Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface selftests/damon: add a test for DAMON sysfs interface mm/damon/sysfs: support DAMOS stats mm/damon/sysfs: support DAMOS watermarks mm/damon/sysfs: support schemes prioritization mm/damon/sysfs: support DAMOS quotas mm/damon/sysfs: support DAMON-based Operation Schemes mm/damon/sysfs: support the physical address space monitoring mm/damon/sysfs: link DAMON for virtual address spaces monitoring mm/damon: implement a minimal stub for sysfs-based DAMON interface mm/damon/core: add number of each enum type values mm/damon/core: allow non-exclusive DAMON start/stop Docs/damon: update outdated term 'regions update interval' Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling Docs/vm/damon: call low level monitoring primitives the operations mm/damon: remove unnecessary CONFIG_DAMON option mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() mm/damon/dbgfs-test: fix is_target_id() change ...
2022-03-22Docs/admin-guide/mm/damon/usage: document DAMON sysfs interfaceGravatar SeongJae Park 1-6/+344
This commit adds detailed usage of DAMON sysfs interface in the admin-guide document for DAMON. Link: https://lkml.kernel.org/r/20220228081314.5770-13-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Xin Hao <xhao@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>