aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-04-17x86/retpolines: Enable the default thunk warning only on relevant configsGravatar Borislav Petkov (AMD) 1-0/+7
The using-default-thunk warning check makes sense only with configurations which actually enable the special return thunks. Otherwise, it fires on unrelated 32-bit configs on which the special return thunks won't even work (they're 64-bit only) and, what is more, those configs even go off into the weeds when booting in the alternatives patching code, leading to a dead machine. Fixes: 4461438a8405 ("x86/retpoline: Ensure default return thunk isn't used at runtime") Reported-by: Klara Modin <klarasmodin@gmail.com> Reported-by: Erhard Furtner <erhard_f@mailbox.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Klara Modin <klarasmodin@gmail.com> Link: https://lore.kernel.org/r/78e0d19c-b77a-4169-a80f-2eef91f4a1d6@gmail.com Link: https://lore.kernel.org/r/20240413024956.488d474e@yea
2024-04-14x86/bugs: Fix BHI retpoline checkGravatar Josh Poimboeuf 1-4/+7
Confusingly, X86_FEATURE_RETPOLINE doesn't mean retpolines are enabled, as it also includes the original "AMD retpoline" which isn't a retpoline at all. Also replace cpu_feature_enabled() with boot_cpu_has() because this is before alternatives are patched and cpu_feature_enabled()'s fallback path is slower than plain old boot_cpu_has(). Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/ad3807424a3953f0323c011a643405619f2a4927.1712944776.git.jpoimboe@kernel.org
2024-04-12x86/cpu/amd: Move TOPOEXT enablement into the topology parserGravatar Thomas Gleixner 2-15/+21
The topology rework missed that early_init_amd() tries to re-enable the Topology Extensions when the BIOS disabled them. The new parser is invoked before early_init_amd() so the re-enable attempt happens too late. Move it into the AMD specific topology parser code where it belongs. Fixes: f7fb3b2dd92c ("x86/cpu: Provide an AMD/HYGON specific topology parser") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/878r1j260l.ffs@tglx
2024-04-12x86/cpu/amd: Make the NODEID_MSR union actually workGravatar Thomas Gleixner 1-3/+3
A system with NODEID_MSR was reported to crash during early boot without any output. The reason is that the union which is used for accessing the bitfields in the MSR is written wrongly and the resulting executable code accesses the wrong part of the MSR data. As a consequence a later division by that value results in 0 and that result is used for another division as divisor, which obviously does not work well. The magic world of C, unions and bitfields: union { u64 bita : 3, bitb : 3; u64 all; } x; x.all = foo(); a = x.bita; b = x.bitb; results in the effective executable code of: a = b = x.bita; because bita and bitb are treated as union members and therefore both end up at bit offset 0. Wrapping the bitfield into an anonymous struct: union { struct { u64 bita : 3, bitb : 3; }; u64 all; } x; works like expected. Rework the NODEID_MSR union in exactly that way to cure the problem. Fixes: f7fb3b2dd92c ("x86/cpu: Provide an AMD/HYGON specific topology parser") Reported-by: "kernelci.org bot" <bot@kernelci.org> Reported-by: Laura Nao <laura.nao@collabora.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Laura Nao <laura.nao@collabora.com> Link: https://lore.kernel.org/r/20240410194311.596282919@linutronix.de Closes: https://lore.kernel.org/all/20240322175210.124416-1-laura.nao@collabora.com/
2024-04-12x86/cpu/amd: Make the CPUID 0x80000008 parser correctGravatar Thomas Gleixner 1-6/+18
CPUID 0x80000008 ECX.cpu_nthreads describes the number of threads in the package. The parser uses this value to initialize the SMT domain level. That's wrong because cpu_nthreads does not describe the number of threads per physical core. So this needs to set the CORE domain level and let the later parsers set the SMT shift if available. Preset the SMT domain level with the assumption of one thread per core, which is correct ifrt here are no other CPUID leafs to parse, and propagate cpu_nthreads and the core level APIC bitwidth into the CORE domain. Fixes: f7fb3b2dd92c ("x86/cpu: Provide an AMD/HYGON specific topology parser") Reported-by: "kernelci.org bot" <bot@kernelci.org> Reported-by: Laura Nao <laura.nao@collabora.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Laura Nao <laura.nao@collabora.com> Link: https://lore.kernel.org/r/20240410194311.535206450@linutronix.de
2024-04-12x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHIGravatar Josh Poimboeuf 2-15/+4
For consistency with the other CONFIG_MITIGATION_* options, replace the CONFIG_SPECTRE_BHI_{ON,OFF} options with a single CONFIG_MITIGATION_SPECTRE_BHI option. [ mingo: Fix ] Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/3833812ea63e7fdbe36bf8b932e63f70d18e2a2a.1712813475.git.jpoimboe@kernel.org
2024-04-12x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=autoGravatar Josh Poimboeuf 4-21/+1
Unlike most other mitigations' "auto" options, spectre_bhi=auto only mitigates newer systems, which is confusing and not particularly useful. Remove it. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/412e9dc87971b622bbbaf64740ebc1f140bff343.1712813475.git.jpoimboe@kernel.org
2024-04-11x86/bugs: Clarify that syscall hardening isn't a BHI mitigationGravatar Josh Poimboeuf 3-11/+9
While syscall hardening helps prevent some BHI attacks, there's still other low-hanging fruit remaining. Don't classify it as a mitigation and make it clear that the system may still be vulnerable if it doesn't have a HW or SW mitigation enabled. Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/b5951dae3fdee7f1520d5136a27be3bdfe95f88b.1712813475.git.jpoimboe@kernel.org
2024-04-11x86/bugs: Fix BHI handling of RRSBAGravatar Josh Poimboeuf 1-12/+18
The ARCH_CAP_RRSBA check isn't correct: RRSBA may have already been disabled by the Spectre v2 mitigation (or can otherwise be disabled by the BHI mitigation itself if needed). In that case retpolines are fine. Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/6f56f13da34a0834b69163467449be7f58f253dc.1712813475.git.jpoimboe@kernel.org
2024-04-11x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr'Gravatar Ingo Molnar 3-42/+42
So we are using the 'ia32_cap' value in a number of places, which got its name from MSR_IA32_ARCH_CAPABILITIES MSR register. But there's very little 'IA32' about it - this isn't 32-bit only code, nor does it originate from there, it's just a historic quirk that many Intel MSR names are prefixed with IA32_. This is already clear from the helper method around the MSR: x86_read_arch_cap_msr(), which doesn't have the IA32 prefix. So rename 'ia32_cap' to 'x86_arch_cap_msr' to be consistent with its role and with the naming of the helper function. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nikolay Borisov <nik.borisov@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
2024-04-11x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIESGravatar Josh Poimboeuf 1-15/+7
There's no need to keep reading MSR_IA32_ARCH_CAPABILITIES over and over. It's even read in the BHI sysfs function which is a big no-no. Just read it once and cache it. Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
2024-04-11x86/bugs: Fix BHI documentationGravatar Josh Poimboeuf 2-12/+15
Fix up some inaccuracies in the BHI documentation. Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/8c84f7451bfe0dd08543c6082a383f390d4aa7e2.1712813475.git.jpoimboe@kernel.org
2024-04-10x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=nGravatar Sean Christopherson 1-1/+2
Initialize cpu_mitigations to CPU_MITIGATIONS_OFF if the kernel is built with CONFIG_SPECULATION_MITIGATIONS=n, as the help text quite clearly states that disabling SPECULATION_MITIGATIONS is supposed to turn off all mitigations by default. │ If you say N, all mitigations will be disabled. You really │ should know what you are doing to say so. As is, the kernel still defaults to CPU_MITIGATIONS_AUTO, which results in some mitigations being enabled in spite of SPECULATION_MITIGATIONS=n. Fixes: f43b9876e857 ("x86/retbleed: Add fine grained Kconfig knobs") Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Cc: stable@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20240409175108.1512861-2-seanjc@google.com
2024-04-10x86/topology: Don't update cpu_possible_map in topo_set_cpuids()Gravatar Thomas Gleixner 1-2/+5
topo_set_cpuids() updates cpu_present_map and cpu_possible map. It is invoked during enumeration and "physical hotplug" operations. In the latter case this results in a kernel crash because cpu_possible_map is marked read only after init completes. There is no reason to update cpu_possible_map in that function. During enumeration cpu_possible_map is not relevant and gets fully initialized after enumeration completed. On "physical hotplug" the bit is already set because the kernel allows only CPUs to be plugged which have been enumerated and associated to a CPU number during early boot. Remove the bogus update of cpu_possible_map. Fixes: 0e53e7b656cf ("x86/cpu/topology: Sanitize the APIC admission logic") Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/87ttkc6kwx.ffs@tglx
2024-04-10x86/bugs: Fix return type of spectre_bhi_state()Gravatar Daniel Sneddon 1-1/+1
The definition of spectre_bhi_state() incorrectly returns a const char * const. This causes the a compiler warning when building with W=1: warning: type qualifiers ignored on function return type [-Wignored-qualifiers] 2812 | static const char * const spectre_bhi_state(void) Remove the const qualifier from the pointer. Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Reported-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20240409230806.1545822-1-daniel.sneddon@linux.intel.com
2024-04-10Merge branch 'linus' into x86/urgent, to pick up dependent commitsGravatar Ingo Molnar 33-87/+463
Prepare to fix aspects of the new BHI code. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-04-09Merge tag 'drm-fixes-2024-04-09' of https://gitlab.freedesktop.org/drm/kernelGravatar Linus Torvalds 2-4/+9
Pull drm nouveau fix from Dave Airlie: "A previous fix to nouveau devinit on the GSP paths fixed the Turing but broke Ampere, I did some more digging and found the proper fix. Sending it early as I want to make sure it makes the next 6.8 stable kernels to fix the regression. Regular fixes will be at end of week as usual. nouveau: - regression fix for GSP display enable" * tag 'drm-fixes-2024-04-09' of https://gitlab.freedesktop.org/drm/kernel: nouveau: fix devinit paths to only handle display on GSP.
2024-04-09compiler.h: Add missing quote in macro commentGravatar Thorsten Blum 1-1/+1
Add a missing doublequote in the __is_constexpr() macro comment. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-04-09nouveau: fix devinit paths to only handle display on GSP.Gravatar Dave Airlie 2-4/+9
This reverts: nouveau/gsp: don't check devinit disable on GSP. and applies a further fix. It turns out the open gpu driver, checks this register, but only for display. Match that behaviour and in the turing path only disable the display block. (ampere already only does displays). Fixes: 5d4e8ae6e57b ("nouveau/gsp: don't check devinit disable on GSP.") Reviewed-by: Danilo Krummrich <dakr@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240408064243.2219527-1-airlied@gmail.com
2024-04-08Merge tag 'nativebhi' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipGravatar Linus Torvalds 19-49/+371
Pull x86 mitigations from Thomas Gleixner: "Mitigations for the native BHI hardware vulnerabilty: Branch History Injection (BHI) attacks may allow a malicious application to influence indirect branch prediction in kernel by poisoning the branch history. eIBRS isolates indirect branch targets in ring0. The BHB can still influence the choice of indirect branch predictor entry, and although branch predictor entries are isolated between modes when eIBRS is enabled, the BHB itself is not isolated between modes. Add mitigations against it either with the help of microcode or with software sequences for the affected CPUs" [ This also ends up enabling the full mitigation by default despite the system call hardening, because apparently there are other indirect calls that are still sufficiently reachable, and the 'auto' case just isn't hardened enough. We'll have some more inevitable tweaking in the future - Linus ] * tag 'nativebhi' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: KVM: x86: Add BHI_NO x86/bhi: Mitigate KVM by default x86/bhi: Add BHI mitigation knob x86/bhi: Enumerate Branch History Injection (BHI) bug x86/bhi: Define SPEC_CTRL_BHI_DIS_S x86/bhi: Add support for clearing branch history at syscall entry x86/syscall: Don't force use of indirect calls for system calls x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file
2024-04-08Merge tag 'for-6.9-rc2-tag' of ↵Gravatar Linus Torvalds 7-33/+55
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Several fixes to qgroups that have been recently identified by test generic/475: - fix prealloc reserve leak in subvolume operations - various other fixes in reservation setup, conversion or cleanup" * tag 'for-6.9-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: always clear PERTRANS metadata during commit btrfs: make btrfs_clear_delalloc_extent() free delalloc reserve btrfs: qgroup: convert PREALLOC to PERTRANS after record_root_in_trans btrfs: record delayed inode root in transaction btrfs: qgroup: fix qgroup prealloc rsv leak in subvolume operations btrfs: qgroup: correctly model root qgroup rsv in convert
2024-04-08KVM: x86: Add BHI_NOGravatar Daniel Sneddon 1-1/+1
Intel processors that aren't vulnerable to BHI will set MSR_IA32_ARCH_CAPABILITIES[BHI_NO] = 1;. Guests may use this BHI_NO bit to determine if they need to implement BHI mitigations or not. Allow this bit to be passed to the guests. Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-04-08x86/bhi: Mitigate KVM by defaultGravatar Pawan Gupta 6-6/+23
BHI mitigation mode spectre_bhi=auto does not deploy the software mitigation by default. In a cloud environment, it is a likely scenario where userspace is trusted but the guests are not trusted. Deploying system wide mitigation in such cases is not desirable. Update the auto mode to unconditionally mitigate against malicious guests. Deploy the software sequence at VMexit in auto mode also, when hardware mitigation is not available. Unlike the force =on mode, software sequence is not deployed at syscalls in auto mode. Suggested-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-04-08x86/bhi: Add BHI mitigation knobGravatar Pawan Gupta 5-7/+165
Branch history clearing software sequences and hardware control BHI_DIS_S were defined to mitigate Branch History Injection (BHI). Add cmdline spectre_bhi={on|off|auto} to control BHI mitigation: auto - Deploy the hardware mitigation BHI_DIS_S, if available. on - Deploy the hardware mitigation BHI_DIS_S, if available, otherwise deploy the software sequence at syscall entry and VMexit. off - Turn off BHI mitigation. The default is auto mode which does not deploy the software sequence mitigation. This is because of the hardening done in the syscall dispatch path, which is the likely target of BHI. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-04-08x86/bhi: Enumerate Branch History Injection (BHI) bugGravatar Pawan Gupta 3-8/+21
Mitigation for BHI is selected based on the bug enumeration. Add bits needed to enumerate BHI bug. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-04-08x86/bhi: Define SPEC_CTRL_BHI_DIS_SGravatar Daniel Sneddon 4-2/+8
Newer processors supports a hardware control BHI_DIS_S to mitigate Branch History Injection (BHI). Setting BHI_DIS_S protects the kernel from userspace BHI attacks without having to manually overwrite the branch history. Define MSR_SPEC_CTRL bit BHI_DIS_S and its enumeration CPUID.BHI_CTRL. Mitigation is enabled later. Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-04-08x86/bhi: Add support for clearing branch history at syscall entryGravatar Pawan Gupta 7-3/+96
Branch History Injection (BHI) attacks may allow a malicious application to influence indirect branch prediction in kernel by poisoning the branch history. eIBRS isolates indirect branch targets in ring0. The BHB can still influence the choice of indirect branch predictor entry, and although branch predictor entries are isolated between modes when eIBRS is enabled, the BHB itself is not isolated between modes. Alder Lake and new processors supports a hardware control BHI_DIS_S to mitigate BHI. For older processors Intel has released a software sequence to clear the branch history on parts that don't support BHI_DIS_S. Add support to execute the software sequence at syscall entry and VMexit to overwrite the branch history. For now, branch history is not cleared at interrupt entry, as malicious applications are not believed to have sufficient control over the registers, since previous register state is cleared at interrupt entry. Researchers continue to poke at this area and it may become necessary to clear at interrupt entry as well in the future. This mitigation is only defined here. It is enabled later. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Co-developed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-04-08x86/syscall: Don't force use of indirect calls for system callsGravatar Linus Torvalds 5-16/+50
Make <asm/syscall.h> build a switch statement instead, and the compiler can either decide to generate an indirect jump, or - more likely these days due to mitigations - just a series of conditional branches. Yes, the conditional branches also have branch prediction, but the branch prediction is much more controlled, in that it just causes speculatively running the wrong system call (harmless), rather than speculatively running possibly wrong random less controlled code gadgets. This doesn't mitigate other indirect calls, but the system call indirection is the first and most easily triggered case. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-04-08x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs fileGravatar Josh Poimboeuf 1-12/+12
Change the format of the 'spectre_v2' vulnerabilities sysfs file slightly by converting the commas to semicolons, so that mitigations for future variants can be grouped together and separated by commas. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2024-04-08Merge tag 'fixes-2024-04-08' of ↵Gravatar Linus Torvalds 4-0/+27
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock fixes from Mike Rapoport: "Fix build errors in memblock tests: - add stubs to functions that calls to them were recently added to memblock but they were missing in tests - update gfp_types.h to include bits.h so that BIT() definitions won't depend on other includes" * tag 'fixes-2024-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: memblock tests: fix undefined reference to `BIT' memblock tests: fix undefined reference to `panic' memblock tests: fix undefined reference to `early_pfn_to_nid'
2024-04-08x86/apic: Force native_apic_mem_read() to use the MOV instructionGravatar Adam Dunlap 1-1/+2
When done from a virtual machine, instructions that touch APIC memory must be emulated. By convention, MMIO accesses are typically performed via io.h helpers such as readl() or writeq() to simplify instruction emulation/decoding (ex: in KVM hosts and SEV guests) [0]. Currently, native_apic_mem_read() does not follow this convention, allowing the compiler to emit instructions other than the MOV instruction generated by readl(). In particular, when the kernel is compiled with clang and run as a SEV-ES or SEV-SNP guest, the compiler would emit a TESTL instruction which is not supported by the SEV-ES emulator, causing a boot failure in that environment. It is likely the same problem would happen in a TDX guest as that uses the same instruction emulator as SEV-ES. To make sure all emulators can emulate APIC memory reads via MOV, use the readl() function in native_apic_mem_read(). It is expected that any emulator would support MOV in any addressing mode as it is the most generic and is what is usually emitted currently. The TESTL instruction is emitted when native_apic_mem_read() is inlined into apic_mem_wait_icr_idle(). The emulator comes from insn_decode_mmio() in arch/x86/lib/insn-eval.c. It's not worth it to extend insn_decode_mmio() to support more instructions since, in theory, the compiler could choose to output nearly any instruction for such reads which would bloat the emulator beyond reason. [0] https://lore.kernel.org/all/20220405232939.73860-12-kirill.shutemov@linux.intel.com/ [ bp: Massage commit message, fix typos. ] Signed-off-by: Adam Dunlap <acdunlap@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Kevin Loughlin <kevinloughlin@google.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20240318230927.2191933-1-acdunlap@google.com
2024-04-07Linux 6.9-rc3v6.9-rc3Gravatar Linus Torvalds 1-1/+1
2024-04-07Merge tag 'x86-urgent-2024-04-07' of ↵Gravatar Linus Torvalds 17-41/+166
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix MCE timer reinit locking - Fix/improve CoCo guest random entropy pool init - Fix SEV-SNP late disable bugs - Fix false positive objtool build warning - Fix header dependency bug - Fix resctrl CPU offlining bug * tag 'x86-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk x86/mce: Make sure to grab mce_sysfs_mutex in set_bank() x86/CPU/AMD: Track SNP host status with cc_platform_*() x86/cc: Add cc_platform_set/_clear() helpers x86/kvm/Kconfig: Have KVM_AMD_SEV select ARCH_HAS_CC_PLATFORM x86/coco: Require seeding RNG with RDRAND on CoCo systems x86/numa/32: Include missing <asm/pgtable_areas.h> x86/resctrl: Fix uninitialized memory read when last CPU of domain goes offline
2024-04-07Merge tag 'timers-urgent-2024-04-07' of ↵Gravatar Linus Torvalds 9-36/+121
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: "Fix various timer bugs: - Fix a timer migration bug that may result in missed events - Fix timer migration group hierarchy event updates - Fix a PowerPC64 build warning - Fix a handful of DocBook annotation bugs" * tag 'timers-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers/migration: Return early on deactivation timers/migration: Fix ignored event due to missing CPU update vdso: Use CONFIG_PAGE_SHIFT in vdso/datapage.h timers: Fix text inconsistencies and spelling tick/sched: Fix struct tick_sched doc warnings tick/sched: Fix various kernel-doc warnings timers: Fix kernel-doc format and add Return values time/timekeeping: Fix kernel-doc warnings and typos time/timecounter: Fix inline documentation
2024-04-07Merge tag 'perf-urgent-2024-04-07' of ↵Gravatar Linus Torvalds 1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fix from Ingo Molnar: "Fix a combined PEBS events bug on x86 Intel CPUs" * tag 'perf-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/ds: Don't clear ->pebs_data_cfg for the last PEBS event
2024-04-06Merge tag 'nfsd-6.9-2' of ↵Gravatar Linus Torvalds 2-14/+3
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Address a slow memory leak with RPC-over-TCP - Prevent another NFS4ERR_DELAY loop during CREATE_SESSION * tag 'nfsd-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: hold a lighter-weight client reference over CB_RECALL_ANY SUNRPC: Fix a slow server-side memory leak with RPC-over-TCP
2024-04-06Merge tag 'i2c-for-6.9-rc3' of ↵Gravatar Linus Torvalds 1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: "A host driver build fix" * tag 'i2c-for-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: pxa: hide unused icr_bits[] variable
2024-04-06Merge tag 'xfs-6.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxGravatar Linus Torvalds 1-2/+13
Pull xfs fix from Chandan Babu: - Allow creating new links to special files which were not associated with a project quota * tag 'xfs-6.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: allow cross-linking special files without project quota
2024-04-06Merge tag '6.9-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Gravatar Linus Torvalds 22-178/+369
Pull smb client fixes from Steve French: - fix to retry close to avoid potential handle leaks when server returns EBUSY - DFS fixes including a fix for potential use after free - fscache fix - minor strncpy cleanup - reconnect race fix - deal with various possible UAF race conditions tearing sessions down * tag '6.9-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: fix potential UAF in cifs_signal_cifsd_for_reconnect() smb: client: fix potential UAF in smb2_is_network_name_deleted() smb: client: fix potential UAF in is_valid_oplock_break() smb: client: fix potential UAF in smb2_is_valid_oplock_break() smb: client: fix potential UAF in smb2_is_valid_lease_break() smb: client: fix potential UAF in cifs_stats_proc_show() smb: client: fix potential UAF in cifs_stats_proc_write() smb: client: fix potential UAF in cifs_dump_full_key() smb: client: fix potential UAF in cifs_debug_files_proc_show() smb3: retrying on failed server close smb: client: serialise cifs_construct_tcon() with cifs_mount_mutex smb: client: handle DFS tcons in cifs_construct_tcon() smb: client: refresh referral without acquiring refpath_lock smb: client: guarantee refcounted children from parent session cifs: Fix caching to try to do open O_WRONLY as rdwr on server smb: client: fix UAF in smb2_reconnect_server() smb: client: replace deprecated strncpy with strscpy
2024-04-06x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunkGravatar Borislav Petkov (AMD) 1-0/+1
srso_alias_untrain_ret() is special code, even if it is a dummy which is called in the !SRSO case, so annotate it like its real counterpart, to address the following objtool splat: vmlinux.o: warning: objtool: .export_symbol+0x2b290: data relocation to !ENDBR: srso_alias_untrain_ret+0x0 Fixes: 4535e1a4174c ("x86/bugs: Fix the SRSO mitigation on Zen3/4") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20240405144637.17908-1-bp@kernel.org
2024-04-06Merge branch 'linus' into x86/urgent, to pick up dependent commitGravatar Ingo Molnar 397-2689/+6406
We want to fix: 0e110732473e ("x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO") So merge in Linus's latest into x86/urgent to have it available. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-04-06Merge tag 'i2c-host-fixes-6.9-rc3' of ↵Gravatar Wolfram Sang 1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current An unused const variable kind of error has been fixed by placing the definition of icr_bits[] inside the ifdef block where it is used.
2024-04-05Merge tag 'firewire-fixes-6.9-rc2' of ↵Gravatar Linus Torvalds 1-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 Pull firewire fixes from Takashi Sakamoto: "The firewire-ohci kernel module has a parameter for verbose kernel logging. It is well-known that it logs the spurious IRQ for bus-reset event due to the unmasked register for IRQ event. This update fixes the issue" * tag 'firewire-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: firewire: ohci: mask bus reset interrupts between ISR and bottom half
2024-04-06firewire: ohci: mask bus reset interrupts between ISR and bottom halfGravatar Adam Goldman 1-1/+5
In the FireWire OHCI interrupt handler, if a bus reset interrupt has occurred, mask bus reset interrupts until bus_reset_work has serviced and cleared the interrupt. Normally, we always leave bus reset interrupts masked. We infer the bus reset from the self-ID interrupt that happens shortly thereafter. A scenario where we unmask bus reset interrupts was introduced in 2008 in a007bb857e0b26f5d8b73c2ff90782d9c0972620: If OHCI_PARAM_DEBUG_BUSRESETS (8) is set in the debug parameter bitmask, we will unmask bus reset interrupts so we can log them. irq_handler logs the bus reset interrupt. However, we can't clear the bus reset event flag in irq_handler, because we won't service the event until later. irq_handler exits with the event flag still set. If the corresponding interrupt is still unmasked, the first bus reset will usually freeze the system due to irq_handler being called again each time it exits. This freeze can be reproduced by loading firewire_ohci with "modprobe firewire_ohci debug=-1" (to enable all debugging output). Apparently there are also some cases where bus_reset_work will get called soon enough to clear the event, and operation will continue normally. This freeze was first reported a few months after a007bb85 was committed, but until now it was never fixed. The debug level could safely be set to -1 through sysfs after the module was loaded, but this would be ineffectual in logging bus reset interrupts since they were only unmasked during initialization. irq_handler will now leave the event flag set but mask bus reset interrupts, so irq_handler won't be called again and there will be no freeze. If OHCI_PARAM_DEBUG_BUSRESETS is enabled, bus_reset_work will unmask the interrupt after servicing the event, so future interrupts will be caught as desired. As a side effect to this change, OHCI_PARAM_DEBUG_BUSRESETS can now be enabled through sysfs in addition to during initial module loading. However, when enabled through sysfs, logging of bus reset interrupts will be effective only starting with the second bus reset, after bus_reset_work has executed. Signed-off-by: Adam Goldman <adamg@pobox.com> Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-04-05Merge tag 'spi-fix-v6.9-rc2' of ↵Gravatar Linus Torvalds 3-11/+10
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A few small driver specific fixes, the most important being the s3c64xx change which is likely to be hit during normal operation" * tag 'spi-fix-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: mchp-pci1xxx: Fix a possible null pointer dereference in pci1xxx_spi_probe spi: spi-fsl-lpspi: remove redundant spi_controller_put call spi: s3c64xx: Use DMA mode from fifo size
2024-04-05Merge tag 'regulator-fix-v6.9-rc2' of ↵Gravatar Linus Torvalds 1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fix from Mark Brown: "One simple regualtor fix, fixing module autoloading on tps65132" * tag 'regulator-fix-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: tps65132: Add of_match table
2024-04-05Merge tag 'regmap-fix-v6.9-rc2' of ↵Gravatar Linus Torvalds 1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fixes from Mark Brown: "Richard found a nasty corner case in the maple tree code which he fixed, and also fixed a compiler warning which was showing up with the toolchain he uses and helpfully identified a possible incorrect error code which could have runtime impacts" * tag 'regmap-fix-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: maple: Fix uninitialized symbol 'ret' warnings regmap: maple: Fix cache corruption in regcache_maple_drop()
2024-04-05Merge tag 'block-6.9-20240405' of git://git.kernel.dk/linuxGravatar Linus Torvalds 9-37/+133
Pull block fixes from Jens Axboe: - NVMe pull request via Keith: - Atomic queue limits fixes (Christoph) - Fabrics fixes (Hannes, Daniel) - Discard overflow fix (Li) - Cleanup fix for null_blk (Damien) * tag 'block-6.9-20240405' of git://git.kernel.dk/linux: nvme-fc: rename free_ctrl callback to match name pattern nvmet-fc: move RCU read lock to nvmet_fc_assoc_exists nvmet: implement unique discovery NQN nvme: don't create a multipath node for zero capacity devices nvme: split nvme_update_zone_info nvme-multipath: don't inherit LBA-related fields for the multipath node block: fix overflow in blk_ioctl_discard() nullblk: Fix cleanup order in null_add_dev() error path
2024-04-05Merge tag 'io_uring-6.9-20240405' of git://git.kernel.dk/linuxGravatar Linus Torvalds 5-94/+73
Pull io_uring fixes from Jens Axboe: - Backport of some fixes that came up during development of the 6.10 io_uring patches. This includes some kbuf cleanups and reference fixes. - Disable multishot read if we don't have NOWAIT support on the target - Fix for a dependency issue with workqueue flushing * tag 'io_uring-6.9-20240405' of git://git.kernel.dk/linux: io_uring/kbuf: hold io_buffer_list reference over mmap io_uring/kbuf: protect io_buffer_list teardown with a reference io_uring/kbuf: get rid of bl->is_ready io_uring/kbuf: get rid of lower BGID lists io_uring: use private workqueue for exit work io_uring: disable io-wq execution of multishot NOWAIT requests io_uring/rw: don't allow multishot reads without NOWAIT support
2024-04-05Merge tag 'scsi-fixes' of ↵Gravatar Linus Torvalds 5-26/+31
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "The most important is the libsas fix, which is a problem for DMA to a kmalloc'd structure too small causing cache line interference. The other fixes (all in drivers) are mostly for allocation length fixes, error leg unwinding, suspend races and a missing retry" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: core: Fix MCQ mode dev command timeout scsi: libsas: Align SMP request allocation to ARCH_DMA_MINALIGN scsi: sd: Unregister device if device_add_disk() failed in sd_probe() scsi: ufs: core: WLUN suspend dev/link state error recovery scsi: mylex: Fix sysfs buffer lengths