aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-04-02KVM: Don't actually set a request when evicting vCPUs for GFN cache invdGravatar Sean Christopherson 5-16/+55
Don't actually set a request bit in vcpu->requests when making a request purely to force a vCPU to exit the guest. Logging a request but not actually consuming it would cause the vCPU to get stuck in an infinite loop during KVM_RUN because KVM would see the pending request and bail from VM-Enter to service the request. Note, it's currently impossible for KVM to set KVM_REQ_GPC_INVALIDATE as nothing in KVM is wired up to set guest_uses_pa=true. But, it'd be all too easy for arch code to introduce use of kvm_gfn_to_pfn_cache_init() without implementing handling of the request, especially since getting test coverage of MMU notifier interaction with specific KVM features usually requires a directed test. Opportunistically rename gfn_to_pfn_cache_invalidate_start()'s wake_vcpus to evict_vcpus. The purpose of the request is to get vCPUs out of guest mode, it's supposed to _avoid_ waking vCPUs that are blocking. Opportunistically rename KVM_REQ_GPC_INVALIDATE to be more specific as to what it wants to accomplish, and to genericize the name so that it can used for similar but unrelated scenarios, should they arise in the future. Add a comment and documentation to explain why the "no action" request exists. Add compile-time assertions to help detect improper usage. Use the inner assertless helper in the one s390 path that makes requests without a hardcoded request. Cc: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220223165302.3205276-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02KVM: avoid double put_page with gfn-to-pfn cacheGravatar David Woodhouse 1-0/+1
If the cache's user host virtual address becomes invalid, there is still a path from kvm_gfn_to_pfn_cache_refresh() where __release_gpc() could release the pfn but the gpc->pfn field has not been overwritten with an error value. If this happens, kvm_gfn_to_pfn_cache_unmap will call put_page again on the same page. Cc: stable@vger.kernel.org Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support") Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02KVM: x86/mmu: Zap only TDP MMU leafs in zap range and mmu_notifier unmapGravatar Sean Christopherson 3-48/+19
Re-introduce zapping only leaf SPTEs in kvm_zap_gfn_range() and kvm_tdp_mmu_unmap_gfn_range(), this time without losing a pending TLB flush when processing multiple roots (including nested TDP shadow roots). Dropping the TLB flush resulted in random crashes when running Hyper-V Server 2019 in a guest with KSM enabled in the host (or any source of mmu_notifier invalidations, KSM is just the easiest to force). This effectively revert commits 873dd122172f8cce329113cfb0dfe3d2344d80c0 and fcb93eb6d09dd302cbef22bd95a5858af75e4156, and thus restores commit cf3e26427c08ad9015956293ab389004ac6a338e, plus this delta on top: bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end, struct kvm_mmu_page *root; for_each_tdp_mmu_root_yield_safe(kvm, root, as_id) - flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, false); + flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush); return flush; } Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220325230348.2587437-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02KVM: SVM: fix panic on out-of-bounds guest IRQGravatar Yi Wang 1-2/+8
As guest_irq is coming from KVM_IRQFD API call, it may trigger crash in svm_update_pi_irte() due to out-of-bounds: crash> bt PID: 22218 TASK: ffff951a6ad74980 CPU: 73 COMMAND: "vcpu8" #0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397 #1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d #2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d #3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d #4 [ffffb1ba6707fb90] no_context at ffffffff856692c9 #5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51 #6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace [exception RIP: svm_update_pi_irte+227] RIP: ffffffffc0761b53 RSP: ffffb1ba6707fd08 RFLAGS: 00010086 RAX: ffffb1ba6707fd78 RBX: ffffb1ba66d91000 RCX: 0000000000000001 RDX: 00003c803f63f1c0 RSI: 000000000000019a RDI: ffffb1ba66db2ab8 RBP: 000000000000019a R8: 0000000000000040 R9: ffff94ca41b82200 R10: ffffffffffffffcf R11: 0000000000000001 R12: 0000000000000001 R13: 0000000000000001 R14: ffffffffffffffcf R15: 000000000000005f ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm] #8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm] #9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm] RIP: 00007f143c36488b RSP: 00007f143a4e04b8 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 00007f05780041d0 RCX: 00007f143c36488b RDX: 00007f05780041d0 RSI: 000000004008ae6a RDI: 0000000000000020 RBP: 00000000000004e8 R8: 0000000000000008 R9: 00007f05780041e0 R10: 00007f0578004560 R11: 0000000000000246 R12: 00000000000004e0 R13: 000000000000001a R14: 00007f1424001c60 R15: 00007f0578003bc0 ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b Vmx have been fix this in commit 3a8b0677fc61 (KVM: VMX: Do not BUG() on out-of-bounds guest IRQ), so we can just copy source from that to fix this. Co-developed-by: Yi Liu <liu.yi24@zte.com.cn> Signed-off-by: Yi Liu <liu.yi24@zte.com.cn> Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Message-Id: <20220309113025.44469-1-wang.yi59@zte.com.cn> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02KVM: MMU: propagate alloc_workqueue failureGravatar Paolo Bonzini 5-17/+32
If kvm->arch.tdp_mmu_zap_wq cannot be created, the failure has to be propagated up to kvm_mmu_init_vm and kvm_arch_init_vm. kvm_arch_init_vm also has to undo all the initialization, so group all the MMU initialization code at the beginning and handle cleaning up of kvm_page_track_init. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29KVM: x86: Forbid VMM to set SYNIC/STIMER MSRs when SynIC wasn't activatedGravatar Vitaly Kuznetsov 1-3/+6
Setting non-zero values to SYNIC/STIMER MSRs activates certain features, this should not happen when KVM_CAP_HYPERV_SYNIC{,2} was not activated. Note, it would've been better to forbid writing anything to SYNIC/STIMER MSRs, including zeroes, however, at least QEMU tries clearing HV_X64_MSR_STIMER0_CONFIG without SynIC. HV_X64_MSR_EOM MSR is somewhat 'special' as writing zero there triggers an action, this also should not happen when SynIC wasn't activated. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220325132140.25650-4-vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29KVM: x86: Avoid theoretical NULL pointer dereference in ↵Gravatar Vitaly Kuznetsov 1-0/+4
kvm_irq_delivery_to_apic_fast() When kvm_irq_delivery_to_apic_fast() is called with APIC_DEST_SELF shorthand, 'src' must not be NULL. Crash the VM with KVM_BUG_ON() instead of crashing the host. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220325132140.25650-3-vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29KVM: x86: Check lapic_in_kernel() before attempting to set a SynIC irqGravatar Vitaly Kuznetsov 1-0/+3
When KVM_CAP_HYPERV_SYNIC{,2} is activated, KVM already checks for irqchip_in_kernel() so normally SynIC irqs should never be set. It is, however, possible for a misbehaving VMM to write to SYNIC/STIMER MSRs causing erroneous behavior. The immediate issue being fixed is that kvm_irq_delivery_to_apic() (kvm_irq_delivery_to_apic_fast()) crashes when called with 'irq.shorthand = APIC_DEST_SELF' and 'src == NULL'. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220325132140.25650-2-vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29Documentation: KVM: add API issues sectionGravatar Paolo Bonzini 1-0/+46
Add a section to document all the different ways in which the KVM API sucks. I am sure there are way more, give people a place to vent so that userspace authors are aware. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220322110712.222449-4-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29Documentation: KVM: add virtual CPU errata documentationGravatar Paolo Bonzini 2-0/+40
Add a file to document all the different ways in which the virtual CPU emulation is imperfect. Include an example to show how to document such errata. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Message-Id: <20220322110712.222449-3-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29Documentation: KVM: add separate directories for architecture-specific ↵Gravatar Paolo Bonzini 15-19/+37
documentation ARM already has an arm/ subdirectory, but s390 and x86 do not even though they have a relatively large number of files specific to them. Create new directories in Documentation/virt/kvm for these two architectures as well. While at it, group the API documentation and the developer documentation in the table of contents. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220322110712.222449-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29Documentation: kvm: include new locksGravatar Paolo Bonzini 1-0/+15
kvm->mn_invalidate_lock and kvm->slots_arch_lock were not included in the documentation, add them. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220322110720.222499-3-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29Documentation: kvm: fixes for locking.rstGravatar Paolo Bonzini 1-9/+19
Separate the various locks clearly, and include the new names of blocked_vcpu_on_cpu_lock and blocked_vcpu_on_cpu. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220322110720.222499-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29KVM: x86: Fix clang -Wimplicit-fallthrough in do_host_cpuid()Gravatar Nathan Chancellor 1-0/+1
Clang warns: arch/x86/kvm/cpuid.c:739:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough] default: ^ arch/x86/kvm/cpuid.c:739:2: note: insert 'break;' to avoid fall-through default: ^ break; 1 error generated. Clang is a little more pedantic than GCC, which does not warn when falling through to a case that is just break or return. Clang's version is more in line with the kernel's own stance in deprecated.rst, which states that all switch/case blocks must end in either break, fallthrough, continue, goto, or return. Add the missing break to silence the warning. Fixes: f144c49e8c39 ("KVM: x86: synthesize CPUID leaf 0x80000021h if useful") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Message-Id: <20220322152906.112164-1-nathan@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29Revert "KVM: set owner of cpu and vm file operations"Gravatar David Matlack 1-4/+2
This reverts commit 3d3aab1b973b01bd2a1aa46307e94a1380b1d802. Now that the KVM module's lifetime is tied to kvm.users_count, there is no need to also tie it's lifetime to the lifetime of the VM and vCPU file descriptors. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220303183328.1499189-3-dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29KVM: Prevent module exit until all VMs are freedGravatar David Matlack 1-0/+13
Tie the lifetime the KVM module to the lifetime of each VM via kvm.users_count. This way anything that grabs a reference to the VM via kvm_get_kvm() cannot accidentally outlive the KVM module. Prior to this commit, the lifetime of the KVM module was tied to the lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU file descriptors by their respective file_operations "owner" field. This approach is insufficient because references grabbed via kvm_get_kvm() do not prevent closing any of the aforementioned file descriptors. This fixes a long standing theoretical bug in KVM that at least affects async page faults. kvm_setup_async_pf() grabs a reference via kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing prevents the VM file descriptor from being closed and the KVM module from being unloaded before this callback runs. Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out") Fixes: 3d3aab1b973b ("KVM: set owner of cpu and vm file operations") Cc: stable@vger.kernel.org Suggested-by: Ben Gardon <bgardon@google.com> [ Based on a patch from Ben implemented for Google's kernel. ] Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220303183328.1499189-2-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21KVM: use kvcalloc for array allocationsGravatar Paolo Bonzini 1-3/+2
Instead of using array_size, use a function that takes care of the multiplication. While at it, switch to kvcalloc since this allocation should not be very large. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2Gravatar Oliver Upton 4-0/+66
KVM_CAP_DISABLE_QUIRKS is irrevocably broken. The capability does not advertise the set of quirks which may be disabled to userspace, so it is impossible to predict the behavior of KVM. Worse yet, KVM_CAP_DISABLE_QUIRKS will tolerate any value for cap->args[0], meaning it fails to reject attempts to set invalid quirk bits. The only valid workaround for the quirky quirks API is to add a new CAP. Actually advertise the set of quirks that can be disabled to userspace so it can predict KVM's behavior. Reject values for cap->args[0] that contain invalid bits. Finally, add documentation for the new capability and describe the existing quirks. Signed-off-by: Oliver Upton <oupton@google.com> Message-Id: <20220301060351.442881-5-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21kvm: x86: Require const tsc for RTGravatar Thomas Gleixner 1-0/+6
Non constant TSC is a nightmare on bare metal already, but with virtualization it becomes a complete disaster because the workarounds are horrible latency wise. That's also a preliminary for running RT in a guest on top of a RT host. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Message-Id: <Yh5eJSG19S2sjZfy@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21KVM: x86: synthesize CPUID leaf 0x80000021h if usefulGravatar Paolo Bonzini 1-0/+25
Guests X86_BUG_NULL_SEG if and only if the host has them. Use the info from static_cpu_has_bug to form the 0x80000021 CPUID leaf that was defined for Zen3. Userspace can then set the bit even on older CPUs that do not have the bug, such as Zen2. Do the same for X86_FEATURE_LFENCE_RDTSC as well, since various processors have had very different ways of detecting it and not all of them are available to userspace. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21KVM: x86: add support for CPUID leaf 0x80000021Gravatar Paolo Bonzini 1-1/+18
CPUID leaf 0x80000021 defines some features (or lack of bugs) of AMD processors. Expose the ones that make sense via KVM_GET_SUPPORTED_CPUID. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_maskGravatar Maxim Levitsky 2-1/+7
KVM_X86_OP_OPTIONAL_RET0 can only be used with 32-bit return values on 32-bit systems, because unsigned long is only 32-bits wide there and 64-bit values are returned in edx:eax. Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"Gravatar Paolo Bonzini 3-14/+39
This reverts commit cf3e26427c08ad9015956293ab389004ac6a338e. Multi-vCPU Hyper-V guests started crashing randomly on boot with the latest kvm/queue and the problem can be bisected the problem to this particular patch. Basically, I'm not able to boot e.g. 16-vCPU guest successfully anymore. Both Intel and AMD seem to be affected. Reverting the commit saves the day. Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCUGravatar Paolo Bonzini 1-5/+9
Since "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()" is going to be reverted, it's not going to be true anymore that the zap-page flow does not free any 'struct kvm_mmu_page'. Introduce an early flush before tdp_mmu_zap_leafs() returns, to preserve bisectability. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-18Merge tag 'kvmarm-5.18' of ↵Gravatar Paolo Bonzini 39-311/+865
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 5.18 - Proper emulation of the OSLock feature of the debug architecture - Scalibility improvements for the MMU lock when dirty logging is on - New VMID allocator, which will eventually help with SVA in VMs - Better support for PMUs in heterogenous systems - PSCI 1.1 support, enabling support for SYSTEM_RESET2 - Implement CONFIG_DEBUG_LIST at EL2 - Make CONFIG_ARM64_ERRATUM_2077057 default y - Reduce the overhead of VM exit when no interrupt is pending - Remove traces of 32bit ARM host support from the documentation - Updated vgic selftests - Various cleanups, doc updates and spelling fixes
2022-03-18KVM: arm64: fix typos in commentsGravatar Julia Lawall 6-7/+7
Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220318103729.157574-24-Julia.Lawall@inria.fr
2022-03-18KVM: arm64: Generalise VM features into a set of flagsGravatar Marc Zyngier 4-12/+17
We currently deal with a set of booleans for VM features, while they could be better represented as set of flags contained in an unsigned long, similarily to what we are doing on the CPU side. Signed-off-by: Marc Zyngier <maz@kernel.org> [Oliver: Flag-ify the 'ran_once' boolean] Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220311174001.605719-2-oupton@google.com
2022-03-15Merge tag 'kvm-riscv-5.18-1' of https://github.com/kvm-riscv/linux into HEADGravatar Paolo Bonzini 11-59/+161
KVM/riscv changes for 5.18 - Prevent KVM_COMPAT from being selected - Refine __kvm_riscv_switch_to() implementation - RISC-V SBI v0.3 support
2022-03-15Merge tag 'kvm-s390-next-5.18-2' of ↵Gravatar Paolo Bonzini 6-133/+717
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fix, test and feature for 5.18 part 2 - memop selftest - fix SCK locking - adapter interruptions virtualization for secure guests
2022-03-14KVM: s390: selftests: Add error memop testsGravatar Janis Schoetterl-Glausch 1-13/+124
Test that errors occur if key protection disallows access, including tests for storage and fetch protection override. Perform tests for both logical vcpu and absolute vm ioctls. Also extend the existing tests to the vm ioctl. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Link: https://lore.kernel.org/r/20220308125841.3271721-6-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-03-14KVM: s390: selftests: Add more copy memop testsGravatar Janis Schoetterl-Glausch 1-13/+230
Do not just test the actual copy, but also that success is indicated when using the check only flag. Add copy test with storage key checking enabled, including tests for storage and fetch protection override. These test cover both logical vcpu ioctls as well as absolute vm ioctls. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Link: https://lore.kernel.org/r/20220308125841.3271721-5-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-03-14KVM: s390: selftests: Add named stages for memop testGravatar Janis Schoetterl-Glausch 1-11/+33
The stages synchronize guest and host execution. This helps the reader and constraits the execution of the test -- if the observed staging differs from the expected the test fails. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Link: https://lore.kernel.org/r/20220308125841.3271721-4-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-03-14KVM: s390: selftests: Add macro as abstraction for MEM_OPGravatar Janis Schoetterl-Glausch 1-75/+197
In order to achieve good test coverage we need to be able to invoke the MEM_OP ioctl with all possible parametrizations. However, for a given test, we want to be concise and not specify a long list of default values for parameters not relevant for the test, so the readers attention is not needlessly diverted. Add a macro that enables this and convert the existing test to use it. The macro emulates named arguments and hides some of the ioctl's redundancy, e.g. sets the key flag if an access key is specified. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Link: https://lore.kernel.org/r/20220308125841.3271721-3-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-03-14KVM: s390: selftests: Split memop testsGravatar Janis Schoetterl-Glausch 1-55/+82
Split success case/copy test from error test, making them independent. This means they do not share state and are easier to understand. Also, new test can be added in the same manner without affecting the old ones. In order to make that simpler, introduce functionality for the setup of commonly used variables. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Link: https://lore.kernel.org/r/20220308125841.3271721-2-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-03-14KVM: s390x: fix SCK lockingGravatar Claudio Imbrenda 3-6/+32
When handling the SCK instruction, the kvm lock is taken, even though the vcpu lock is already being held. The normal locking order is kvm lock first and then vcpu lock. This is can (and in some circumstances does) lead to deadlocks. The function kvm_s390_set_tod_clock is called both by the SCK handler and by some IOCTLs to set the clock. The IOCTLs will not hold the vcpu lock, so they can safely take the kvm lock. The SCK handler holds the vcpu lock, but will also somehow need to acquire the kvm lock without relinquishing the vcpu lock. The solution is to factor out the code to set the clock, and provide two wrappers. One is called like the original function and does the locking, the other is called kvm_s390_try_set_tod_clock and uses trylock to try to acquire the kvm lock. This new wrapper is then used in the SCK handler. If locking fails, -EAGAIN is returned, which is eventually propagated to userspace, thus also freeing the vcpu lock and allowing for forward progress. This is not the most efficient or elegant way to solve this issue, but the SCK instruction is deprecated and its performance is not critical. The goal of this patch is just to provide a simple but correct way to fix the bug. Fixes: 6a3f95a6b04c ("KVM: s390: Intercept SCK instruction") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Link: https://lore.kernel.org/r/20220301143340.111129-1-imbrenda@linux.ibm.com Cc: stable@vger.kernel.org Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-03-11RISC-V: KVM: Implement SBI HSM suspend callGravatar Anup Patel 1-0/+14
The SBI v0.3 specification extends SBI HSM extension by adding SBI HSM suspend call and related HART states. This patch extends the KVM RISC-V HSM implementation to provide KVM guest a minimal SBI HSM suspend call which is equivalent to a WFI instruction. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-03-11RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() functionGravatar Anup Patel 2-6/+17
The wait for interrupt (WFI) instruction emulation can share the VCPU halt logic with SBI HSM suspend emulation so this patch adds a common kvm_riscv_vcpu_wfi() function for this purpose. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-03-11RISC-V: Add SBI HSM suspend related definesGravatar Anup Patel 3-8/+25
We add defines related to SBI HSM suspend call and also update HSM states naming as-per the latest SBI specification. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-03-11RISC-V: KVM: Implement SBI v0.3 SRST extensionGravatar Anup Patel 2-0/+46
The SBI v0.3 specification defines SRST (System Reset) extension which provides a standard poweroff and reboot interface. This patch implements SRST extension for the KVM Guest. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-03-11RISC-V: KVM: Add common kvm_riscv_vcpu_sbi_system_reset() functionGravatar Anup Patel 3-16/+22
We rename kvm_sbi_system_shutdown() to kvm_riscv_vcpu_sbi_system_reset() and move it to vcpu_sbi.c so that it can be shared by SBI v0.1 shutdown and SBI v0.3 SRST extension. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-03-11RISC-V: KVM: Upgrade SBI spec version to v0.3Gravatar Anup Patel 1-1/+1
We upgrade SBI spec version implemented by KVM RISC-V to v0.3 so that Guest kernel can probe and use SBI extensions added by the SBI v0.3 specification. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-03-11RISC-V: KVM: Refine __kvm_riscv_switch_to() implementationGravatar Vincent Chen 1-26/+34
Kernel uses __kvm_riscv_switch_to() and __kvm_switch_return() to switch the context of host kernel and guest kernel. Several CSRs belonging to the context will be read and written during the context switch. To ensure atomic read-modify-write control of CSR and ordering of CSR accesses, some hardware blocks flush the pipeline when writing a CSR. In this circumstance, grouping CSR executions together as much as possible can reduce the performance impact of the pipeline. Therefore, this commit reorders the CSR instructions to enhance the context switch performance.. Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Suggested-by: Hsinyi Lee <hsinyi.lee@sifive.com> Suggested-by: Fu-Ching Yang <fu-ching.yang@sifive.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-03-11KVM: compat: riscv: Prevent KVM_COMPAT from being selectedGravatar Guo Ren 1-1/+1
Current riscv doesn't support the 32bit KVM API. Let's make it clear by not selecting KVM_COMPAT. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Anup Patel <anup@brainfault.org> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-03-11RISC-V: KVM: remove unneeded semicolonGravatar Yang Li 1-1/+1
Eliminate the following coccicheck warning: ./arch/riscv/kvm/vcpu_sbi_v01.c:117:2-3: Unneeded semicolon Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-03-09Merge branch kvm-arm64/psci-1.1 into kvmarm-master/nextGravatar Marc Zyngier 1-6/+6
* kvm-arm64/psci-1.1: : . : Limited PSCI-1.1 support from Will Deacon: : : This small series exposes the PSCI SYSTEM_RESET2 call to guests, which : allows the propagation of a "reset_type" and a "cookie" back to the VMM. : Although Linux guests only ever pass 0 for the type ("SYSTEM_WARM_RESET"), : the vendor-defined range can be used by a bootloader to provide additional : information about the reset, such as an error code. : . KVM: arm64: Really propagate PSCI SYSTEM_RESET2 arguments to userspace Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-03-09KVM: arm64: Really propagate PSCI SYSTEM_RESET2 arguments to userspaceGravatar Will Deacon 1-6/+6
Commit d43583b890e7 ("KVM: arm64: Expose PSCI SYSTEM_RESET2 call to the guest") hooked up the SYSTEM_RESET2 PSCI call for guests but failed to preserve its arguments for userspace, instead overwriting them with zeroes via smccc_set_retval(). As Linux only passes zeroes for these arguments, this appeared to be working for Linux guests. Oh well. Don't call smccc_set_retval() for a SYSTEM_RESET2 heading to userspace and instead set X0 (and only X0) explicitly to PSCI_RET_INTERNAL_FAILURE just in case the vCPU re-enters the guest. Fixes: d43583b890e7 ("KVM: arm64: Expose PSCI SYSTEM_RESET2 call to the guest") Reported-by: Andrew Walbran <qwandor@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220309181308.982-1-will@kernel.org
2022-03-09Merge branch kvm-arm64/misc-5.18 into kvmarm-master/nextGravatar Marc Zyngier 4-48/+50
* kvm-arm64/misc-5.18: : . : Misc fixes for KVM/arm64 5.18: : : - Drop unused kvm parameter to kvm_psci_version() : : - Implement CONFIG_DEBUG_LIST at EL2 : : - Make CONFIG_ARM64_ERRATUM_2077057 default y : : - Only do the interrupt dance if we have exited because of an interrupt : : - Remove traces of 32bit ARM host support from the documentation : . Documentation: KVM: Update documentation to indicate KVM is arm64-only KVM: arm64: Only open the interrupt window on exit due to an interrupt KVM: arm64: Enable Cortex-A510 erratum 2077057 by default Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-03-09Documentation: KVM: Update documentation to indicate KVM is arm64-onlyGravatar Oliver Upton 2-45/+44
KVM support for 32-bit ARM hosts (KVM/arm) has been removed from the kernel since commit 541ad0150ca4 ("arm: Remove 32bit KVM host support"). There still exists some remnants of the old architecture in the KVM documentation. Remove all traces of 32-bit host support from the documentation. Note that AArch32 guests are still supported. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220308172856.2997250-1-oupton@google.com
2022-03-08KVM: SVM: Allow AVIC support on system w/ physical APIC ID > 255Gravatar Suravee Suthikulpanit 3-7/+13
Expand KVM's mask for the AVIC host physical ID to the full 12 bits defined by the architecture. The number of bits consumed by hardware is model specific, e.g. early CPUs ignored bits 11:8, but there is no way for KVM to enumerate the "true" size. So, KVM must allow using all bits, else it risks rejecting completely legal x2APIC IDs on newer CPUs. This means KVM relies on hardware to not assign x2APIC IDs that exceed the "true" width of the field, but presumably hardware is smart enough to tie the width to the max x2APIC ID. KVM also relies on hardware to support at least 8 bits, as the legacy xAPIC ID is writable by software. But, those assumptions are unavoidable due to the lack of any way to enumerate the "true" width. Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Fixes: 44a95dae1d22 ("KVM: x86: Detect and Initialize AVIC support") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220211000851.185799-1-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-08KVM: selftests: Add test to populate a VM with the max possible guest memGravatar Sean Christopherson 3-0/+294
Add a selftest that enables populating a VM with the maximum amount of guest memory allowed by the underlying architecture. Abuse KVM's memslots by mapping a single host memory region into multiple memslots so that the selftest doesn't require a system with terabytes of RAM. Default to 512gb of guest memory, which isn't all that interesting, but should work on all MMUs and doesn't take an exorbitant amount of memory or time. E.g. testing with ~64tb of guest memory takes the better part of an hour, and requires 200gb of memory for KVM's page tables when using 4kb pages. To inflicit maximum abuse on KVM' MMU, default to 4kb pages (or whatever the not-hugepage size is) in the backing store (memfd). Use memfd for the host backing store to ensure that hugepages are guaranteed when requested, and to give the user explicit control of the size of hugepage being tested. By default, spin up as many vCPUs as there are available to the selftest, and distribute the work of dirtying each 4kb chunk of memory across all vCPUs. Dirtying guest memory forces KVM to populate its page tables, and also forces KVM to write back accessed/dirty information to struct page when the guest memory is freed. On x86, perform two passes with a MMU context reset between each pass to coerce KVM into dropping all references to the MMU root, e.g. to emulate a vCPU dropping the last reference. Perform both passes and all rendezvous on all architectures in the hope that arm64 and s390x can gain similar shenanigans in the future. Measure and report the duration of each operation, which is helpful not only to verify the test is working as intended, but also to easily evaluate the performance differences different page sizes. Provide command line options to limit the amount of guest memory, set the size of each slot (i.e. of the host memory region), set the number of vCPUs, and to enable usage of hugepages. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220226001546.360188-29-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>