aboutsummaryrefslogtreecommitdiff
path: root/virt/kvm
AgeCommit message (Collapse)AuthorFilesLines
2018-12-19arm/arm64: KVM: Add ARM_EXCEPTION_IS_TRAP macroGravatar Marc Zyngier 1-1/+1
32 and 64bit use different symbols to identify the traps. 32bit has a fine grained approach (prefetch abort, data abort and HVC), while 64bit is pretty happy with just "trap". This has been fine so far, except that we now need to decode some of that in tracepoints that are common to both architectures. Introduce ARM_EXCEPTION_IS_TRAP which abstracts the trap symbols and make the tracepoint use it. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19KVM: arm/arm64: Fix unintended stage 2 PMD mappingsGravatar Christoffer Dall 1-22/+64
There are two things we need to take care of when we create block mappings in the stage 2 page tables: (1) The alignment within a PMD between the host address range and the guest IPA range must be the same, since otherwise we end up mapping pages with the wrong offset. (2) The head and tail of a memory slot may not cover a full block size, and we have to take care to not map those with block descriptors, since we could expose memory to the guest that the host did not intend to expose. So far, we have been taking care of (1), but not (2), and our commentary describing (1) was somewhat confusing. This commit attempts to factor out the checks of both into a common function, and if we don't pass the check, we won't attempt any PMD mappings for neither hugetlbfs nor THP. Note that we used to only check the alignment for THP, not for hugetlbfs, but as far as I can tell the check needs to be applied to both scenarios. Cc: Ralph Palutke <ralph.palutke@fau.de> Cc: Lukas Braun <koomi@moshbit.net> Reported-by: Lukas Braun <koomi@moshbit.net> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19arm/arm64: KVM: vgic: Force VM halt when changing the active state of GICv3 ↵Gravatar Marc Zyngier 1-2/+4
PPIs/SGIs We currently only halt the guest when a vCPU messes with the active state of an SPI. This is perfectly fine for GICv2, but isn't enough for GICv3, where all vCPUs can access the state of any other vCPU. Let's broaden the condition to include any GICv3 interrupt that has an active state (i.e. all but LPIs). Cc: stable@vger.kernel.org Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19KVM: arm/arm64: arch_timer: Simplify kvm_timer_vcpu_terminateGravatar Christoffer Dall 1-3/+0
kvm_timer_vcpu_terminate can only be called in two scenarios: 1. As part of cleanup during a failed VCPU create 2. As part of freeing the whole VM (struct kvm refcount == 0) In the first case, we cannot have programmed any timers or mapped any IRQs, and therefore we do not have to cancel anything or unmap anything. In the second case, the VCPU will have gone through kvm_timer_vcpu_put, which will have canceled the emulated physical timer's hrtimer, and we do not need to that here as well. We also do not care if the irq is recorded as mapped or not in the VGIC data structure, because the whole VM is going away. That leaves us only with having to ensure that we cancel the bg_timer if we were blocking the last time we called kvm_timer_vcpu_put(). Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19KVM: arm/arm64: Remove arch timer workqueueGravatar Christoffer Dall 1-27/+7
The use of a work queue in the hrtimer expire function for the bg_timer is a leftover from the time when we would inject interrupts when the bg_timer expired. Since we are no longer doing that, we can instead call kvm_vcpu_wake_up() directly from the hrtimer function and remove all workqueue functionality from the arch timer code. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19KVM: arm/arm64: Fixup the kvm_exit tracepointGravatar Christoffer Dall 1-9/+9
The kvm_exit tracepoint strangely always reported exits as being IRQs. This seems to be because either the __print_symbolic or the tracepoint macros use a variable named idx. Take this chance to update the fields in the tracepoint to reflect the concepts in the arm64 architecture that we pass to the tracepoint and move the exception type table to the same location and header files as the exits code. We also clear out the exception code to 0 for IRQ exits (which translates to UNKNOWN in text) to make it slighyly less confusing to parse the trace output. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19KVM: arm/arm64: vgic: Consider priority and active state for pending irqGravatar Christoffer Dall 1-1/+6
When checking if there are any pending IRQs for the VM, consider the active state and priority of the IRQs as well. Otherwise we could be continuously scheduling a guest hypervisor without it seeing an IRQ. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19KVM: arm/arm64: vgic: Fix off-by-one bug in vgic_get_irq()Gravatar Gustavo A. R. Silva 1-1/+1
When using the nospec API, it should be taken into account that: "...if the CPU speculates past the bounds check then * array_index_nospec() will clamp the index within the range of [0, * size)." The above is part of the header for macro array_index_nospec() in linux/nospec.h Now, in this particular case, if intid evaluates to exactly VGIC_MAX_SPI or to exaclty VGIC_MAX_PRIVATE, the array_index_nospec() macro ends up returning VGIC_MAX_SPI - 1 or VGIC_MAX_PRIVATE - 1 respectively, instead of VGIC_MAX_SPI or VGIC_MAX_PRIVATE, which, based on the original logic: /* SGIs and PPIs */ if (intid <= VGIC_MAX_PRIVATE) return &vcpu->arch.vgic_cpu.private_irqs[intid]; /* SPIs */ if (intid <= VGIC_MAX_SPI) return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS]; are valid values for intid. Fix this by calling array_index_nospec() macro with VGIC_MAX_PRIVATE + 1 and VGIC_MAX_SPI + 1 as arguments for its parameter size. Fixes: 41b87599c743 ("KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_get_irq()") Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> [dropped the SPI part which was fixed separately] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm/arm64: vgic: Cap SPIs to the VM-defined maximumGravatar Marc Zyngier 1-2/+2
SPIs should be checked against the VMs specific configuration, and not the architectural maximum. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm64: Clarify explanation of STAGE2_PGTABLE_LEVELSGravatar Christoffer Dall 1-1/+1
In attempting to re-construct the logic for our stage 2 page table layout I found the reasoning in the comment explaining how we calculate the number of levels used for stage 2 page tables a bit backwards. This commit attempts to clarify the comment, to make it slightly easier to read without having the Arm ARM open on the right page. While we're at it, fixup a typo in a comment that was recently changed. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm/arm64: vgic: Do not cond_resched_lock() with IRQs disabledGravatar Julien Thierry 1-21/+0
To change the active state of an MMIO, halt is requested for all vcpus of the affected guest before modifying the IRQ state. This is done by calling cond_resched_lock() in vgic_mmio_change_active(). However interrupts are disabled at this point and we cannot reschedule a vcpu. We actually don't need any of this, as kvm_arm_halt_guest ensures that all the other vcpus are out of the guest. Let's just drop that useless code. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Suggested-by: Christoffer Dall <christoffer.dall@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm64: Add support for creating PUD hugepages at stage 2Gravatar Punit Agrawal 1-6/+98
KVM only supports PMD hugepages at stage 2. Now that the various page handling routines are updated, extend the stage 2 fault handling to map in PUD hugepages. Addition of PUD hugepage support enables additional page sizes (e.g., 1G with 4K granule) which can be useful on cores that support mapping larger block sizes in the TLB entries. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replace BUG() => WARN_ON(1) for arm32 PUD helpers ] Signed-off-by: Suzuki Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm64: Update age handlers to support PUD hugepagesGravatar Punit Agrawal 1-19/+20
In preparation for creating larger hugepages at Stage 2, add support to the age handling notifiers for PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing code. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON(1) for arm32 PUD helpers ] Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm64: Support handling access faults for PUD hugepagesGravatar Punit Agrawal 1-11/+11
In preparation for creating larger hugepages at Stage 2, extend the access fault handling at Stage 2 to support PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing of code. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON(1) in PUD helpers ] Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm64: Support PUD hugepage in stage2_is_exec()Gravatar Punit Agrawal 1-5/+48
In preparation for creating PUD hugepages at stage 2, add support for detecting execute permissions on PUD page table entries. Faults due to lack of execute permissions on page table entries is used to perform i-cache invalidation on first execute. Provide trivial implementations of arm32 helpers to allow sharing of code. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON(1) in arm32 PUD helpers ] Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm64: Support dirty page tracking for PUD hugepagesGravatar Punit Agrawal 1-4/+7
In preparation for creating PUD hugepages at stage 2, add support for write protecting PUD hugepages when they are encountered. Write protecting guest tables is used to track dirty pages when migrating VMs. Also, provide trivial implementations of required kvm_s2pud_* helpers to allow sharing of code with arm32. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON() in arm32 pud helpers ] Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm/arm64: Introduce helpers to manipulate page table entriesGravatar Punit Agrawal 1-6/+8
Introduce helpers to abstract architectural handling of the conversion of pfn to page table entries and marking a PMD page table entry as a block entry. The helpers are introduced in preparation for supporting PUD hugepages at stage 2 - which are supported on arm64 but do not exist on arm. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on faultGravatar Punit Agrawal 1-13/+15
Stage 2 fault handler marks a page as executable if it is handling an execution fault or if it was a permission fault in which case the executable bit needs to be preserved. The logic to decide if the page should be marked executable is duplicated for PMD and PTE entries. To avoid creating another copy when support for PUD hugepages is introduced refactor the code to share the checks needed to mark a page table entry as executable. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm/arm64: Share common code in user_mem_abort()Gravatar Punit Agrawal 1-19/+30
The code for operations such as marking the pfn as dirty, and dcache/icache maintenance during stage 2 fault handling is duplicated between normal pages and PMD hugepages. Instead of creating another copy of the operations when we introduce PUD hugepages, let's share them across the different pagesizes. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm/arm64: vgic-v2: Set active_source to 0 when restoring stateGravatar Christoffer Dall 1-1/+16
When restoring the active state from userspace, we don't know which CPU was the source for the active state, and this is not architecturally exposed in any of the register state. Set the active_source to 0 in this case. In the future, we can expand on this and exposse the information as additional information to userspace for GICv2 if anyone cares. Cc: stable@vger.kernel.org Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm/arm64: Fix VMID alloc race by reverting to lock-lessGravatar Christoffer Dall 1-12/+11
We recently addressed a VMID generation race by introducing a read/write lock around accesses and updates to the vmid generation values. However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but does so without taking the read lock. As far as I can tell, this can lead to the same kind of race: VM 0, VCPU 0 VM 0, VCPU 1 ------------ ------------ update_vttbr (vmid 254) update_vttbr (vmid 1) // roll over read_lock(kvm_vmid_lock); force_vm_exit() local_irq_disable need_new_vmid_gen == false //because vmid gen matches enter_guest (vmid 254) kvm_arch.vttbr = <PGD>:<VMID 1> read_unlock(kvm_vmid_lock); enter_guest (vmid 1) Which results in running two VCPUs in the same VM with different VMIDs and (even worse) other VCPUs from other VMs could now allocate clashing VMID 254 from the new generation as long as VCPU 0 is not exiting. Attempt to solve this by making sure vttbr is updated before another CPU can observe the updated VMID generation. Cc: stable@vger.kernel.org Fixes: f0cf47d939d0 "KVM: arm/arm64: Close VMID generation race" Reviewed-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18arm64: KVM: Consistently advance singlestep when emulating instructionsGravatar Mark Rutland 2-3/+5
When we emulate a guest instruction, we don't advance the hardware singlestep state machine, and thus the guest will receive a software step exception after a next instruction which is not emulated by the host. We bodge around this in an ad-hoc fashion. Sometimes we explicitly check whether userspace requested a single step, and fake a debug exception from within the kernel. Other times, we advance the HW singlestep state rely on the HW to generate the exception for us. Thus, the observed step behaviour differs for host and guest. Let's make this simpler and consistent by always advancing the HW singlestep state machine when we skip an instruction. Thus we can rely on the hardware to generate the singlestep exception for us, and never need to explicitly check for an active-pending step, nor do we need to fake a debug exception from the guest. Cc: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18arm64: KVM: Skip MMIO insn after emulationGravatar Mark Rutland 1-5/+6
When we emulate an MMIO instruction, we advance the CPU state within decode_hsr(), before emulating the instruction effects. Having this logic in decode_hsr() is opaque, and advancing the state before emulation is problematic. It gets in the way of applying consistent single-step logic, and it prevents us from being able to fail an MMIO instruction with a synchronous exception. Clean this up by only advancing the CPU state *after* the effects of the instruction are emulated. Cc: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-26Revert "mm, mmu_notifier: annotate mmu notifiers with blockable invalidate ↵Gravatar Michal Hocko 1-1/+0
callbacks" Revert 5ff7091f5a2ca ("mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks"). MMU_INVALIDATE_DOES_NOT_BLOCK flags was the only one used and it is no longer needed since 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers"). We now have a full support for per range !blocking behavior so we can drop the stop gap workaround which the per notifier flag was used for. Link: http://lkml.kernel.org/r/20180827112623.8992-4-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: David Rientjes <rientjes@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-25Merge tag 'kvm-4.20-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmGravatar Linus Torvalds 7-120/+125
Pull KVM updates from Radim Krčmář: "ARM: - Improved guest IPA space support (32 to 52 bits) - RAS event delivery for 32bit - PMU fixes - Guest entry hardening - Various cleanups - Port of dirty_log_test selftest PPC: - Nested HV KVM support for radix guests on POWER9. The performance is much better than with PR KVM. Migration and arbitrary level of nesting is supported. - Disable nested HV-KVM on early POWER9 chips that need a particular hardware bug workaround - One VM per core mode to prevent potential data leaks - PCI pass-through optimization - merge ppc-kvm topic branch and kvm-ppc-fixes to get a better base s390: - Initial version of AP crypto virtualization via vfio-mdev - Improvement for vfio-ap - Set the host program identifier - Optimize page table locking x86: - Enable nested virtualization by default - Implement Hyper-V IPI hypercalls - Improve #PF and #DB handling - Allow guests to use Enlightened VMCS - Add migration selftests for VMCS and Enlightened VMCS - Allow coalesced PIO accesses - Add an option to perform nested VMCS host state consistency check through hardware - Automatic tuning of lapic_timer_advance_ns - Many fixes, minor improvements, and cleanups" * tag 'kvm-4.20-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits) KVM/nVMX: Do not validate that posted_intr_desc_addr is page aligned Revert "kvm: x86: optimize dr6 restore" KVM: PPC: Optimize clearing TCEs for sparse tables x86/kvm/nVMX: tweak shadow fields selftests/kvm: add missing executables to .gitignore KVM: arm64: Safety check PSTATE when entering guest and handle IL KVM: PPC: Book3S HV: Don't use streamlined entry path on early POWER9 chips arm/arm64: KVM: Enable 32 bits kvm vcpu events support arm/arm64: KVM: Rename function kvm_arch_dev_ioctl_check_extension() KVM: arm64: Fix caching of host MDCR_EL2 value KVM: VMX: enable nested virtualization by default KVM/x86: Use 32bit xor to clear registers in svm.c kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD kvm: vmx: Defer setting of DR6 until #DB delivery kvm: x86: Defer setting of CR2 until #PF delivery kvm: x86: Add payload operands to kvm_multiple_exception kvm: x86: Add exception payload fields to kvm_vcpu_events kvm: x86: Add has_payload and payload to kvm_queued_exception KVM: Documentation: Fix omission in struct kvm_vcpu_events KVM: selftests: add Enlightened VMCS test ...
2018-10-24Merge branch 'siginfo-linus' of ↵Gravatar Linus Torvalds 1-10/+4
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo updates from Eric Biederman: "I have been slowly sorting out siginfo and this is the culmination of that work. The primary result is in several ways the signal infrastructure has been made less error prone. The code has been updated so that manually specifying SEND_SIG_FORCED is never necessary. The conversion to the new siginfo sending functions is now complete, which makes it difficult to send a signal without filling in the proper siginfo fields. At the tail end of the patchset comes the optimization of decreasing the size of struct siginfo in the kernel from 128 bytes to about 48 bytes on 64bit. The fundamental observation that enables this is by definition none of the known ways to use struct siginfo uses the extra bytes. This comes at the cost of a small user space observable difference. For the rare case of siginfo being injected into the kernel only what can be copied into kernel_siginfo is delivered to the destination, the rest of the bytes are set to 0. For cases where the signal and the si_code are known this is safe, because we know those bytes are not used. For cases where the signal and si_code combination is unknown the bits that won't fit into struct kernel_siginfo are tested to verify they are zero, and the send fails if they are not. I made an extensive search through userspace code and I could not find anything that would break because of the above change. If it turns out I did break something it will take just the revert of a single change to restore kernel_siginfo to the same size as userspace siginfo. Testing did reveal dependencies on preferring the signo passed to sigqueueinfo over si->signo, so bit the bullet and added the complexity necessary to handle that case. Testing also revealed bad things can happen if a negative signal number is passed into the system calls. Something no sane application will do but something a malicious program or a fuzzer might do. So I have fixed the code that performs the bounds checks to ensure negative signal numbers are handled" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (80 commits) signal: Guard against negative signal numbers in copy_siginfo_from_user32 signal: Guard against negative signal numbers in copy_siginfo_from_user signal: In sigqueueinfo prefer sig not si_signo signal: Use a smaller struct siginfo in the kernel signal: Distinguish between kernel_siginfo and siginfo signal: Introduce copy_siginfo_from_user and use it's return value signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZE signal: Fail sigqueueinfo if si_signo != sig signal/sparc: Move EMT_TAGOVF into the generic siginfo.h signal/unicore32: Use force_sig_fault where appropriate signal/unicore32: Generate siginfo in ucs32_notify_die signal/unicore32: Use send_sig_fault where appropriate signal/arc: Use force_sig_fault where appropriate signal/arc: Push siginfo generation into unhandled_exception signal/ia64: Use force_sig_fault where appropriate signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn signal/ia64: Use the generic force_sigsegv in setup_frame signal/arm/kvm: Use send_sig_mceerr signal/arm: Use send_sig_fault where appropriate signal/arm: Use force_sig_fault where appropriate ...
2018-10-22Merge tag 'arm64-upstream' of ↵Gravatar Linus Torvalds 1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Apart from some new arm64 features and clean-ups, this also contains the core mmu_gather changes for tracking the levels of the page table being cleared and a minor update to the generic compat_sys_sigaltstack() introducing COMPAT_SIGMINSKSZ. Summary: - Core mmu_gather changes which allow tracking the levels of page-table being cleared together with the arm64 low-level flushing routines - Support for the new ARMv8.5 PSTATE.SSBS bit which can be used to mitigate Spectre-v4 dynamically without trapping to EL3 firmware - Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstack - Optimise emulation of MRS instructions to ID_* registers on ARMv8.4 - Support for Common Not Private (CnP) translations allowing threads of the same CPU to share the TLB entries - Accelerated crc32 routines - Move swapper_pg_dir to the rodata section - Trap WFI instruction executed in user space - ARM erratum 1188874 workaround (arch_timer) - Miscellaneous fixes and clean-ups" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (78 commits) arm64: KVM: Guests can skip __install_bp_hardening_cb()s HYP work arm64: cpufeature: Trap CTR_EL0 access only where it is necessary arm64: cpufeature: Fix handling of CTR_EL0.IDC field arm64: cpufeature: ctr: Fix cpu capability check for late CPUs Documentation/arm64: HugeTLB page implementation arm64: mm: Use __pa_symbol() for set_swapper_pgd() arm64: Add silicon-errata.txt entry for ARM erratum 1188873 Revert "arm64: uaccess: implement unsafe accessors" arm64: mm: Drop the unused cpu parameter MAINTAINERS: fix bad sdei paths arm64: mm: Use #ifdef for the __PAGETABLE_P?D_FOLDED defines arm64: Fix typo in a comment in arch/arm64/mm/kasan_init.c arm64: xen: Use existing helper to check interrupt status arm64: Use daifflag_restore after bp_hardening arm64: daifflags: Use irqflags functions for daifflags arm64: arch_timer: avoid unused function warning arm64: Trap WFI executed in userspace arm64: docs: Document SSBS HWCAP arm64: docs: Fix typos in ELF hwcaps arm64/kprobes: remove an extra semicolon in arch_prepare_kprobe ...
2018-10-19Merge tag 'kvmarm-for-v4.20' of ↵Gravatar Paolo Bonzini 5-102/+92
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm updates for 4.20 - Improved guest IPA space support (32 to 52 bits) - RAS event delivery for 32bit - PMU fixes - Guest entry hardening - Various cleanups
2018-10-18arm/arm64: KVM: Enable 32 bits kvm vcpu events supportGravatar Dongjiu Geng 1-0/+1
The commit 539aee0edb9f ("KVM: arm64: Share the parts of get/set events useful to 32bit") shares the get/set events helper for arm64 and arm32, but forgot to share the cap extension code. User space will check whether KVM supports vcpu events by checking the KVM_CAP_VCPU_EVENTS extension Acked-by: James Morse <james.morse@arm.com> Reviewed-by : Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-18arm/arm64: KVM: Rename function kvm_arch_dev_ioctl_check_extension()Gravatar Dongjiu Geng 1-1/+1
Rename kvm_arch_dev_ioctl_check_extension() to kvm_arch_vm_ioctl_check_extension(), because it does not have any relationship with device. Renaming this function can make code readable. Cc: James Morse <james.morse@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-17KVM: arm64: Fix caching of host MDCR_EL2 valueGravatar Mark Rutland 1-2/+2
At boot time, KVM stashes the host MDCR_EL2 value, but only does this when the kernel is not running in hyp mode (i.e. is non-VHE). In these cases, the stashed value of MDCR_EL2.HPMN happens to be zero, which can lead to CONSTRAINED UNPREDICTABLE behaviour. Since we use this value to derive the MDCR_EL2 value when switching to/from a guest, after a guest have been run, the performance counters do not behave as expected. This has been observed to result in accesses via PMXEVTYPER_EL0 and PMXEVCNTR_EL0 not affecting the relevant counters, resulting in events not being counted. In these cases, only the fixed-purpose cycle counter appears to work as expected. Fix this by always stashing the host MDCR_EL2 value, regardless of VHE. Cc: Christopher Dall <christoffer.dall@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: stable@vger.kernel.org Fixes: 1e947bad0b63b351 ("arm64: KVM: Skip HYP setup when already running in HYP") Tested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-17KVM: refine the comment of function gfn_to_hva_memslot_prot()Gravatar Wei Yang 1-2/+6
The original comment is little hard to understand. No functional change, just amend the comment a little. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17kvm/x86 : add coalesced pio supportGravatar Peng Hao 2-3/+11
Coalesced pio is based on coalesced mmio and can be used for some port like rtc port, pci-host config port and so on. Specially in case of rtc as coalesced pio, some versions of windows guest access rtc frequently because of rtc as system tick. guest access rtc like this: write register index to 0x70, then write or read data from 0x71. writing 0x70 port is just as index and do nothing else. So we can use coalesced pio to handle this scene to reduce VM-EXIT time. When starting and closing a virtual machine, it will access pci-host config port frequently. So setting these port as coalesced pio can reduce startup and shutdown time. without my patch, get the vm-exit time of accessing rtc 0x70 and piix 0xcf8 using perf tools: (guest OS : windows 7 64bit) IO Port Access Samples Samples% Time% Min Time Max Time Avg time 0x70:POUT 86 30.99% 74.59% 9us 29us 10.75us (+- 3.41%) 0xcf8:POUT 1119 2.60% 2.12% 2.79us 56.83us 3.41us (+- 2.23%) with my patch IO Port Access Samples Samples% Time% Min Time Max Time Avg time 0x70:POUT 106 32.02% 29.47% 0us 10us 1.57us (+- 7.38%) 0xcf8:POUT 1065 1.67% 0.28% 0.41us 65.44us 0.66us (+- 10.55%) Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17KVM: leverage change to adjust slots->used_slots in update_memslots()Gravatar Wei Yang 1-9/+14
update_memslots() is only called by __kvm_set_memory_region(), in which "change" is calculated and indicates how to adjust slots->used_slots * increase by one if it is KVM_MR_CREATE * decrease by one if it is KVM_MR_DELETE * not change for others This patch adjusts slots->used_slots in update_memslots() based on "change" value instead of re-calculate those states again. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17KVM: x86: hyperv: optimize 'all cpus' case in kvm_hv_flush_tlb()Gravatar Vitaly Kuznetsov 1-4/+2
We can use 'NULL' to represent 'all cpus' case in kvm_make_vcpus_request_mask() and avoid building vCPU mask with all vCPUs. Suggested-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-03KVM: arm/arm64: Ensure only THP is candidate for adjustmentGravatar Punit Agrawal 1-1/+7
PageTransCompoundMap() returns true for hugetlbfs and THP hugepages. This behaviour incorrectly leads to stage 2 faults for unsupported hugepage sizes (e.g., 64K hugepage with 4K pages) to be treated as THP faults. Tighten the check to filter out hugetlbfs pages. This also leads to consistently mapping all unsupported hugepage sizes as PTE level entries at stage 2. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Suzuki Poulose <suzuki.poulose@arm.com> Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-03KVM: arm64: Drop __cpu_init_stage2 on the VHE pathGravatar Marc Zyngier 1-8/+2
__cpu_init_stage2 doesn't do anything anymore on arm64, and is totally non-sensical if running VHE (as VHE is 64bit only). Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-03KVM: arm/arm64: Rename kvm_arm_config_vm to kvm_arm_setup_stage2Gravatar Marc Zyngier 1-1/+1
VM tends to be a very overloaded term in KVM, so let's keep it to describe the virtual machine. For the virtual memory setup, let's use the "stage2" suffix. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-03kvm: arm64: Set a limit on the IPA sizeGravatar Suzuki K Poulose 1-0/+2
So far we have restricted the IPA size of the VM to the default value (40bits). Now that we can manage the IPA size per VM and support dynamic stage2 page tables, we can allow VMs to have larger IPA. This patch introduces a the maximum IPA size supported on the host. This is decided by the following factors : 1) Maximum PARange supported by the CPUs - This can be inferred from the system wide safe value. 2) Maximum PA size supported by the host kernel (48 vs 52) 3) Number of levels in the host page table (as we base our stage2 tables on the host table helpers). Since the stage2 page table code is dependent on the stage1 page table, we always ensure that : Number of Levels at Stage1 >= Number of Levels at Stage2 So we limit the IPA to make sure that the above condition is satisfied. This will affect the following combinations of VA_BITS and IPA for different page sizes. Host configuration | Unsupported IPA ranges 39bit VA, 4K | [44, 48] 36bit VA, 16K | [41, 48] 42bit VA, 64K | [47, 52] Supporting the above combinations need independent stage2 page table manipulation code, which would need substantial changes. We could purse the solution independently and switch the page table code once we have it ready. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <cdall@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-01vgic: Add support for 52bit guest physical addressGravatar Kristina Martsenko 2-28/+10
Add support for handling 52bit guest physical address to the VGIC layer. So far we have limited the guest physical address to 48bits, by explicitly masking the upper bits. This patch removes the restriction. We do not have to check if the host supports 52bit as the gpa is always validated during an access. (e.g, kvm_{read/write}_guest, kvm_is_visible_gfn()). Also, the ITS table save-restore is also not affected with the enhancement. The DTE entries already store the bits[51:8] of the ITT_addr (with a 256byte alignment). Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <cdall@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> [ Macro clean ups, fix PROPBASER and PENDBASER accesses ] Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-01kvm: arm/arm64: Prepare for VM specific stage2 translationsGravatar Suzuki K Poulose 3-60/+63
Right now the stage2 page table for a VM is hard coded, assuming an IPA of 40bits. As we are about to add support for per VM IPA, prepare the stage2 page table helpers to accept the kvm instance to make the right decision for the VM. No functional changes. Adds stage2_pgd_size(kvm) to replace S2_PGD_SIZE. Also, moves some of the definitions in arm32 to align with the arm64. Also drop the _AC() specifier constants wherever possible. Cc: Christoffer Dall <cdall@kernel.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-01kvm: arm/arm64: Allow arch specific configurations for VMGravatar Suzuki K Poulose 1-2/+3
Allow the arch backends to perform VM specific initialisation. This will be later used to handle IPA size configuration and per-VM VTCR configuration on arm64. Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <cdall@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-01kvm: arm/arm64: Remove spurious WARN_ONGravatar Suzuki K Poulose 1-1/+1
On a 4-level page table pgd entry can be empty, unlike a 3-level page table. Remove the spurious WARN_ON() in stage_get_pud(). Acked-by: Christoffer Dall <cdall@kernel.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-01kvm: arm/arm64: Fix stage2_flush_memslot for 4 level page tableGravatar Suzuki K Poulose 1-1/+2
So far we have only supported 3 level page table with fixed IPA of 40bits, where PUD is folded. With 4 level page tables, we need to check if the PUD entry is valid or not. Fix stage2_flush_memslot() to do this check, before walking down the table. Acked-by: Christoffer Dall <cdall@kernel.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-09-27signal/arm/kvm: Use send_sig_mceerrGravatar Eric W. Biederman 1-10/+4
This simplifies the code making it clearer what is going on, and making the siginfo generation easier to maintain. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-18arm64: KVM: Enable Common Not Private translationsGravatar Vladimir Murzin 1-2/+2
We rely on cpufeature framework to detect and enable CNP so for KVM we need to patch hyp to set CNP bit just before TTBR0_EL2 gets written. For the guest we encode CNP bit while building vttbr, so we don't need to bother with that in a world switch. Reviewed-by: James Morse <james.morse@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-07KVM: Remove obsolete kvm_unmap_hva notifier backendGravatar Marc Zyngier 2-27/+0
kvm_unmap_hva is long gone, and we only have kvm_unmap_hva_range to deal with. Drop the now obsolete code. Fixes: fb1522e099f0 ("KVM: update to new mmu_notifier semantic v2") Cc: James Hogan <jhogan@kernel.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2018-09-07KVM: arm/arm64: Clean dcache to PoC when changing PTE due to CoWGravatar Marc Zyngier 1-1/+8
When triggering a CoW, we unmap the RO page via an MMU notifier (invalidate_range_start), and then populate the new PTE using another one (change_pte). In the meantime, we'll have copied the old page into the new one. The problem is that the data for the new page is sitting in the cache, and should the guest have an uncached mapping to that page (or its MMU off), following accesses will bypass the cache. In a way, this is similar to what happens on a translation fault: We need to clean the page to the PoC before mapping it. So let's just do that. This fixes a KVM unit test regression observed on a HiSilicon platform, and subsequently reproduced on Seattle. Fixes: a9c0e12ebee5 ("KVM: arm/arm64: Only clean the dcache on translation fault") Cc: stable@vger.kernel.org # v4.16+ Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2018-08-22Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmGravatar Linus Torvalds 14-109/+413
Pull second set of KVM updates from Paolo Bonzini: "ARM: - Support for Group0 interrupts in guests - Cache management optimizations for ARMv8.4 systems - Userspace interface for RAS - Fault path optimization - Emulated physical timer fixes - Random cleanups x86: - fixes for L1TF - a new test case - non-support for SGX (inject the right exception in the guest) - fix lockdep false positive" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (49 commits) KVM: VMX: fixes for vmentry_l1d_flush module parameter kvm: selftest: add dirty logging test kvm: selftest: pass in extra memory when create vm kvm: selftest: include the tools headers kvm: selftest: unify the guest port macros tools: introduce test_and_clear_bit KVM: x86: SVM: Call x86_spec_ctrl_set_guest/host() with interrupts disabled KVM: vmx: Inject #UD for SGX ENCLS instruction in guest KVM: vmx: Add defines for SGX ENCLS exiting x86/kvm/vmx: Fix coding style in vmx_setup_l1d_flush() x86: kvm: avoid unused variable warning KVM: Documentation: rename the capability of KVM_CAP_ARM_SET_SERROR_ESR KVM: arm/arm64: Skip updating PTE entry if no change KVM: arm/arm64: Skip updating PMD entry if no change KVM: arm: Use true and false for boolean values KVM: arm/arm64: vgic: Do not use spin_lock_irqsave/restore with irq disabled KVM: arm/arm64: vgic: Move DEBUG_SPINLOCK_BUG_ON to vgic.h KVM: arm: vgic-v3: Add support for ICC_SGI0R and ICC_ASGI1R accesses KVM: arm64: vgic-v3: Add support for ICC_SGI0R_EL1 and ICC_ASGI1R_EL1 accesses KVM: arm/arm64: vgic-v3: Add core support for Group0 SGIs ...
2018-08-22Merge branch 'akpm' (patches from Andrew)Gravatar Linus Torvalds 1-5/+10
Merge more updates from Andrew Morton: - the rest of MM - procfs updates - various misc things - more y2038 fixes - get_maintainer updates - lib/ updates - checkpatch updates - various epoll updates - autofs updates - hfsplus - some reiserfs work - fatfs updates - signal.c cleanups - ipc/ updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (166 commits) ipc/util.c: update return value of ipc_getref from int to bool ipc/util.c: further variable name cleanups ipc: simplify ipc initialization ipc: get rid of ids->tables_initialized hack lib/rhashtable: guarantee initial hashtable allocation lib/rhashtable: simplify bucket_table_alloc() ipc: drop ipc_lock() ipc/util.c: correct comment in ipc_obtain_object_check ipc: rename ipcctl_pre_down_nolock() ipc/util.c: use ipc_rcu_putref() for failues in ipc_addid() ipc: reorganize initialization of kern_ipc_perm.seq ipc: compute kern_ipc_perm.id under the ipc lock init/Kconfig: remove EXPERT from CHECKPOINT_RESTORE fs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp adfs: use timespec64 for time conversion kernel/sysctl.c: fix typos in comments drivers/rapidio/devices/rio_mport_cdev.c: remove redundant pointer md fork: don't copy inconsistent signal handler state to child signal: make get_signal() return bool signal: make sigkill_pending() return bool ...