aboutsummaryrefslogtreecommitdiff
path: root/include/kvm
AgeCommit message (Collapse)AuthorFilesLines
2024-05-08Merge branch kvm-arm64/misc-6.10 into kvmarm-master/nextGravatar Marc Zyngier 1-1/+1
* kvm-arm64/misc-6.10: : . : Misc fixes and updates targeting 6.10 : : - Improve boot-time diagnostics when the sysreg tables : are not correctly sorted : : - Allow FFA_MSG_SEND_DIRECT_REQ in the FFA proxy : : - Fix duplicate XNX field in the ID_AA64MMFR1_EL1 : writeable mask : : - Allocate PPIs and SGIs outside of the vcpu structure, allowing : for smaller EL2 mapping and some flexibility in implementing : more or less than 32 private IRQs. : : - Use bitmap_gather() instead of its open-coded equivalent : : - Make protected mode use hVHE if available : : - Purge stale mpidr_data if a vcpu is created after the MPIDR : map has been created : . KVM: arm64: Destroy mpidr_data for 'late' vCPU creation KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support KVM: arm64: Fix hvhe/nvhe early alias parsing KVM: arm64: Convert kvm_mpidr_index() to bitmap_gather() KVM: arm64: vgic: Allocate private interrupts on demand KVM: arm64: Remove duplicated AA64MMFR1_EL1 XNX KVM: arm64: Remove FFA_MSG_SEND_DIRECT_REQ from the denylist KVM: arm64: Improve out-of-order sysreg table diagnostics Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-03Merge branch kvm-arm64/pkvm-6.10 into kvmarm-master/nextGravatar Marc Zyngier 1-1/+0
* kvm-arm64/pkvm-6.10: (25 commits) : . : At last, a bunch of pKVM patches, courtesy of Fuad Tabba. : From the cover letter: : : "This series is a bit of a bombay-mix of patches we've been : carrying. There's no one overarching theme, but they do improve : the code by fixing existing bugs in pKVM, refactoring code to : make it more readable and easier to re-use for pKVM, or adding : functionality to the existing pKVM code upstream." : . KVM: arm64: Force injection of a data abort on NISV MMIO exit KVM: arm64: Restrict supported capabilities for protected VMs KVM: arm64: Refactor setting the return value in kvm_vm_ioctl_enable_cap() KVM: arm64: Document the KVM/arm64-specific calls in hypercalls.rst KVM: arm64: Rename firmware pseudo-register documentation file KVM: arm64: Reformat/beautify PTP hypercall documentation KVM: arm64: Clarify rationale for ZCR_EL1 value restored on guest exit KVM: arm64: Introduce and use predicates that check for protected VMs KVM: arm64: Add is_pkvm_initialized() helper KVM: arm64: Simplify vgic-v3 hypercalls KVM: arm64: Move setting the page as dirty out of the critical section KVM: arm64: Change kvm_handle_mmio_return() return polarity KVM: arm64: Fix comment for __pkvm_vcpu_init_traps() KVM: arm64: Prevent kmemleak from accessing .hyp.data KVM: arm64: Do not map the host fpsimd state to hyp in pKVM KVM: arm64: Rename __tlb_switch_to_{guest,host}() in VHE KVM: arm64: Support TLB invalidation in guest context KVM: arm64: Avoid BBM when changing only s/w bits in Stage-2 PTE KVM: arm64: Check for PTE validity when checking for executable/cacheable KVM: arm64: Avoid BUG-ing from the host abort path ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-03KVM: arm64: vgic: Allocate private interrupts on demandGravatar Marc Zyngier 1-1/+1
Private interrupts are currently part of the CPU interface structure that is part of each and every vcpu we create. Currently, we have 32 of them per vcpu, resulting in a per-vcpu array that is just shy of 4kB. On its own, that's no big deal, but it gets in the way of other things: - each vcpu gets mapped at EL2 on nVHE/hVHE configurations. This requires memory that is physically contiguous. However, the EL2 code has no purpose looking at the interrupt structures and could do without them being mapped. - supporting features such as EPPIs, which extend the number of private interrupts past the 32 limit would make the array even larger, even for VMs that do not use the EPPI feature. Address these issues by moving the private interrupt array outside of the vcpu, and replace it with a simple pointer. We take this opportunity to make it obvious what gets initialised when, as that path was remarkably opaque, and tighten the locking. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240502154545.3012089-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01KVM: arm64: Simplify vgic-v3 hypercallsGravatar Marc Zyngier 1-1/+0
Consolidate the GICv3 VMCR accessor hypercalls into the APR save/restore hypercalls so that all of the EL2 GICv3 state is covered by a single pair of hypercalls. Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240423150538.2103045-17-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-04-25KVM: arm64: vgic-its: Get rid of the lpi_list_lockGravatar Oliver Upton 1-3/+0
The last genuine use case for the lpi_list_lock was the global LPI translation cache, which has been removed in favor of a per-ITS xarray. Remove a layer from the locking puzzle by getting rid of it. vgic_add_lpi() still has a critical section that needs to protect against the insertion of other LPIs; change it to take the LPI xarray's xa_lock to retain this property. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-13-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-04-25KVM: arm64: vgic-its: Rip out the global translation cacheGravatar Oliver Upton 1-3/+0
The MSI injection fast path has been transitioned away from the global translation cache. Rip it out. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-12-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-04-25KVM: arm64: vgic-its: Maintain a translation cache per ITSGravatar Oliver Upton 1-0/+6
Within the context of a single ITS, it is possible to use an xarray to cache the device ID & event ID translation to a particular irq descriptor. Take advantage of this to build a translation cache capable of fitting all valid translations for a given ITS. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-9-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-04-25KVM: arm64: vgic-its: Get rid of vgic_copy_lpi_list()Gravatar Oliver Upton 1-1/+0
The last user has been transitioned to walking the LPI xarray directly. Cut the wart off, and get rid of the now unneeded lpi_count while doing so. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-7-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-04-25KVM: arm64: vgic-debug: Use an xarray mark for debug iteratorGravatar Oliver Upton 1-0/+2
The vgic debug iterator is the final user of vgic_copy_lpi_list(), but is a bit more complicated to transition to something else. Use a mark in the LPI xarray to record the indices 'known' to the debug iterator. Protect against the LPIs from being freed by associating an additional reference with the xarray mark. Rework iter_next() to let the xarray walk 'drive' the iteration after visiting all of the SGIs, PPIs, and SPIs. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240422200158.2606761-6-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-03-26KVM: arm64: Fix host-programmed guest events in nVHEGravatar Oliver Upton 1-1/+1
Programming PMU events in the host that count during guest execution is a feature supported by perf, e.g. perf stat -e cpu_cycles:G ./lkvm run While this works for VHE, the guest/host event bitmaps are not carried through to the hypervisor in the nVHE configuration. Make kvm_pmu_update_vcpu_events() conditional on whether or not _hardware_ supports PMUv3 rather than if the vCPU as vPMU enabled. Cc: stable@vger.kernel.org Fixes: 84d751a019a9 ("KVM: arm64: Pass pmu events to hyp via vcpu") Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240305184840.636212-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-03-07Merge branch kvm-arm64/lpi-xarray into kvmarm/nextGravatar Oliver Upton 1-4/+5
* kvm-arm64/lpi-xarray: : xarray-based representation of vgic LPIs : : KVM's linked-list of LPI state has proven to be a bottleneck in LPI : injection paths, due to lock serialization when acquiring / releasing a : reference on an IRQ. : : Start the tedious process of reworking KVM's LPI injection by replacing : the LPI linked-list with an xarray, leveraging this to allow RCU readers : to walk it outside of the spinlock. KVM: arm64: vgic: Don't acquire the lpi_list_lock in vgic_put_irq() KVM: arm64: vgic: Ensure the irq refcount is nonzero when taking a ref KVM: arm64: vgic: Rely on RCU protection in vgic_get_lpi() KVM: arm64: vgic: Free LPI vgic_irq structs in an RCU-safe manner KVM: arm64: vgic: Use atomics to count LPIs KVM: arm64: vgic: Get rid of the LPI linked-list KVM: arm64: vgic-its: Walk the LPI xarray in vgic_copy_lpi_list() KVM: arm64: vgic-v3: Iterate the xarray to find pending LPIs KVM: arm64: vgic: Use xarray to find LPI in vgic_get_lpi() KVM: arm64: vgic: Store LPIs in an xarray Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-23KVM: arm64: vgic: Free LPI vgic_irq structs in an RCU-safe mannerGravatar Oliver Upton 1-0/+1
Free the vgic_irq structs in an RCU-safe manner to allow reads of the LPI configuration data to happen in parallel with the release of LPIs. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-8-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-23KVM: arm64: vgic: Use atomics to count LPIsGravatar Oliver Upton 1-2/+2
Switch to using atomics for LPI accounting, allowing vgic_irq references to be dropped in parallel. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-7-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-23KVM: arm64: vgic: Get rid of the LPI linked-listGravatar Oliver Upton 1-2/+0
All readers of LPI configuration have been transitioned to use the LPI xarray. Get rid of the linked-list altogether. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-23KVM: arm64: vgic: Store LPIs in an xarrayGravatar Oliver Upton 1-0/+2
Using a linked-list for LPIs is less than ideal as it of course requires iterative searches to find a particular entry. An xarray is a better data structure for this use case, as it provides faster searches and can still handle a potentially sparse range of INTID allocations. Start by storing LPIs in an xarray, punting usage of the xarray to a subsequent change. The observant among you will notice that we added yet another lock to the chain of locking order rules; document the ordering of the xa_lock. Don't worry, we'll get rid of the lpi_list_lock one day... Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240221054253.3848076-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-19KVM: arm64: Add feature checking helpersGravatar Marc Zyngier 1-11/+0
In order to make it easier to check whether a particular feature is exposed to a guest, add a new set of helpers, with kvm_has_feat() being the most useful. Let's start making use of them in the PMU code (courtesy of Oliver). Follow-up changes will introduce additional use patterns. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Co-developed--by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-31Merge tag 'kvmarm-6.7' of ↵Gravatar Paolo Bonzini 4-6/+30
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.7 - Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest - Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table - Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM - Guest support for memory operation instructions (FEAT_MOPS) - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code - Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations - Miscellaneous kernel and selftest fixes
2023-10-30Merge branch kvm-arm64/pmu_pmcr_n into kvmarm/nextGravatar Oliver Upton 1-1/+20
* kvm-arm64/pmu_pmcr_n: : User-defined PMC limit, courtesy Raghavendra Rao Ananta : : Certain VMMs may want to reserve some PMCs for host use while running a : KVM guest. This was a bit difficult before, as KVM advertised all : supported counters to the guest. Userspace can now limit the number of : advertised PMCs by writing to PMCR_EL0.N, as KVM's sysreg and PMU : emulation enforce the specified limit for handling guest accesses. KVM: selftests: aarch64: vPMU test for validating user accesses KVM: selftests: aarch64: vPMU register test for unimplemented counters KVM: selftests: aarch64: vPMU register test for implemented counters KVM: selftests: aarch64: Introduce vpmu_counter_access test tools: Import arm_pmuv3.h KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} KVM: arm64: PMU: Set PMCR_EL0.N for vCPU based on the associated PMU KVM: arm64: PMU: Add a helper to read a vCPU's PMCR_EL0 KVM: arm64: Select default PMU in KVM_ARM_VCPU_INIT handler KVM: arm64: PMU: Introduce helpers to set the guest's PMU Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/sgi-injection into kvmarm/nextGravatar Oliver Upton 1-2/+2
* kvm-arm64/sgi-injection: : vSGI injection improvements + fixes, courtesy Marc Zyngier : : Avoid linearly searching for vSGI targets using a compressed MPIDR to : index a cache. While at it, fix some egregious bugs in KVM's mishandling : of vcpuid (user-controlled value) and vcpu_idx. KVM: arm64: Clarify the ordering requirements for vcpu/RD creation KVM: arm64: vgic-v3: Optimize affinity-based SGI injection KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available KVM: arm64: Build MPIDR to vcpu index cache at runtime KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff() KVM: arm64: Use vcpu_idx for invalidation tracking KVM: arm64: vgic: Use vcpu_idx for the debug information KVM: arm64: vgic-v2: Use cpuid from userspace as vcpu_id KVM: arm64: vgic-v3: Refactor GICv3 SGI generation KVM: arm64: vgic-its: Treat the collection target address as a vcpu_id KVM: arm64: vgic: Make kvm_vgic_inject_irq() take a vcpu pointer Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/pmevtyper-filter into kvmarm/nextGravatar Oliver Upton 1-0/+5
* kvm-arm64/pmevtyper-filter: : Fixes to KVM's handling of the PMUv3 exception level filtering bits : : - NSH (count at EL2) and M (count at EL3) should be stateful when the : respective EL is advertised in the ID registers but have no effect on : event counting. : : - NSU and NSK modify the event filtering of EL0 and EL1, respectively. : Though the kernel may not use these bits, other KVM guests might. : Implement these bits exactly as written in the pseudocode if EL3 is : advertised. KVM: arm64: Add PMU event filter bits required if EL3 is implemented KVM: arm64: Make PMEVTYPER<n>_EL0.NSH RES0 if EL2 isn't advertised Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first runGravatar Raghavendra Rao Ananta 1-1/+2
For unimplemented counters, the registers PM{C,I}NTEN{SET,CLR} and PMOVS{SET,CLR} are expected to have the corresponding bits RAZ. Hence to ensure correct KVM's PMU emulation, mask out the RES0 bits. Defer this work to the point that userspace can no longer change the number of advertised PMCs. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231020214053.2144305-7-rananta@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24KVM: arm64: PMU: Set PMCR_EL0.N for vCPU based on the associated PMUGravatar Raghavendra Rao Ananta 1-0/+6
The number of PMU event counters is indicated in PMCR_EL0.N. For a vCPU with PMUv3 configured, the value is set to the same value as the current PE on every vCPU reset. Unless the vCPU is pinned to PEs that has the PMU associated to the guest from the initial vCPU reset, the value might be different from the PMU's PMCR_EL0.N on heterogeneous PMU systems. Fix this by setting the vCPU's PMCR_EL0.N to the PMU's PMCR_EL0.N value. Track the PMCR_EL0.N per guest, as only one PMU can be set for the guest (PMCR_EL0.N must be the same for all vCPUs of the guest), and it is convenient for updating the value. To achieve this, the patch introduces a helper, kvm_arm_pmu_get_max_counters(), that reads the maximum number of counters from the arm_pmu associated to the VM. Make the function global as upcoming patches will be interested to know the value while setting the PMCR.N of the guest from userspace. KVM does not yet support userspace modifying PMCR_EL0.N. The following patch will add support for that. Reviewed-by: Sebastian Ott <sebott@redhat.com> Co-developed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Link: https://lore.kernel.org/r/20231020214053.2144305-5-rananta@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24KVM: arm64: PMU: Add a helper to read a vCPU's PMCR_EL0Gravatar Reiji Watanabe 1-0/+6
Add a helper to read a vCPU's PMCR_EL0, and use it whenever KVM reads a vCPU's PMCR_EL0. Currently, the PMCR_EL0 value is tracked per vCPU. The following patches will make (only) PMCR_EL0.N track per guest. Having the new helper will be useful to combine the PMCR_EL0.N field (tracked per guest) and the other fields (tracked per vCPU) to provide the value of PMCR_EL0. No functional change intended. Reviewed-by: Sebastian Ott <sebott@redhat.com> Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231020214053.2144305-4-rananta@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24KVM: arm64: Select default PMU in KVM_ARM_VCPU_INIT handlerGravatar Reiji Watanabe 1-0/+6
Future changes to KVM's sysreg emulation will rely on having a valid PMU instance to determine the number of implemented counters (PMCR_EL0.N). This is earlier than when userspace is expected to modify the vPMU device attributes, where the default is selected today. Select the default PMU when handling KVM_ARM_VCPU_INIT such that it is available in time for sysreg emulation. Reviewed-by: Sebastian Ott <sebott@redhat.com> Co-developed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Link: https://lore.kernel.org/r/20231020214053.2144305-3-rananta@google.com [Oliver: rewrite changelog] Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24KVM: arm64: Make PMEVTYPER<n>_EL0.NSH RES0 if EL2 isn't advertisedGravatar Oliver Upton 1-0/+5
The NSH bit, which filters event counting at EL2, is required by the architecture if an implementation has EL2. Even though KVM doesn't support nested virt yet, it makes no effort to hide the existence of EL2 from the ID registers. Userspace can, however, change the value of PFR0 to hide EL2. Align KVM's sysreg emulation with the architecture and make NSH RES0 if EL2 isn't advertised. Keep in mind the bit is ignored when constructing the backing perf event. While at it, build the event type mask using explicit field definitions instead of relying on ARMV8_PMU_EVTYPE_MASK. KVM probably should've been doing this in the first place, as it avoids changes to the aforementioned mask affecting sysreg emulation. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20231019185618.3442949-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-12KVM: arm64: timers: Correctly handle TGE flip with CNTPOFF_EL2Gravatar Marc Zyngier 1-0/+7
Contrary to common belief, HCR_EL2.TGE has a direct and immediate effect on the way the EL0 physical counter is offset. Flipping TGE from 1 to 0 while at EL2 immediately changes the way the counter compared to the CVAL limit. This means that we cannot directly save/restore the guest's view of CVAL, but that we instead must treat it as if CNTPOFF didn't exist. Only in the world switch, once we figure out that we do have CNTPOFF, can we must the offset back and forth depending on the polarity of TGE. Fixes: 2b4825a86940 ("KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timer") Reported-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-09-30KVM: arm64: vgic: Make kvm_vgic_inject_irq() take a vcpu pointerGravatar Marc Zyngier 1-2/+2
Passing a vcpu_id to kvm_vgic_inject_irq() is silly for two reasons: - we often confuse vcpu_id and vcpu_idx - we eventually have to convert it back to a vcpu - we can't count Instead, pass a vcpu pointer, which is unambiguous. A NULL vcpu is also allowed for interrupts that are not private to a vcpu (such as SPIs). Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-09-21KVM: arm64: Get rid of vCPU-scoped feature bitmapGravatar Oliver Upton 2-2/+2
The vCPU-scoped feature bitmap was left in place a couple of releases ago in case the change to VM-scoped vCPU features broke anyone. Nobody has complained and the interop between VM and vCPU bitmaps is pretty gross. Throw it out. Link: https://lore.kernel.org/r/20230920195036.1169791-9-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-09-21KVM: arm64: Remove unused return value from kvm_reset_vcpu()Gravatar Oliver Upton 1-1/+1
Get rid of the return value for kvm_reset_vcpu() as there are no longer any cases where it returns a nonzero value. Link: https://lore.kernel.org/r/20230920195036.1169791-8-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-08-23KVM: arm64: pmu: Guard PMU emulation definitions with CONFIG_KVMGravatar Marc Zyngier 1-1/+1
Most of the internal definitions for PMU emulation are guarded with CONFIG_HW_PERF_EVENTS. However, this isn't enough, and leads to these definitions leaking if CONFIG_KVM isn't enabled. This leads to some compilation breakage in this exact configuration. Fix it by falling back to the dummy stubs if either perf or KVM isn't selected. Reported-by: Alexander Stein <alexander.stein@ew.tq-group.com> Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-08-22KVM: arm64: pmu: Resync EL0 state on counter rotationGravatar Marc Zyngier 1-0/+2
Huang Shijie reports that, when profiling a guest from the host with a number of events that exceeds the number of available counters, the reported counts are wildly inaccurate. Without the counter oversubscription, the reported counts are correct. Their investigation indicates that upon counter rotation (which takes place on the back of a timer interrupt), we fail to re-apply the guest EL0 enabling, leading to the counting of host events instead of guest events. In order to solve this, add yet another hook between the host PMU driver and KVM, re-applying the guest EL0 configuration if the right conditions apply (the host is VHE, we are in interrupt context, and we interrupted a running vcpu). This triggers a new vcpu request which will apply the correct configuration on guest reentry. With this, we have the correct counts, even when the counters are oversubscribed. Reported-by: Huang Shijie <shijie@os.amperecomputing.com> Suggested-by: Oliver Upton <oliver.upton@linux.dev> Tested_by: Huang Shijie <shijie@os.amperecomputing.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230809013953.7692-1-shijie@os.amperecomputing.com Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230820090108.177817-1-maz@kernel.org
2023-07-13KVM: arm64: vgic-v4: Make the doorbell request robust w.r.t preemptionGravatar Marc Zyngier 1-1/+1
Xiang reports that VMs occasionally fail to boot on GICv4.1 systems when running a preemptible kernel, as it is possible that a vCPU is blocked without requesting a doorbell interrupt. The issue is that any preemption that occurs between vgic_v4_put() and schedule() on the block path will mark the vPE as nonresident and *not* request a doorbell irq. This occurs because when the vcpu thread is resumed on its way to block, vcpu_load() will make the vPE resident again. Once the vcpu actually blocks, we don't request a doorbell anymore, and the vcpu won't be woken up on interrupt delivery. Fix it by tracking that we're entering WFI, and key the doorbell request on that flag. This allows us not to make the vPE resident when going through a preempt/schedule cycle, meaning we don't lose any state. Cc: stable@vger.kernel.org Fixes: 8e01d9a396e6 ("KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put") Reported-by: Xiang Chen <chenxiang66@hisilicon.com> Suggested-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Xiang Chen <chenxiang66@hisilicon.com> Co-developed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20230713070657.3873244-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-07-01Merge tag 'kvm-x86-generic-6.5' of https://github.com/kvm-x86/linux into HEADGravatar Paolo Bonzini 1-6/+0
Common KVM changes for 6.5: - Fix unprotected vcpu->pid dereference via debugfs - Fix KVM_BUG() and KVM_BUG_ON() macros with 64-bit conditionals - Refactor failure path in kvm_io_bus_unregister_dev() to simplify the code - Misc cleanups
2023-06-15KVM: arm64: Rip out the vestiges of the 'old' ID register schemeGravatar Oliver Upton 1-2/+6
There's no longer a need for the baggage of the old scheme for handling configurable ID register fields. Rip it all out in favor of the generalized infrastructure. Link: https://lore.kernel.org/r/20230609190054.1542113-12-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-06-13KVM: destruct kvm_io_device while unregistering it from kvm_io_busGravatar Wei Wang 1-6/+0
Current usage of kvm_io_device requires users to destruct it with an extra call of kvm_iodevice_destructor after the device gets unregistered from kvm_io_bus. This is not necessary and can cause errors if a user forgot to make the extra call. Simplify the usage by combining kvm_iodevice_destructor into kvm_io_bus_unregister_dev. This reduces LOCs a bit for users and can avoid the leakage of destructing the device explicitly. Signed-off-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230207123713.3905-2-wei.w.wang@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-12KVM: arm64: Rewrite IMPDEF PMU version as NIGravatar Oliver Upton 1-1/+1
KVM allows userspace to write an IMPDEF PMU version to the corresponding 32bit and 64bit ID register fields for the sake of backwards compatibility with kernels that lacked commit 3d0dba5764b9 ("KVM: arm64: PMU: Move the ID_AA64DFR0_EL1.PMUver limit to VM creation"). Plumbing that IMPDEF PMU version through to the gues is getting in the way of progress, and really doesn't any sense in the first place. Bite the bullet and reinterpret the IMPDEF PMU version as NI (0) for userspace writes. Additionally, spill the dirty details into a comment. Link: https://lore.kernel.org/r/20230609190054.1542113-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-05-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmGravatar Linus Torvalds 3-7/+34
Pull kvm updates from Paolo Bonzini: "s390: - More phys_to_virt conversions - Improvement of AP management for VSIE (nested virtualization) ARM64: - Numerous fixes for the pathological lock inversion issue that plagued KVM/arm64 since... forever. - New framework allowing SMCCC-compliant hypercalls to be forwarded to userspace, hopefully paving the way for some more features being moved to VMMs rather than be implemented in the kernel. - Large rework of the timer code to allow a VM-wide offset to be applied to both virtual and physical counters as well as a per-timer, per-vcpu offset that complements the global one. This last part allows the NV timer code to be implemented on top. - A small set of fixes to make sure that we don't change anything affecting the EL1&0 translation regime just after having having taken an exception to EL2 until we have executed a DSB. This ensures that speculative walks started in EL1&0 have completed. - The usual selftest fixes and improvements. x86: - Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled, and by giving the guest control of CR0.WP when EPT is enabled on VMX (VMX-only because SVM doesn't support per-bit controls) - Add CR0/CR4 helpers to query single bits, and clean up related code where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return as a bool - Move AMD_PSFD to cpufeatures.h and purge KVM's definition - Avoid unnecessary writes+flushes when the guest is only adding new PTEs - Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s optimizations when emulating invalidations - Clean up the range-based flushing APIs - Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle changed SPTE" overhead associated with writing the entire entry - Track the number of "tail" entries in a pte_list_desc to avoid having to walk (potentially) all descriptors during insertion and deletion, which gets quite expensive if the guest is spamming fork() - Disallow virtualizing legacy LBRs if architectural LBRs are available, the two are mutually exclusive in hardware - Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features - Overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES - Apply PMU filters to emulated events and add test coverage to the pmu_event_filter selftest - AMD SVM: - Add support for virtual NMIs - Fixes for edge cases related to virtual interrupts - Intel AMX: - Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is not being reported due to userspace not opting in via prctl() - Fix a bug in emulation of ENCLS in compatibility mode - Allow emulation of NOP and PAUSE for L2 - AMX selftests improvements - Misc cleanups MIPS: - Constify MIPS's internal callbacks (a leftover from the hardware enabling rework that landed in 6.3) Generic: - Drop unnecessary casts from "void *" throughout kvm_main.c - Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct size by 8 bytes on 64-bit kernels by utilizing a padding hole Documentation: - Fix goof introduced by the conversion to rST" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits) KVM: s390: pci: fix virtual-physical confusion on module unload/load KVM: s390: vsie: clarifications on setting the APCB KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init() KVM: selftests: Test the PMU event "Instructions retired" KVM: selftests: Copy full counter values from guest in PMU event filter test KVM: selftests: Use error codes to signal errors in PMU event filter test KVM: selftests: Print detailed info in PMU event filter asserts KVM: selftests: Add helpers for PMC asserts in PMU event filter test KVM: selftests: Add a common helper for the PMU event filter guest code KVM: selftests: Fix spelling mistake "perrmited" -> "permitted" KVM: arm64: vhe: Drop extra isb() on guest exit KVM: arm64: vhe: Synchronise with page table walker on MMU update KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc() KVM: arm64: nvhe: Synchronise with page table walker on TLBI KVM: arm64: Handle 32bit CNTPCTSS traps KVM: arm64: nvhe: Synchronise with page table walker on vcpu run KVM: arm64: vgic: Don't acquire its_lock before config_lock KVM: selftests: Add test to verify KVM's supported XCR0 ...
2023-04-21Merge branch kvm-arm64/smccc-filtering into kvmarm-master/nextGravatar Marc Zyngier 1-1/+5
* kvm-arm64/smccc-filtering: : . : SMCCC call filtering and forwarding to userspace, courtesy of : Oliver Upton. From the cover letter: : : "The Arm SMCCC is rather prescriptive in regards to the allocation of : SMCCC function ID ranges. Many of the hypercall ranges have an : associated specification from Arm (FF-A, PSCI, SDEI, etc.) with some : room for vendor-specific implementations. : : The ever-expanding SMCCC surface leaves a lot of work within KVM for : providing new features. Furthermore, KVM implements its own : vendor-specific ABI, with little room for other implementations (like : Hyper-V, for example). Rather than cramming it all into the kernel we : should provide a way for userspace to handle hypercalls." : . KVM: selftests: Fix spelling mistake "KVM_HYPERCAL_EXIT_SMC" -> "KVM_HYPERCALL_EXIT_SMC" KVM: arm64: Test that SMC64 arch calls are reserved KVM: arm64: Prevent userspace from handling SMC64 arch range KVM: arm64: Expose SMC/HVC width to userspace KVM: selftests: Add test for SMCCC filter KVM: selftests: Add a helper for SMCCC calls with SMC instruction KVM: arm64: Let errors from SMCCC emulation to reach userspace KVM: arm64: Return NOT_SUPPORTED to guest for unknown PSCI version KVM: arm64: Introduce support for userspace SMCCC filtering KVM: arm64: Add support for KVM_EXIT_HYPERCALL KVM: arm64: Use a maple tree to represent the SMCCC filter KVM: arm64: Refactor hvc filtering to support different actions KVM: arm64: Start handling SMCs from EL1 KVM: arm64: Rename SMC/HVC call handler to reflect reality KVM: arm64: Add vm fd device attribute accessors KVM: arm64: Add a helper to check if a VM has ran once KVM: x86: Redefine 'longmode' as a flag for KVM_EXIT_HYPERCALL Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-04-21Merge branch kvm-arm64/timer-vm-offsets into kvmarm-master/nextGravatar Marc Zyngier 2-6/+29
* kvm-arm64/timer-vm-offsets: (21 commits) : . : This series aims at satisfying multiple goals: : : - allow a VMM to atomically restore a timer offset for a whole VM : instead of updating the offset each time a vcpu get its counter : written : : - allow a VMM to save/restore the physical timer context, something : that we cannot do at the moment due to the lack of offsetting : : - provide a framework that is suitable for NV support, where we get : both global and per timer, per vcpu offsetting, and manage : interrupts in a less braindead way. : : Conflict resolution involves using the new per-vcpu config lock instead : of the home-grown timer lock. : . KVM: arm64: Handle 32bit CNTPCTSS traps KVM: arm64: selftests: Augment existing timer test to handle variable offset KVM: arm64: selftests: Deal with spurious timer interrupts KVM: arm64: selftests: Add physical timer registers to the sysreg list KVM: arm64: nv: timers: Support hyp timer emulation KVM: arm64: nv: timers: Add a per-timer, per-vcpu offset KVM: arm64: Document KVM_ARM_SET_CNT_OFFSETS and co KVM: arm64: timers: Abstract the number of valid timers per vcpu KVM: arm64: timers: Fast-track CNTPCT_EL0 trap handling KVM: arm64: Elide kern_hyp_va() in VHE-specific parts of the hypervisor KVM: arm64: timers: Move the timer IRQs into arch_timer_vm_data KVM: arm64: timers: Abstract per-timer IRQ access KVM: arm64: timers: Rationalise per-vcpu timer init KVM: arm64: timers: Allow save/restoring of the physical timer KVM: arm64: timers: Allow userspace to set the global counter offset KVM: arm64: Expose {un,}lock_all_vcpus() to the rest of KVM KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2 KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timer arm64: Add HAS_ECV_CNTPOFF capability arm64: Add CNTPOFF_EL2 register definition ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-04-05KVM: arm64: Introduce support for userspace SMCCC filteringGravatar Oliver Upton 1-0/+3
As the SMCCC (and related specifications) march towards an 'everything and the kitchen sink' interface for interacting with a system it becomes less likely that KVM will support every related feature. We could do better by letting userspace have a crack at it instead. Allow userspace to define an 'SMCCC filter' that applies to both HVCs and SMCs initiated by the guest. Supporting both conduits with this interface is important for a couple of reasons. Guest SMC usage is table stakes for a nested guest, as HVCs are always taken to the virtual EL2. Additionally, guests may want to interact with a service on the secure side which can now be proxied by userspace. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230404154050.2270077-10-oliver.upton@linux.dev
2023-04-05KVM: arm64: Use a maple tree to represent the SMCCC filterGravatar Oliver Upton 1-0/+1
Maple tree is an efficient B-tree implementation that is intended for storing non-overlapping intervals. Such a data structure is a good fit for the SMCCC filter as it is desirable to sparsely allocate the 32 bit function ID space. To that end, add a maple tree to kvm_arch and correctly init/teardown along with the VM. Wire in a test against the hypercall filter for HVCs which does nothing until the controls are exposed to userspace. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230404154050.2270077-8-oliver.upton@linux.dev
2023-04-05KVM: arm64: Rename SMC/HVC call handler to reflect realityGravatar Oliver Upton 1-1/+1
KVM handles SMCCC calls from virtual EL2 that use the SMC instruction since commit bd36b1a9eb5a ("KVM: arm64: nv: Handle SMCs taken from virtual EL2"). Thus, the function name of the handler no longer reflects reality. Normalize the name on SMCCC, since that's the only hypercall interface KVM supports in the first place. No fuctional change intended. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230404154050.2270077-5-oliver.upton@linux.dev
2023-03-30KVM: arm64: nv: timers: Support hyp timer emulationGravatar Marc Zyngier 2-2/+8
Emulating EL2 also means emulating the EL2 timers. To do so, we expand our timer framework to deal with at most 4 timers. At any given time, two timers are using the HW timers, and the two others are purely emulated. The role of deciding which is which at any given time is left to a mapping function which is called every time we need to make such a decision. Reviewed-by: Colton Lewis <coltonlewis@google.com> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-18-maz@kernel.org
2023-03-30KVM: arm64: nv: timers: Add a per-timer, per-vcpu offsetGravatar Marc Zyngier 1-0/+5
Being able to set a global offset isn't enough. With NV, we also need to a per-vcpu, per-timer offset (for example, CNTVCT_EL0 being offset by CNTVOFF_EL2). Use a similar method as the VM-wide offset to have a timer point to the shadow register that contains the offset value. Reviewed-by: Colton Lewis <coltonlewis@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-17-maz@kernel.org
2023-03-30KVM: arm64: timers: Move the timer IRQs into arch_timer_vm_dataGravatar Marc Zyngier 1-4/+14
Having the timer IRQs duplicated into each vcpu isn't great, and becomes absolutely awful with NV. So let's move these into the per-VM arch_timer_vm_data structure. This simplifies a lot of code, but requires us to introduce a mutex so that we can reason about userspace trying to change an interrupt number while another vcpu is running, something that wasn't really well handled so far. Reviewed-by: Colton Lewis <coltonlewis@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-12-maz@kernel.org
2023-03-30KVM: arm64: timers: Abstract per-timer IRQ accessGravatar Marc Zyngier 1-0/+2
As we are about to move the location of the per-timer IRQ into the VM structure, abstract the location of the IRQ behind an accessor. This will make the repainting sligntly less painful. Reviewed-by: Colton Lewis <coltonlewis@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-11-maz@kernel.org
2023-03-30KVM: arm64: timers: Rationalise per-vcpu timer initGravatar Marc Zyngier 1-1/+0
The way we initialise our timer contexts may be satisfactory for two timers, but will be getting pretty annoying with four. Cleanup the whole thing by removing the code duplication and getting rid of unused IRQ configuration elements. Reviewed-by: Colton Lewis <coltonlewis@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-10-maz@kernel.org
2023-03-30KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timerGravatar Marc Zyngier 1-0/+2
With ECV and CNTPOFF_EL2, it is very easy to offer an offset for the physical timer. So let's do just that. Nothing can set the offset yet, so this should have no effect whatsoever (famous last words...). Reviewed-by: Colton Lewis <coltonlewis@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-5-maz@kernel.org
2023-03-30KVM: arm64: timers: Use a per-vcpu, per-timer accumulator for fractional nsGravatar Marc Zyngier 1-0/+1
Instead of accumulating the fractional ns value generated every time we compute a ns delta in a global variable, use a per-vcpu, per-timer variable. This keeps the fractional ns local to the timer instead of contributing to any odd, unrelated timer. Reviewed-by: Colton Lewis <coltonlewis@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-2-maz@kernel.org
2023-03-27arm64: perf: Move PMUv3 driver to drivers/perfGravatar Marc Zyngier 1-1/+1
Having the ARM PMUv3 driver sitting in arch/arm64/kernel is getting in the way of being able to use perf on ARMv8 cores running a 32bit kernel, such as 32bit KVM guests. This patch moves it into drivers/perf/arm_pmuv3.c, with an include file in include/linux/perf/arm_pmuv3.h. The only thing left in arch/arm64 is some mundane perf stuff. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Zaid Al-Bassam <zalbassam@google.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230317195027.3746949-2-zalbassam@google.com Signed-off-by: Will Deacon <will@kernel.org>