Skip to content
Snippets Groups Projects
  1. Dec 21, 2018
  2. Dec 19, 2018
    • Marc Zyngier's avatar
      arm/arm64: KVM: Add ARM_EXCEPTION_IS_TRAP macro · 58466766
      Marc Zyngier authored
      
      32 and 64bit use different symbols to identify the traps.
      32bit has a fine grained approach (prefetch abort, data abort and HVC),
      while 64bit is pretty happy with just "trap".
      
      This has been fine so far, except that we now need to decode some
      of that in tracepoints that are common to both architectures.
      
      Introduce ARM_EXCEPTION_IS_TRAP which abstracts the trap symbols
      and make the tracepoint use it.
      
      Acked-by: default avatarChristoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      58466766
    • Christoffer Dall's avatar
      KVM: arm/arm64: Fix unintended stage 2 PMD mappings · 6794ad54
      Christoffer Dall authored
      
      There are two things we need to take care of when we create block
      mappings in the stage 2 page tables:
      
        (1) The alignment within a PMD between the host address range and the
        guest IPA range must be the same, since otherwise we end up mapping
        pages with the wrong offset.
      
        (2) The head and tail of a memory slot may not cover a full block
        size, and we have to take care to not map those with block
        descriptors, since we could expose memory to the guest that the host
        did not intend to expose.
      
      So far, we have been taking care of (1), but not (2), and our commentary
      describing (1) was somewhat confusing.
      
      This commit attempts to factor out the checks of both into a common
      function, and if we don't pass the check, we won't attempt any PMD
      mappings for neither hugetlbfs nor THP.
      
      Note that we used to only check the alignment for THP, not for
      hugetlbfs, but as far as I can tell the check needs to be applied to
      both scenarios.
      
      Cc: Ralph Palutke <ralph.palutke@fau.de>
      Cc: Lukas Braun <koomi@moshbit.net>
      Reported-by: default avatarLukas Braun <koomi@moshbit.net>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      6794ad54
    • Marc Zyngier's avatar
      arm/arm64: KVM: vgic: Force VM halt when changing the active state of GICv3 PPIs/SGIs · 107352a2
      Marc Zyngier authored
      
      We currently only halt the guest when a vCPU messes with the active
      state of an SPI. This is perfectly fine for GICv2, but isn't enough
      for GICv3, where all vCPUs can access the state of any other vCPU.
      
      Let's broaden the condition to include any GICv3 interrupt that
      has an active state (i.e. all but LPIs).
      
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarChristoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      107352a2
    • Christoffer Dall's avatar
      KVM: arm/arm64: arch_timer: Simplify kvm_timer_vcpu_terminate · 6e14ef1d
      Christoffer Dall authored
      
      kvm_timer_vcpu_terminate can only be called in two scenarios:
      
       1. As part of cleanup during a failed VCPU create
       2. As part of freeing the whole VM (struct kvm refcount == 0)
      
      In the first case, we cannot have programmed any timers or mapped any
      IRQs, and therefore we do not have to cancel anything or unmap anything.
      
      In the second case, the VCPU will have gone through kvm_timer_vcpu_put,
      which will have canceled the emulated physical timer's hrtimer, and we
      do not need to that here as well.  We also do not care if the irq is
      recorded as mapped or not in the VGIC data structure, because the whole
      VM is going away.  That leaves us only with having to ensure that we
      cancel the bg_timer if we were blocking the last time we called
      kvm_timer_vcpu_put().
      
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      6e14ef1d
    • Christoffer Dall's avatar
      KVM: arm/arm64: Remove arch timer workqueue · 8a411b06
      Christoffer Dall authored
      
      The use of a work queue in the hrtimer expire function for the bg_timer
      is a leftover from the time when we would inject interrupts when the
      bg_timer expired.
      
      Since we are no longer doing that, we can instead call
      kvm_vcpu_wake_up() directly from the hrtimer function and remove all
      workqueue functionality from the arch timer code.
      
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      8a411b06
    • Christoffer Dall's avatar
      KVM: arm/arm64: Fixup the kvm_exit tracepoint · 71a7e47f
      Christoffer Dall authored
      
      The kvm_exit tracepoint strangely always reported exits as being IRQs.
      This seems to be because either the __print_symbolic or the tracepoint
      macros use a variable named idx.
      
      Take this chance to update the fields in the tracepoint to reflect the
      concepts in the arm64 architecture that we pass to the tracepoint and
      move the exception type table to the same location and header files as
      the exits code.
      
      We also clear out the exception code to 0 for IRQ exits (which
      translates to UNKNOWN in text) to make it slighyly less confusing to
      parse the trace output.
      
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      71a7e47f
    • Christoffer Dall's avatar
      KVM: arm/arm64: vgic: Consider priority and active state for pending irq · 9009782a
      Christoffer Dall authored
      
      When checking if there are any pending IRQs for the VM, consider the
      active state and priority of the IRQs as well.
      
      Otherwise we could be continuously scheduling a guest hypervisor without
      it seeing an IRQ.
      
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      9009782a
    • Gustavo A. R. Silva's avatar
      KVM: arm/arm64: vgic: Fix off-by-one bug in vgic_get_irq() · c23b2e6f
      Gustavo A. R. Silva authored
      
      When using the nospec API, it should be taken into account that:
      
      "...if the CPU speculates past the bounds check then
       * array_index_nospec() will clamp the index within the range of [0,
       * size)."
      
      The above is part of the header for macro array_index_nospec() in
      linux/nospec.h
      
      Now, in this particular case, if intid evaluates to exactly VGIC_MAX_SPI
      or to exaclty VGIC_MAX_PRIVATE, the array_index_nospec() macro ends up
      returning VGIC_MAX_SPI - 1 or VGIC_MAX_PRIVATE - 1 respectively, instead
      of VGIC_MAX_SPI or VGIC_MAX_PRIVATE, which, based on the original logic:
      
      	/* SGIs and PPIs */
      	if (intid <= VGIC_MAX_PRIVATE)
       		return &vcpu->arch.vgic_cpu.private_irqs[intid];
      
       	/* SPIs */
      	if (intid <= VGIC_MAX_SPI)
       		return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS];
      
      are valid values for intid.
      
      Fix this by calling array_index_nospec() macro with VGIC_MAX_PRIVATE + 1
      and VGIC_MAX_SPI + 1 as arguments for its parameter size.
      
      Fixes: 41b87599 ("KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_get_irq()")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      [dropped the SPI part which was fixed separately]
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      c23b2e6f
  3. Dec 18, 2018
  4. Dec 14, 2018
    • Paolo Bonzini's avatar
      kvm: introduce manual dirty log reprotect · 2a31b9db
      Paolo Bonzini authored
      
      There are two problems with KVM_GET_DIRTY_LOG.  First, and less important,
      it can take kvm->mmu_lock for an extended period of time.  Second, its user
      can actually see many false positives in some cases.  The latter is due
      to a benign race like this:
      
        1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
           them.
        2. The guest modifies the pages, causing them to be marked ditry.
        3. Userspace actually copies the pages.
        4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
           they were not written to since (3).
      
      This is especially a problem for large guests, where the time between
      (1) and (3) can be substantial.  This patch introduces a new
      capability which, when enabled, makes KVM_GET_DIRTY_LOG not
      write-protect the pages it returns.  Instead, userspace has to
      explicitly clear the dirty log bits just before using the content
      of the page.  The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
      64-page granularity rather than requiring to sync a full memslot;
      this way, the mmu_lock is taken for small amounts of time, and
      only a small amount of time will pass between write protection
      of pages and the sending of their content.
      
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      2a31b9db
    • Paolo Bonzini's avatar
      kvm: rename last argument to kvm_get_dirty_log_protect · 8fe65a82
      Paolo Bonzini authored
      
      When manual dirty log reprotect will be enabled, kvm_get_dirty_log_protect's
      pointer argument will always be false on exit, because no TLB flush is needed
      until the manual re-protection operation.  Rename it from "is_dirty" to "flush",
      which more accurately tells the caller what they have to do with it.
      
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      8fe65a82
    • Paolo Bonzini's avatar
      kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnostic · e5d83c74
      Paolo Bonzini authored
      
      The first such capability to be handled in virt/kvm/ will be manual
      dirty page reprotection.
      
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e5d83c74
  5. Oct 26, 2018
  6. Oct 18, 2018
  7. Oct 17, 2018
    • Mark Rutland's avatar
      KVM: arm64: Fix caching of host MDCR_EL2 value · da5a3ce6
      Mark Rutland authored
      
      At boot time, KVM stashes the host MDCR_EL2 value, but only does this
      when the kernel is not running in hyp mode (i.e. is non-VHE). In these
      cases, the stashed value of MDCR_EL2.HPMN happens to be zero, which can
      lead to CONSTRAINED UNPREDICTABLE behaviour.
      
      Since we use this value to derive the MDCR_EL2 value when switching
      to/from a guest, after a guest have been run, the performance counters
      do not behave as expected. This has been observed to result in accesses
      via PMXEVTYPER_EL0 and PMXEVCNTR_EL0 not affecting the relevant
      counters, resulting in events not being counted. In these cases, only
      the fixed-purpose cycle counter appears to work as expected.
      
      Fix this by always stashing the host MDCR_EL2 value, regardless of VHE.
      
      Cc: Christopher Dall <christoffer.dall@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@vger.kernel.org
      Fixes: 1e947bad ("arm64: KVM: Skip HYP setup when already running in HYP")
      Tested-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      da5a3ce6
  8. Oct 16, 2018
  9. Oct 03, 2018
Loading