1. 13 May, 2020 7 commits
    • KVM: VMX: Add proper cache tracking for CR0 · bd31fe49
      Sean Christopherson authored
      
      Move CR0 caching into the standard register caching mechanism in order
      to take advantage of the availability checks provided by regs_avail.
      This avoids multiple VMREADs in the (uncommon) case where kvm_read_cr0()
      is called multiple times in a single VM-Exit, and more importantly
      eliminates a kvm_x86_ops hook, saves a retpoline on SVM when reading
      CR0, and squashes the confusing naming discrepancy of "cache_reg" vs.
      "decache_cr0_guest_bits".
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200502043234.12481-8-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
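      For illustration, here is a minimal user-space sketch of the
      regs_avail-style lazy caching this commit (and the CR4 change below)
      relies on; the names, values, and helpers are simplified stand-ins,
      not the actual KVM code:

          #include <stdio.h>

          enum { REG_CR0, REG_CR4, NR_CACHED_REGS };

          struct vcpu {
              unsigned long regs_avail;               /* bit set => cached value is valid */
              unsigned long cache[NR_CACHED_REGS];
          };

          static unsigned long hw_read(int reg)       /* stand-in for a VMREAD */
          {
              printf("expensive hardware read of reg %d\n", reg);
              return reg == REG_CR0 ? 0x80050033UL : 0x370678UL;
          }

          static unsigned long read_reg(struct vcpu *vcpu, int reg)
          {
              if (!(vcpu->regs_avail & (1UL << reg))) {
                  vcpu->cache[reg] = hw_read(reg);    /* first read after a VM-Exit */
                  vcpu->regs_avail |= 1UL << reg;
              }
              return vcpu->cache[reg];                /* later reads hit the cache */
          }

          int main(void)
          {
              struct vcpu vcpu = { 0 };

              vcpu.regs_avail = 0;                    /* cache is invalidated at VM-Exit */
              read_reg(&vcpu, REG_CR0);               /* performs the hardware read */
              read_reg(&vcpu, REG_CR0);               /* cache hit, no second VMREAD */
              return 0;
          }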
    • KVM: VMX: Add proper cache tracking for CR4 · f98c1e77
      Sean Christopherson authored
      
      Move CR4 caching into the standard register caching mechanism in order
      to take advantage of the availability checks provided by regs_avail.
      This avoids multiple VMREADs and retpolines (when configured) during
      nested VMX transitions as kvm_read_cr4_bits() is invoked multiple times
      on each transition, e.g. when stuffing CR0 and CR3.
      
      As an added bonus, this eliminates a kvm_x86_ops hook, saves a retpoline
      on SVM when reading CR4, and squashes the confusing naming discrepancy
      of "cache_reg" vs. "decache_cr4_guest_bits".
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200502043234.12481-7-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Save L1 TSC offset in 'struct kvm_vcpu_arch' · 56ba77a4
      Sean Christopherson authored
      
      Save L1's TSC offset in 'struct kvm_vcpu_arch' and drop the kvm_x86_ops
      hook read_l1_tsc_offset().  This avoids a retpoline (when configured)
      when reading L1's effective TSC, which is done at least once on every
      VM-Exit.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200502043234.12481-2-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
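      As an illustration of the shape of this change, here is a hedged sketch
      (the surrounding types and functions are hypothetical; only the
      l1_tsc_offset field name follows the commit, and TSC scaling is omitted):

          struct vcpu_arch {
              unsigned long long l1_tsc_offset;   /* updated whenever L1's offset changes */
          };

          /* Before: every read bounced through a function pointer like this,
           * costing a retpoline per VM-Exit when mitigations are enabled. */
          struct tsc_ops {
              unsigned long long (*read_l1_tsc_offset)(struct vcpu_arch *arch);
          };

          /* After: the hot path reads the cached field directly. */
          unsigned long long read_l1_tsc(struct vcpu_arch *arch,
                                         unsigned long long host_tsc)
          {
              return host_tsc + arch->l1_tsc_offset;
          }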
    • KVM: x86: Replace late check_nested_events() hack with more precise fix · c300ab9f
      Paolo Bonzini authored
      
      Add an argument to interrupt_allowed and nmi_allowed to check whether
      interrupt injection is blocked (the resulting hook shape is sketched
      after this entry).  Use the hook to handle the case where an interrupt
      arrives between check_nested_events() and the injection logic, and drop
      the retry of check_nested_events() that hack-a-fixed the same condition.
      
      Blocking injection is also a bit of a hack, e.g. KVM should do exiting
      and non-exiting interrupt processing in a single pass, but it's a more
      precise hack.  The old comment is also misleading: KVM_REQ_EVENT is
      purely an optimization, and setting it on every run loop (which KVM
      doesn't do) would affect only performance, not functionality.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200423022550.15113-13-sean.j.christopherson@intel.com>
      [Extend to SVM, add SMI and NMI.  Even though NMI and SMI cannot come
       asynchronously right now, making the fix generic is easy and removes a
       special case. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
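      A sketch of the resulting hook shape (the struct name is illustrative
      and the prototypes are simplified; only the hook names and the extra
      argument follow the commit): the argument distinguishes "can this event
      be injected right now?" from the plain blocked/allowed query.

          #include <stdbool.h>

          struct kvm_vcpu;

          struct event_ops {
              bool (*interrupt_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
              bool (*nmi_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
              bool (*smi_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
          };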
    • KVM: x86: Make return for {interrupt,nmi,smi}_allowed() a bool instead of int · 88c604b6
      Sean Christopherson authored
      
      Return an actual bool from kvm_x86_ops' {interrupt,nmi,smi}_allowed()
      hooks to better reflect the return semantics, and to avoid creating an
      even bigger mess when the related VMX code is refactored in upcoming
      patches.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200423022550.15113-5-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Open a window for pending nested VMX preemption timer · d2060bd4
      Sean Christopherson authored
      Add a kvm_x86_ops hook to detect a nested pending "hypervisor timer" and
      use it to effectively open a window for servicing the expired timer.
      Like pending SMIs on VMX, opening a window simply means requesting an
      immediate exit.
      
      This fixes a bug where an expired VMX preemption timer (for L2) will be
      delayed and/or lost if a pending exception is injected into L2.  The
      pending exception is rightly prioritized by vmx_check_nested_events()
      and injected into L2, with the preemption timer left pending.  Because
      no window opened, L2 is free to run uninterrupted.
      
      Fixes: f4124500 ("KVM: nVMX: Fully emulate preemption timer")
      Reported-by: Jim Mattson <jmattson@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Peter Shier <pshier@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200423022550.15113-3-sean.j.christopherson@intel.com>
      [Check it in kvm_vcpu_has_events too, to ensure that the preemption
       timer is serviced promptly even if the vCPU is halted and L1 is not
       intercepting HLT. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
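      Illustrative sketch of the fix's shape, with hypothetical helper names
      standing in for the new hook and the immediate-exit request (this is
      not the literal KVM code):

          #include <stdbool.h>

          struct kvm_vcpu;

          /* Stubs standing in for the new "hypervisor timer pending" hook and
           * for requesting an immediate exit. */
          static bool nested_hv_timer_pending(struct kvm_vcpu *vcpu) { (void)vcpu; return true; }
          static void request_immediate_exit(struct kvm_vcpu *vcpu) { (void)vcpu; }

          /* A pending, expired L2 preemption timer must make even a halted vCPU
           * look runnable (the kvm_vcpu_has_events part of the fix)... */
          bool vcpu_has_events(struct kvm_vcpu *vcpu)
          {
              /* ...other pending-event checks elided... */
              return nested_hv_timer_pending(vcpu);
          }

          /* ...and "opening a window" is simply forcing an immediate exit so the
           * timer is serviced on the next entry, even when an exception was
           * injected into L2 first. */
          void open_timer_window(struct kvm_vcpu *vcpu)
          {
              if (nested_hv_timer_pending(vcpu))
                  request_immediate_exit(vcpu);
          }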
    • KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c · 37486135
      Babu Moger authored
      
      Though rdpkru and wrpkru are contingent upon CR4.PKE, the PKRU
      resource isn't: it can be read with XSAVE and written with XRSTOR.
      So, if we don't set the guest PKRU value here (in
      kvm_load_guest_xsave_state), the guest can read the host value.
      
      Likewise, in kvm_load_host_xsave_state, a guest with CR4.PKE clear
      could potentially use XRSTOR to change the host PKRU value.
      
      While at it, move pkru state save/restore to common code and the
      host_pkru field to kvm_vcpu_arch.  This will let SVM support protection keys.
      
      Cc: stable@vger.kernel.org
      Reported-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Babu Moger <babu.moger@amd.com>
      Message-Id: <158932794619.44260.14508381096663848853.stgit@naples-babu.amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
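      A simplified sketch of the common-code save/restore described above
      (field and function names loosely follow the commit; the enablement
      check and the rdpkru/wrpkru stand-ins are illustrative):

          /* Stand-ins for the rdpkru/wrpkru instructions. */
          static unsigned int hw_pkru;
          static unsigned int rd_pkru(void) { return hw_pkru; }
          static void wr_pkru(unsigned int val) { hw_pkru = val; }

          struct vcpu_arch {
              unsigned int pkru;        /* guest PKRU */
              unsigned int host_pkru;   /* host value, captured when the vCPU is loaded */
              int pkru_usable;          /* CR4.PKE set, or PKRU enabled in the guest XCR0 */
          };

          void load_guest_xsave_state(struct vcpu_arch *arch)
          {
              /* Load the guest value whenever the guest could observe PKRU at
               * all; XSAVE can read it even with CR4.PKE clear. */
              if (arch->pkru_usable && arch->pkru != arch->host_pkru)
                  wr_pkru(arch->pkru);
          }

          void load_host_xsave_state(struct vcpu_arch *arch)
          {
              if (arch->pkru_usable) {
                  arch->pkru = rd_pkru();           /* pick up guest XRSTOR writes */
                  if (arch->pkru != arch->host_pkru)
                      wr_pkru(arch->host_pkru);
              }
          }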
  2. 08 May, 2020 2 commits
    • KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6 · d67668e9
      Paolo Bonzini authored
      
      There are two issues with KVM_EXIT_DEBUG on AMD, whose root cause is the
      different handling of DR6 on intercepted #DB exceptions on Intel and AMD.
      
      On Intel, #DB exceptions transmit the DR6 value via the exit qualification
      field of the VMCS, and the exit qualification only contains the description
      of the precise event that caused a vmexit.
      
      On AMD, instead, the DR6 field of the VMCB is filled in as if the #DB
      exception were to be injected into the guest.  This has two effects when
      guest debugging
      is in use:
      
      * the guest DR6 is clobbered
      
      * the kvm_run->debug.arch.dr6 field can accumulate more debug events, rather
      than just the last one that happened (the testcase in the next patch covers
      this issue).
      
      This patch fixes both issues by emulating, so to speak, the Intel
      behavior on AMD processors.  The important observation is that (after
      the previous patches) the VMCB value of DR6 is only ever observable by
      the guest if KVM_DEBUGREG_WONT_EXIT is set.  Therefore we can set
      vmcb->save.dr6 to any value we want as long as KVM_DEBUGREG_WONT_EXIT
      is clear, which it will be if guest debugging is enabled.
      
      Therefore it is possible to enter the guest with an all-zero DR6,
      reconstruct the #DB payload from the DR6 we get at exit time, and let
      kvm_deliver_exception_payload move the newly set bits into vcpu->arch.dr6.
      Some extra bits may be included in the payload if KVM_DEBUGREG_WONT_EXIT
      is set, but this is harmless.
      
      This may not be the most optimized way to deal with this, but it is
      simple and, being confined within SVM code, it gets rid of the set_dr6
      callback and kvm_update_dr6.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
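      Illustrative sketch of the approach, with simplified constants and
      hypothetical names (the real code additionally handles active-low bits
      such as DR6.RTM and only does this when KVM_DEBUGREG_WONT_EXIT is
      clear):

          #define DR6_FIXED_1  0xfffe0ff0UL   /* bits that architecturally read as 1 */

          struct db_state {
              unsigned long vmcb_dr6;   /* DR6 as written by hardware at the #DB exit */
              unsigned long arch_dr6;   /* KVM's view, i.e. vcpu->arch.dr6 */
          };

          /* The payload is the set of status bits hardware reported for this #DB. */
          unsigned long db_payload(const struct db_state *s)
          {
              return s->vmcb_dr6 & ~DR6_FIXED_1;
          }

          void handle_db_exit(struct db_state *s)
          {
              s->arch_dr6 |= db_payload(s);   /* roughly what payload delivery does */
              s->vmcb_dr6 = DR6_FIXED_1;      /* re-enter with a "clean" DR6 */
          }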
    • KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6 · 5679b803
      Paolo Bonzini authored
      
      kvm_x86_ops.set_dr6 is only ever called with vcpu->arch.dr6 as the
      second argument.  Ensure that the VMCB value is synchronized to
      vcpu->arch.dr6 on #DB (both "normal" and nested) and nested vmentry, so
      that the current value of DR6 is always available in vcpu->arch.dr6.
      The get_dr6 callback can just access vcpu->arch.dr6 and becomes redundant.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 07 May, 2020 1 commit
  4. 04 May, 2020 1 commit
  5. 23 Apr, 2020 1 commit
    • KVM: x86: move nested-related kvm_x86_ops to a separate struct · 33b22172
      Paolo Bonzini authored
      
      Clean up some of the patching of kvm_x86_ops by moving the ops related
      to nested virtualization into a separate struct.
      
      As a result, these ops will always be non-NULL on VMX.  This is not a problem:
      
      * check_nested_events is only called if is_guest_mode(vcpu) returns true
      
      * get_nested_state treats VMXOFF state the same as nested being disabled
      
      * set_nested_state fails if you attempt to set nested state while
        nesting is disabled
      
      * nested_enable_evmcs could already be called on a CPU without VMX enabled
        in CPUID.
      
      * nested_get_evmcs_version was fixed in the previous patch
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
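      A sketch of the resulting split (member list abridged to the hooks named
      above, prototypes simplified; not a verbatim copy of the kernel header):

          struct kvm_vcpu;
          struct kvm_nested_state;

          struct kvm_x86_nested_ops {
              int (*check_events)(struct kvm_vcpu *vcpu);
              int (*get_state)(struct kvm_vcpu *vcpu, struct kvm_nested_state *state,
                               unsigned int size);
              int (*set_state)(struct kvm_vcpu *vcpu, struct kvm_nested_state *state);
              int (*enable_evmcs)(struct kvm_vcpu *vcpu, unsigned short *vmcs_version);
              unsigned short (*get_evmcs_version)(struct kvm_vcpu *vcpu);
          };

          struct kvm_x86_ops {
              /* ...non-nested hooks... */
              struct kvm_x86_nested_ops *nested_ops;   /* always non-NULL on VMX */
          };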
  6. 21 Apr, 2020 10 commits
  7. 20 Apr, 2020 2 commits
    • KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook · e64419d9
      Sean Christopherson authored
      
      Add a dedicated hook to handle flushing TLB entries on behalf of the
      guest, i.e. for a paravirtualized TLB flush, and use it directly instead
      of bouncing through kvm_vcpu_flush_tlb().
      
      For VMX, change the effective implementation to never do INVEPT and
      flush only the current context, i.e. to always flush via
      INVVPID(SINGLE_CONTEXT).  The INVEPT performed by __vmx_flush_tlb() when
      @invalidate_gpa=false and enable_vpid=0 is unnecessary, as it will only
      flush guest-physical mappings; linear and combined mappings are flushed
      by VM-Enter when VPID is disabled, and changes in the guest page tables
      do not affect guest-physical mappings.
      
      When EPT and VPID are enabled, doing INVVPID is not required (by Intel's
      architecture) to invalidate guest-physical mappings, i.e. TLB entries
      that cache guest-physical mappings can live across INVVPID as the
      mappings are associated with an EPTP, not a VPID.  The intent of
      @invalidate_gpa is to inform vmx_flush_tlb() that it must "invalidate
      gpa mappings", i.e. do INVEPT and not simply INVVPID.  Other than nested
      VPID handling, which now calls vpid_sync_context() directly, the only
      scenario where KVM can safely do INVVPID instead of INVEPT (when EPT is
      enabled) is if KVM is flushing TLB entries from the guest's perspective,
      i.e. is only required to invalidate linear mappings.
      
      For SVM, flushing TLB entries from the guest's perspective can be done
      by flushing the current ASID, as changes to the guest's page tables are
      associated only with the current ASID.
      
      Adding a dedicated ->tlb_flush_guest() paves the way toward removing
      @invalidate_gpa, which is a potentially dangerous control flag as its
      meaning is not exactly crystal clear, even for those who are familiar
      with the subtleties of what mappings Intel CPUs are/aren't allowed to
      keep across various invalidation scenarios.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-15-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
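      A sketch of the new hook and the per-vendor behavior described above,
      with simplified stand-ins for the INVVPID and ASID primitives (not the
      literal KVM code):

          struct kvm_vcpu;

          /* Dedicated hook: flush only what the *guest* could have made stale,
           * i.e. linear/combined mappings; guest-physical (EPT) mappings may
           * be kept. */
          struct tlb_ops {
              void (*tlb_flush_guest)(struct kvm_vcpu *vcpu);
          };

          static void invvpid_single_context(struct kvm_vcpu *vcpu) { (void)vcpu; }
          static void flush_current_asid(struct kvm_vcpu *vcpu) { (void)vcpu; }

          void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
          {
              /* No INVEPT: changes to the guest's page tables never invalidate
               * guest-physical mappings, which are keyed by EPTP, not VPID. */
              invvpid_single_context(vcpu);
          }

          void svm_flush_tlb_guest(struct kvm_vcpu *vcpu)
          {
              /* The guest's page tables are tied to the current ASID, so
               * flushing that ASID is sufficient. */
              flush_current_asid(vcpu);
          }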
    • KVM: x86: introduce kvm_mmu_invalidate_gva · 5efac074
      Paolo Bonzini authored
      
      Wrap the combination of mmu->invlpg and kvm_x86_ops->tlb_flush_gva
      into a new function.  This function also lets us specify the host PGD to
      invalidate and also the MMU, both of which will be useful in fixing and
      simplifying kvm_inject_emulated_page_fault.
      
      A nested guest's MMU however has g_context->invlpg == NULL.  Instead of
      setting it to nonpaging_invlpg, make kvm_mmu_invalidate_gva the only
      entry point to mmu->invlpg and make a NULL invlpg pointer equivalent
      to nonpaging_invlpg, saving a retpoline.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
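      Rough shape of the new entry point (simplified and with illustrative
      names; the real function also checks which root is being invalidated):

          typedef unsigned long gva_t;
          typedef unsigned long hpa_t;

          struct kvm_vcpu;

          struct kvm_mmu {
              /* NULL is treated like nonpaging_invlpg, i.e. nothing to do for
               * the software walk, which avoids a retpoline for the nested
               * guest MMU whose invlpg pointer is NULL. */
              void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root);
          };

          /* Stand-in for the vendor tlb_flush_gva hook. */
          static void tlb_flush_gva(struct kvm_vcpu *vcpu, gva_t gva)
          {
              (void)vcpu; (void)gva;
          }

          void mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
                                  gva_t gva, hpa_t root)
          {
              if (mmu->invlpg)
                  mmu->invlpg(vcpu, gva, root);
              tlb_flush_gva(vcpu, gva);
          }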
  8. 15 Apr, 2020 1 commit
  9. 31 Mar, 2020 3 commits
  10. 16 Mar, 2020 12 commits