1. 16 Mar, 2020 10 commits
  2. 23 Feb, 2020 3 commits
  3. 21 Feb, 2020 1 commit
    • KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when apicv is globally disabled · a4443267
      Vitaly Kuznetsov authored
      
      When apicv is disabled on a vCPU (e.g. by enabling KVM_CAP_HYPERV_SYNIC*),
      nothing happens to VMX MSRs on the already existing vCPUs, however, all new
      ones are created with PIN_BASED_POSTED_INTR filtered out. This is very
      confusing and results in the following picture inside the guest:
      
      $ rdmsr -ax 0x48d
      ff00000016
      7f00000016
      7f00000016
      7f00000016
      
      This is observed with QEMU and 4-vCPU guest: QEMU creates vCPU0, does
      KVM_CAP_HYPERV_SYNIC2 and then creates the remaining three.
      
      L1 hypervisor may only check vCPU0's controls to find out what features
      are available and it will be very confused later. Switch to setting
      PIN_BASED_POSTED_INTR control based on global 'enable_apicv' setting.
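
The fix can be sketched as a tiny filter over the allowed-1 half of the pin-based controls MSR (the helper name and `enable_apicv` variable below are illustrative stand-ins for KVM's internals; PIN_BASED_POSTED_INTR really is bit 7, which is exactly the ff-vs-7f difference in the high dwords above):

```c
/* Sketch: derive the nested pin-based controls from the global
 * enable_apicv setting rather than per-vCPU apicv state, so every
 * vCPU reports identical MSR 0x48D contents. */
#include <stdint.h>

#define PIN_BASED_POSTED_INTR (1u << 7)  /* bit 7 of pin-based controls */

static int enable_apicv = 1;             /* global, not per-vCPU */

static uint32_t nested_pinbased_ctls(uint32_t allowed1)
{
    uint32_t ctls = allowed1;

    /* Filter posted interrupts only when APICv is globally disabled. */
    if (!enable_apicv)
        ctls &= ~PIN_BASED_POSTED_INTR;
    return ctls;
}
```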
      
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 17 Feb, 2020 1 commit
  5. 12 Feb, 2020 2 commits
  6. 05 Feb, 2020 3 commits
  7. 27 Jan, 2020 3 commits
    • KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests · b91991bf
      Krish Sadhukhan authored
      
      
      According to section "Checks on Guest Control Registers, Debug Registers, and
      and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
      of nested guests:
      
          If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7
          field must be 0.
      
      In KVM, GUEST_DR7 is set prior to the vmcs02 VM-entry by kvm_set_dr() and the
      latter synthesizes a #GP if any bit in the high dword in the former is set.
      Hence this field needs to be checked in software.
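
The software check can be sketched as follows (the helper name is hypothetical; the condition matches the SDM rule quoted above):

```c
/* Sketch of the added consistency check: with the "load debug
 * controls" VM-entry control set, bits 63:32 of the vmcs12 GUEST_DR7
 * field must be zero, since kvm_set_dr() would otherwise synthesize
 * a #GP after the vmcs02 VM-entry. */
#include <stdint.h>
#include <stdbool.h>

static bool nested_guest_dr7_valid(uint64_t guest_dr7, bool load_dbg_ctls)
{
    if (load_dbg_ctls && (guest_dr7 >> 32))
        return false;   /* fail the vmentry check in software */
    return true;
}
```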
      
      Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Perform non-canonical checks in 32-bit KVM · de761ea7
      Sean Christopherson authored
      
      
      Remove the CONFIG_X86_64 condition from the low level non-canonical
      helpers to effectively enable non-canonical checks on 32-bit KVM.
      Non-canonical checks are performed by hardware if the CPU *supports*
      64-bit mode, whether or not the CPU is actually in 64-bit mode is
      irrelevant.
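
A standalone sketch of such a helper, no longer gated on whether KVM itself was built 64-bit (the signature is illustrative, not KVM's exact one): an address is canonical iff bits 63:(va_bits-1) are a sign extension of bit (va_bits-1).

```c
/* Non-canonical address check, independent of the host's bitness. */
#include <stdint.h>
#include <stdbool.h>

static bool is_noncanonical_address(uint64_t va, unsigned int va_bits)
{
    unsigned int shift = 64 - va_bits;

    /* Sign-extend from va_bits and compare with the original value;
     * any difference means the upper bits were not a sign extension. */
    return (uint64_t)((int64_t)(va << shift) >> shift) != va;
}
```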
      
      For the most part, skipping non-canonical checks on 32-bit KVM is ok-ish
      because 32-bit KVM always (hopefully) drops bits 63:32 of whatever value
      it's checking before propagating it to hardware, and architecturally,
      the expected behavior for the guest is a bit of a grey area since the
      vCPU itself doesn't support 64-bit mode.  I.e. a 32-bit KVM guest can
      observe the missed checks in several paths, e.g. INVVPID and VM-Enter,
      but it's debatable whether or not the missed checks constitute a bug
      because technically the vCPU doesn't support 64-bit mode.
      
      The primary motivation for enabling the non-canonical checks is defense
      in depth.  As mentioned above, a guest can trigger a missed check via
      INVVPID or VM-Enter.  INVVPID is straightforward as it takes a 64-bit
      virtual address as part of its 128-bit INVVPID descriptor and fails if
      the address is non-canonical, even if INVVPID is executed in 32-bit PM.
      Nested VM-Enter is a bit more convoluted as it requires the guest to
      write natural width VMCS fields via memory accesses and then VMPTRLD the
      VMCS, but it's still possible.  In both cases, KVM is saved from a true
      bug only because its flows that propagate values to hardware (correctly)
      take "unsigned long" parameters and so drop bits 63:32 of the bad value.
      
      Explicitly performing the non-canonical checks makes it less likely that
      a bad value will be propagated to hardware, e.g. in the INVVPID case,
      if __invvpid() didn't implicitly drop bits 63:32 then KVM would BUG() on
      the resulting unexpected INVVPID failure due to hardware rejecting the
      non-canonical address.
      
      The only downside to enabling the non-canonical checks is that it adds a
      relatively small amount of overhead, but the affected flows are not hot
      paths, i.e. the overhead is negligible.
      
      Note, KVM technically could gate the non-canonical checks on 32-bit KVM
      with static_cpu_has(X86_FEATURE_LM), but on bare metal that's an even
      bigger waste of code for everyone except the 0.00000000000001% of the
      population running on Yonah, and nested 32-bit on 64-bit already fudges
      things with respect to 64-bit CPU behavior.
      
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      [Also do so in nested_vmx_check_host_state as reported by Krish. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: WARN on failure to set IA32_PERF_GLOBAL_CTRL · d1968421
      Oliver Upton authored
      
      
      Writes to MSR_CORE_PERF_GLOBAL_CONTROL should never fail if the VM-exit
      and VM-entry controls are exposed to L1. Promote the checks to perform a
      full WARN if kvm_set_msr() fails and remove the now unused macro
      SET_MSR_OR_WARN().
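
The shape of the change can be sketched with stubs (all names and the failure condition below are invented for illustration; the real code uses kvm_set_msr() and the kernel's WARN_ON()):

```c
/* Sketch of promoting the error path to a warning: a write to
 * MSR_CORE_PERF_GLOBAL_CTRL should never fail here, so a failure is
 * surfaced loudly instead of being swallowed by a helper macro. */
#include <stdint.h>

#define MSR_CORE_PERF_GLOBAL_CTRL 0x38f

static int warn_hits;                        /* stands in for WARN_ON() */
#define WARN_ON_STUB(cond) do { if (cond) warn_hits++; } while (0)

/* Stub MSR write; the rejection rule is hypothetical (the real
 * kvm_set_msr() consults the virtual PMU's valid-bits mask). */
static int kvm_set_msr_stub(uint32_t msr, uint64_t data)
{
    (void)msr;
    return (data >> 36) ? -1 : 0;
}

static void load_perf_global_ctrl(uint64_t val)
{
    WARN_ON_STUB(kvm_set_msr_stub(MSR_CORE_PERF_GLOBAL_CTRL, val));
}
```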
      
      Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Oliver Upton <oupton@google.com>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  8. 21 Jan, 2020 3 commits
  9. 13 Jan, 2020 1 commit
    • x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR · 32ad73db
      Sean Christopherson authored
      
      
      As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL
      are quite a mouthful, especially the VMX bits which must differentiate
      between enabling VMX inside and outside SMX (TXT) operation.  Rename the
      MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to
      make them a little friendlier on the eyes.
      
      Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name
      to match Intel's SDM, but a future patch will add a dedicated Kconfig,
      file and functions for the MSR. Using the full name for those assets is
      rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its
      nomenclature is consistent throughout the kernel.
      
      Opportunistically, fix a few other annoyances with the defines:
      
        - Relocate the bit defines so that they immediately follow the MSR
          define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
        - Add whitespace around the block of feature control defines to make
          it clear they're all related.
        - Use BIT() instead of manually encoding the bit shift.
        - Use "VMX" instead of "VMXON" to match the SDM.
        - Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to
          be consistent with the kernel's verbiage used for all other feature
          control bits.  Note, the SDM refers to the LMCE bit as LMCE_ON,
          likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN.  Ignore
          the (literal) one-off usage of _ON, the SDM is simply "wrong".
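
Put together, the reorganized defines look roughly like this (bit positions and the MSR address are from the SDM: lock bit 0, VMX-in-SMX bit 1, VMX-outside-SMX bit 2, LMCE bit 20; BIT() is the kernel helper, reproduced here so the snippet is self-contained):

```c
/* Sketch of the renamed feature-control defines, with the bit
 * defines immediately following the MSR define. */
#include <stdint.h>

#define BIT(n) (1u << (n))

#define MSR_IA32_FEAT_CTL                 0x0000003a

#define FEAT_CTL_LOCKED                   BIT(0)
#define FEAT_CTL_VMX_ENABLED_INSIDE_SMX   BIT(1)
#define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX  BIT(2)
#define FEAT_CTL_LMCE_ENABLED             BIT(20)
```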
      
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com
  10. 08 Jan, 2020 6 commits
  11. 21 Nov, 2019 2 commits
    • KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-page · 0155b2b9
      Liran Alon authored
      According to Intel SDM section 28.3.3.3/28.3.3.4 Guidelines for Use
      of the INVVPID/INVEPT Instruction, the hypervisor needs to execute
      INVVPID/INVEPT X in case CPU executes VMEntry with VPID/EPTP X and
      either: "Virtualize APIC accesses" VM-execution control was changed
      from 0 to 1, OR the value of apic_access_page was changed.
      
      In the nested case, the burden falls on L1, unless L0 enables EPT in
      vmcs02 but L1 enables neither EPT nor VPID in vmcs12.  For this reason
      prepare_vmcs02() and load_vmcs12_host_state() have special code to
      request a TLB flush in case L1 does not use EPT but it uses
      "virtualize APIC accesses".
      
      This special case however is not necessary. On a nested vmentry the
      physical TLB will already be flushed except if all the following apply:
      
      * L0 uses VPID
      
      * L1 uses VPID
      
      * L0 can guarantee TLB entries populated while running L1 are tagged
      differently than TLB entries populated while running L2.
      
      If the first condition is false, the processor will flush the TLB
      on vmentry to L2.  If the second or third condition are false,
      prepare_vmcs02() will request KVM_REQ_TLB_FLUSH.  However, even
      if both are true, no extra TLB flush is needed to handle the APIC
      access page:
      
      * if L1 doesn't use VPID, the second condition doesn't hold and the
      TLB will be flushed anyway.
      
      * if L1 uses VPID, it has to flush the TLB itself with INVVPID and
      section 28.3.3.3 doesn't apply to L0.
      
      * even INVEPT is not needed because, if L0 uses EPT, it uses a different
      EPTP when running L2 than L1 (because guest_mode is part of the mmu role).
      In this case SDM section 28.3.3.4 doesn't apply.
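
The three conditions listed above can be expressed as a single predicate (the helper name is hypothetical): the physical TLB survives a nested vmentry, and an explicit flush could even be in question, only when all three hold at once.

```c
/* Sketch: the TLB is NOT flushed across a nested vmentry only if
 * L0 uses VPID, L1 uses VPID, and L0 can guarantee that L1 and L2
 * TLB entries are tagged differently. */
#include <stdbool.h>

static bool tlb_survives_nested_vmentry(bool l0_uses_vpid,
                                        bool l1_uses_vpid,
                                        bool l1_l2_tags_differ)
{
    return l0_uses_vpid && l1_uses_vpid && l1_l2_tags_differ;
}
```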
      
      Similarly, examining nested_vmx_vmexit()->load_vmcs12_host_state(),
      one could note that L0 won't flush TLB only in cases where SDM sections
      28.3.3.3 and 28.3.3.4 don't apply.  In particular, if L0 uses different
      VPIDs for L1 and L2 (i.e. vmx->vpid != vmx->nested.vpid02), section
      28.3.3.3 doesn't apply.
      
      Thus, remove this flush from prepare_vmcs02() and nested_vmx_vmexit().
      
      Side-note: This patch can be viewed as removing parts of commit
      fb6c8198 ("kvm: vmx: Flush TLB when the APIC-access address changes")
      that are not relevant anymore since commit
      1313cc2b ("kvm: mmu: Add guest_mode to kvm_mmu_page_role").
      The first commit assumed that if L0 uses EPT and L1 doesn't use EPT,
      then L0 will use the same EPTP for both L0 and L1, which indeed required
      L0 to execute INVEPT before entering the L2 guest. This assumption no
      longer holds since guest_mode was added to the mmu role.
      
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinning · b11494bc
      Liran Alon authored
      
      
      vmcs->apic_access_page is simply a token that the hypervisor puts into
      the PFN of a 4KB EPTE (or PTE if using shadow-paging) that triggers
      APIC-access VMExit or APIC virtualization logic whenever a CPU running
      in VMX non-root mode reads from or writes to this PFN.
      
      As every write either triggers an APIC-access VMExit or is performed
      on vmcs->virtual_apic_page, the PFN pointed to by
      vmcs->apic_access_page should never actually be touched by the CPU.
      
      Therefore, there is no need to mark vmcs02->apic_access_page as dirty
      after unpinning it on L2->L1 emulated VMExit or when L1 exits VMX
      operation.
      
      Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  12. 20 Nov, 2019 2 commits
  13. 15 Nov, 2019 3 commits