1. 01 Jun, 2017 1 commit
    • Roman Pen's avatar
      KVM: SVM: do not zero out segment attributes if segment is unusable or not present · d9c1b543
      Roman Pen authored
      This is a fix for the problem [1], where VMCB.CPL was set to 0 and interrupt
      was taken on userspace stack.  The root cause lies in the specific AMD CPU
      behaviour which manifests itself as unusable segment attributes on SYSRET.
      The corresponding work around for the kernel is the following:
       ("x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue")
      In other turn virtualization side treated unusable segment incorrectly and
      restored CPL from SS attributes, which were zeroed out few lines above.
      In current patch it is assured only that P bit is cleared in VMCB.save state
      and segment attributes are not zeroed out if segment is not presented or is
      unusable, therefore CPL can be safely restored from DPL field.
      This is only one part of the fix, since QEMU side should be fixed accordingly
      not to zero out attributes on its side.  Corresponding patch will follow.
      [1] Message id: CAJrWOzD6Xq==b-zYCDdFLgSRMPM-NkNuTSDFEtX=7MreT45i7Q@mail.gmail.com
      Signed-off-by: default avatarRoman Pen <roman.penyaev@profitbricks.com>
      Signed-off-by: default avatarMikhail Sennikovskii <mikhail.sennikovskii@profitbricks.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim KrÄmář <rkrcmar@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
  2. 30 May, 2017 1 commit
  3. 18 May, 2017 1 commit
  4. 21 Apr, 2017 1 commit
    • Michael S. Tsirkin's avatar
      kvm: better MWAIT emulation for guests · 668fffa3
      Michael S. Tsirkin authored
      Guests that are heavy on futexes end up IPI'ing each other a lot. That
      can lead to significant slowdowns and latency increase for those guests
      when running within KVM.
      If only a single guest is needed on a host, we have a lot of spare host
      CPU time we can throw at the problem. Modern CPUs implement a feature
      called "MWAIT" which allows guests to wake up sleeping remote CPUs without
      an IPI - thus without an exit - at the expense of never going out of guest
      The decision whether this is something sensible to use should be up to the
      VM admin, so to user space. We can however allow MWAIT execution on systems
      that support it properly hardware wise.
      This patch adds a CAP to user space and a KVM cpuid leaf to indicate
      availability of native MWAIT execution. With that enabled, the worst a
      guest can do is waste as many cycles as a "jmp ." would do, so it's not
      a privilege problem.
      We consciously do *not* expose the feature in our CPUID bitmap, as most
      people will want to benefit from sleeping vCPUs to allow for over commit.
      Reported-by: default avatar"Gabriel L. Somlo" <gsomlo@gmail.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      [agraf: fix amd, change commit message]
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
  5. 07 Apr, 2017 1 commit
    • Borislav Petkov's avatar
      kvm/svm: Setup MCG_CAP on AMD properly · 74f16909
      Borislav Petkov authored
      MCG_CAP[63:9] bits are reserved on AMD. However, on an AMD guest, this
      MSR returns 0x100010a. More specifically, bit 24 is set, which is simply
      wrong. That bit is MCG_SER_P and is present only on Intel. Thus, clean
      up the reserved bits in order not to confuse guests.
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
  6. 20 Mar, 2017 1 commit
    • Dmitry Vyukov's avatar
      kvm: fix usage of uninit spinlock in avic_vm_destroy() · 3863dff0
      Dmitry Vyukov authored
      If avic is not enabled, avic_vm_init() does nothing and returns early.
      However, avic_vm_destroy() still tries to destroy what hasn't been created.
      The only bad consequence of this now is that avic_vm_destroy() uses
      svm_vm_data_hash_lock that hasn't been initialized (and is not meant
      to be used at all if avic is not enabled).
      Return early from avic_vm_destroy() if avic is not enabled.
      It has nothing to destroy.
      Signed-off-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: syzkaller@googlegroups.com
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
  7. 16 Mar, 2017 1 commit
    • Thomas Garnier's avatar
      x86: Make the GDT remapping read-only on 64-bit · 45fc8757
      Thomas Garnier authored
      This patch makes the GDT remapped pages read-only, to prevent accidental
      (or intentional) corruption of this key data structure.
      This change is done only on 64-bit, because 32-bit needs it to be writable
      for TSS switches.
      The native_load_tr_desc function was adapted to correctly handle a
      read-only GDT. The LTR instruction always writes to the GDT TSS entry.
      This generates a page fault if the GDT is read-only. This change checks
      if the current GDT is a remap and swap GDTs as needed. This function was
      tested by booting multiple machines and checking hibernation works
      KVM SVM and VMX were adapted to use the writeable GDT. On VMX, the
      per-cpu variable was removed for functions to fetch the original GDT.
      Instead of reloading the previous GDT, VMX will reload the fixmap GDT as
      expected. For testing, VMs were started and restored on multiple
      Signed-off-by: default avatarThomas Garnier <thgarnie@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Luis R . Rodriguez <mcgrof@kernel.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kernel-hardening@lists.openwall.com
      Cc: kvm@vger.kernel.org
      Cc: lguest@lists.ozlabs.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-pm@vger.kernel.org
      Cc: xen-devel@lists.xenproject.org
      Cc: zijun_hu <zijun_hu@htc.com>
      Link: http://lkml.kernel.org/r/20170314170508.100882-3-thgarnie@google.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
  8. 17 Feb, 2017 1 commit
  9. 15 Feb, 2017 2 commits
  10. 09 Jan, 2017 1 commit
  11. 08 Dec, 2016 2 commits
    • Kyle Huey's avatar
      KVM: x86: Add kvm_skip_emulated_instruction and use it. · 6affcbed
      Kyle Huey authored
      kvm_skip_emulated_instruction calls both
      kvm_x86_ops->skip_emulated_instruction and kvm_vcpu_check_singlestep,
      skipping the emulated instruction and generating a trap if necessary.
      Replacing skip_emulated_instruction calls with
      kvm_skip_emulated_instruction is straightforward, except for:
      - ICEBP, which is already inside a trap, so avoid triggering another trap.
      - Instructions that can trigger exits to userspace, such as the IO insns,
        MOVs to CR8, and HALT. If kvm_skip_emulated_instruction does trigger a
        KVM_GUESTDBG_SINGLESTEP exit, and the handling code for
        IN/OUT/MOV CR8/HALT also triggers an exit to userspace, the latter will
        take precedence. The singlestep will be triggered again on the next
        instruction, which is the current behavior.
      - Task switch instructions which would require additional handling (e.g.
        the task switch bit) and are instead left alone.
      - Cases where VMLAUNCH/VMRESUME do not proceed to the next instruction,
        which do not trigger singlestep traps as mentioned previously.
      Signed-off-by: default avatarKyle Huey <khuey@kylehuey.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
    • Kyle Huey's avatar
      KVM: x86: Add a return value to kvm_emulate_cpuid · 6a908b62
      Kyle Huey authored
      Once skipping the emulated instruction can potentially trigger an exit to
      userspace (via KVM_GUESTDBG_SINGLESTEP) kvm_emulate_cpuid will need to
      propagate a return value.
      Signed-off-by: default avatarKyle Huey <khuey@kylehuey.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
  12. 24 Nov, 2016 2 commits
    • Tom Lendacky's avatar
      kvm: svm: Add kvm_fast_pio_in support · 8370c3d0
      Tom Lendacky authored
      Update the I/O interception support to add the kvm_fast_pio_in function
      to speed up the in instruction similar to the out instruction.
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
    • Tom Lendacky's avatar
      kvm: svm: Add support for additional SVM NPF error codes · 14727754
      Tom Lendacky authored
      AMD hardware adds two additional bits to aid in nested page fault handling.
      Bit 32 - NPF occurred while translating the guest's final physical address
      Bit 33 - NPF occurred while translating the guest page tables
      The guest page tables fault indicator can be used as an aid for nested
      virtualization. Using V0 for the host, V1 for the first level guest and
      V2 for the second level guest, when both V1 and V2 are using nested paging
      there are currently a number of unnecessary instruction emulations. When
      V2 is launched shadow paging is used in V1 for the nested tables of V2. As
      a result, KVM marks these pages as RO in the host nested page tables. When
      V2 exits and we resume V1, these pages are still marked RO.
      Every nested walk for a guest page table is treated as a user-level write
      access and this causes a lot of NPFs because the V1 page tables are marked
      RO in the V0 nested tables. While executing V1, when these NPFs occur KVM
      sees a write to a read-only page, emulates the V1 instruction and unprotects
      the page (marking it RW). This patch looks for cases where we get a NPF due
      to a guest page table walk where the page was marked RO. It immediately
      unprotects the page and resumes the guest, leading to far fewer instruction
      emulations when nested virtualization is used.
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
  13. 02 Nov, 2016 1 commit
    • Paolo Bonzini's avatar
      KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCK · ea26e4ec
      Paolo Bonzini authored
      Since commit a545ab6a ("kvm: x86: add tsc_offset field to struct
      kvm_vcpu_arch", 2016-09-07) the offset between host and L1 TSC is
      cached and need not be fished out of the VMCS or VMCB.  This means
      that we can implement adjust_tsc_offset_guest and read_l1_tsc
      entirely in generic code.  The simplification is particularly
      significant for VMX code, where vmx->nested.vmcs01_tsc_offset
      was duplicating what is now in vcpu->arch.tsc_offset.  Therefore
      the vmcs01_tsc_offset can be dropped completely.
      More importantly, this fixes KVM_GET_CLOCK/KVM_SET_CLOCK
      which, after commit 108b249c ("KVM: x86: introduce get_kvmclock_ns",
      2016-09-01) called read_l1_tsc while the VMCS was not loaded.
      It thus returned bogus values on Intel CPUs.
      Fixes: 108b249c
      Reported-by: default avatarRoman Kagan <rkagan@virtuozzo.com>
      Reviewed-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
  14. 20 Sep, 2016 1 commit
  15. 16 Sep, 2016 1 commit
  16. 08 Sep, 2016 3 commits
  17. 10 Aug, 2016 1 commit
    • Kees Cook's avatar
      x86: Apply more __ro_after_init and const · 404f6aac
      Kees Cook authored
      Guided by grsecurity's analogous __read_only markings in arch/x86,
      this applies several uses of __ro_after_init to structures that are
      only updated during __init, and const for some structures that are
      never updated.  Additionally extends __init markings to some functions
      that are only used during __init, and cleans up some missing C99 style
      static initializers.
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brad Spengler <spender@grsecurity.net>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Brown <david.brown@linaro.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Emese Revfy <re.emese@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20160808232906.GA29731@www.outflux.net
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
  18. 16 Jul, 2016 1 commit
    • Radim Krčmář's avatar
      Revert "KVM: SVM: fix trashing of MSR_TSC_AUX" · 6a907cd0
      Radim Krčmář authored
      This reverts commit 9770404a.
      The reverted patch is not needed as only userspace uses RDTSCP and
      MSR_TSC_AUX is in host_save_user_msrs[] and therefore properly saved in
      svm_vcpu_load() and restored in svm_vcpu_put() before every switch to
      The reverted patch did not allow the kernel to use RDTSCP in the future,
      because of missed trashing in svm_set_msr() and 64-bit ifdef.
      This reverts commit 2b23c3a6.
      2b23c3a6 ("KVM: SVM: do not set MSR_TSC_AUX on 32-bit builds") is a
      build fix for 9770404a and reverting them separately would only
      break more bisections.
      Cc: stable@vger.kernel.org
  19. 14 Jul, 2016 2 commits
  20. 01 Jul, 2016 1 commit
    • Paolo Bonzini's avatar
      KVM: x86: use guest_exit_irqoff · f2485b3e
      Paolo Bonzini authored
      This gains a few clock cycles per vmexit.  On Intel there is no need
      anymore to enable the interrupts in vmx_handle_external_intr, since
      we are using the "acknowledge interrupt on exit" feature.  AMD
      needs to do that, and must be careful to avoid the interrupt shadow.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
  21. 15 Jun, 2016 2 commits
  22. 14 Jun, 2016 1 commit
  23. 24 May, 2016 1 commit
  24. 18 May, 2016 8 commits
  25. 29 Apr, 2016 1 commit
  26. 22 Mar, 2016 1 commit