1. 23 Sep, 2017 1 commit
    • Josh Poimboeuf's avatar
      x86/asm: Fix inline asm call constraints for Clang · f5caf621
      Josh Poimboeuf authored
      For inline asm statements which have a CALL instruction, we list the
      stack pointer as a constraint to convince GCC to ensure the frame
      pointer is set up first:
      
        static inline void foo()
        {
      	register void *__sp asm(_ASM_SP);
      	asm("call bar" : "+r" (__sp))
        }
      
      Unfortunately, that pattern causes Clang to corrupt the stack pointer.
      
      The fix is easy: convert the stack pointer register variable to a global
      variable.
      
      It should be noted that the end result is different based on the GCC
      version.  With GCC 6.4, this patch has exactly the same result as
      before:
      
      	defconfig	defconfig-nofp	distro		distro-nofp
       before	9820389		9491555		8816046		8516940
       after	9820389		9491555		8816046		8516940
      
      With GCC 7.2, however, GCC's behavior has changed.  It now changes its
      behavior based on the conversion of the register variable to a global.
      That somehow convinces it to *always* set up the frame pointer before
      inserting *any* inline asm.  (Therefore, listing the variable as an
      output constraint is a no-op and is no longer necessary.)  It's a bit
      overkill, but the performance impact should be negligible.  And in fact,
      there's a nice improvement with frame pointers disabled:
      
      	defconfig	defconfig-nofp	distro		distro-nofp
       before	9796316		9468236		9076191		8790305
       after	9796957		9464267		9076381		8785949
      
      So in summary, while listing the stack pointer as an output constraint
      is no longer necessary for newer versions of GCC, it's still needed for
      older versions.
      Suggested-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Reported-by: default avatarMatthias Kaehlcke <mka@chromium.org>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f5caf621
  2. 15 Sep, 2017 5 commits
  3. 13 Sep, 2017 2 commits
  4. 29 Aug, 2017 1 commit
    • Thomas Gleixner's avatar
      x86/idt: Unify gate_struct handling for 32/64-bit kernels · 64b163fa
      Thomas Gleixner authored
      The first 32 bits of gate struct are the same for 32 and 64 bit kernels.
      
      The 32-bit version uses desc_struct and no designated data structure,
      so we need different accessors for 32 and 64 bit kernels.
      
      Aside of that the macros which are necessary to build the 32-bit
      gate descriptor are horrible to read.
      
      Unify the gate structs and switch all code fiddling with it over.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20170828064957.861974317@linutronix.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      64b163fa
  5. 25 Aug, 2017 2 commits
  6. 24 Aug, 2017 9 commits
    • Wanpeng Li's avatar
      KVM: nVMX: Fix trying to cancel vmlauch/vmresume · bfcf83b1
      Wanpeng Li authored
      ------------[ cut here ]------------
      WARNING: CPU: 7 PID: 3861 at /home/kernel/ssd/kvm/arch/x86/kvm//vmx.c:11299 nested_vmx_vmexit+0x176e/0x1980 [kvm_intel]
      CPU: 7 PID: 3861 Comm: qemu-system-x86 Tainted: G        W  OE   4.13.0-rc4+ #11
      RIP: 0010:nested_vmx_vmexit+0x176e/0x1980 [kvm_intel]
      Call Trace:
       ? kvm_multiple_exception+0x149/0x170 [kvm]
       ? handle_emulation_failure+0x79/0x230 [kvm]
       ? load_vmcs12_host_state+0xa80/0xa80 [kvm_intel]
       ? check_chain_key+0x137/0x1e0
       ? reexecute_instruction.part.168+0x130/0x130 [kvm]
       nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel]
       ? nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel]
       vmx_queue_exception+0x197/0x300 [kvm_intel]
       kvm_arch_vcpu_ioctl_run+0x1b0c/0x2c90 [kvm]
       ? kvm_arch_vcpu_runnable+0x220/0x220 [kvm]
       ? preempt_count_sub+0x18/0xc0
       ? restart_apic_timer+0x17d/0x300 [kvm]
       ? kvm_lapic_restart_hv_timer+0x37/0x50 [kvm]
       ? kvm_arch_vcpu_load+0x1d8/0x350 [kvm]
       kvm_vcpu_ioctl+0x4e4/0x910 [kvm]
       ? kvm_vcpu_ioctl+0x4e4/0x910 [kvm]
       ? kvm_dev_ioctl+0xbe0/0xbe0 [kvm]
      
      The flag "nested_run_pending", which can override the decision of which should run
      next, L1 or L2. nested_run_pending=1 means that we *must* run L2 next, not L1. This
      is necessary in particular when L1 did a VMLAUNCH of L2 and therefore expects L2 to
      be run (and perhaps be injected with an event it specified, etc.). Nested_run_pending
      is especially intended to avoid switching  to L1 in the injection decision-point.
      
      This can be handled just like the other cases in vmx_check_nested_events, instead of
      having a special case in vmx_queue_exception.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      bfcf83b1
    • Wanpeng Li's avatar
      KVM: X86: Fix loss of exception which has not yet been injected · 664f8e26
      Wanpeng Li authored
      vmx_complete_interrupts() assumes that the exception is always injected,
      so it can be dropped by kvm_clear_exception_queue().  However,
      an exception cannot be injected immediately if it is: 1) originally
      destined to a nested guest; 2) trapped to cause a vmexit; 3) happening
      right after VMLAUNCH/VMRESUME, i.e. when nested_run_pending is true.
      
      This patch applies to exceptions the same algorithm that is used for
      NMIs, replacing exception.reinject with "exception.injected" (equivalent
      to nmi_injected).
      
      exception.pending now represents an exception that is queued and whose
      side effects (e.g., update RFLAGS.RF or DR7) have not been applied yet.
      If exception.pending is true, the exception might result in a nested
      vmexit instead, too (in which case the side effects must not be applied).
      
      exception.injected instead represents an exception that is going to be
      injected into the guest at the next vmentry.
      Reported-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      664f8e26
    • Wanpeng Li's avatar
      KVM: VMX: use kvm_event_needs_reinjection · 274bba52
      Wanpeng Li authored
      Use kvm_event_needs_reinjection() encapsulation.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      274bba52
    • Yu Zhang's avatar
      KVM: MMU: Expose the LA57 feature to VM. · fd8cb433
      Yu Zhang authored
      This patch exposes 5 level page table feature to the VM.
      At the same time, the canonical virtual address checking is
      extended to support both 48-bits and 57-bits address width.
      Signed-off-by: default avatarYu Zhang <yu.c.zhang@linux.intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fd8cb433
    • Yu Zhang's avatar
      KVM: MMU: Add 5 level EPT & Shadow page table support. · 855feb67
      Yu Zhang authored
      Extends the shadow paging code, so that 5 level shadow page
      table can be constructed if VM is running in 5 level paging
      mode.
      
      Also extends the ept code, so that 5 level ept table can be
      constructed if maxphysaddr of VM exceeds 48 bits. Unlike the
      shadow logic, KVM should still use 4 level ept table for a VM
      whose physical address width is less than 48 bits, even when
      the VM is running in 5 level paging mode.
      Signed-off-by: default avatarYu Zhang <yu.c.zhang@linux.intel.com>
      [Unconditionally reset the MMU context in kvm_cpuid_update.
       Changing MAXPHYADDR invalidates the reserved bit bitmasks.
       - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      855feb67
    • Paolo Bonzini's avatar
      kvm: vmx: Raise #UD on unsupported XSAVES/XRSTORS · 3db13480
      Paolo Bonzini authored
      A guest may not be configured to support XSAVES/XRSTORS, even when the host
      does. If the guest does not support XSAVES/XRSTORS, clear the secondary
      execution control so that the processor will raise #UD.
      
      Also clear the "allowed-1" bit for XSAVES/XRSTORS exiting in the
      IA32_VMX_PROCBASED_CTLS2 MSR, and pass through VMCS12's control in
      the VMCS02.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      3db13480
    • Jim Mattson's avatar
      kvm: vmx: Raise #UD on unsupported RDSEED · 75f4fc8d
      Jim Mattson authored
      A guest may not be configured to support RDSEED, even when the host
      does. If the guest does not support RDSEED, intercept the instruction
      and synthesize #UD. Also clear the "allowed-1" bit for RDSEED exiting
      in the IA32_VMX_PROCBASED_CTLS2 MSR.
      Signed-off-by: default avatarJim Mattson <jmattson@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      75f4fc8d
    • Jim Mattson's avatar
      kvm: vmx: Raise #UD on unsupported RDRAND · 45ec368c
      Jim Mattson authored
      A guest may not be configured to support RDRAND, even when the host
      does. If the guest does not support RDRAND, intercept the instruction
      and synthesize #UD. Also clear the "allowed-1" bit for RDRAND exiting
      in the IA32_VMX_PROCBASED_CTLS2 MSR.
      Signed-off-by: default avatarJim Mattson <jmattson@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      45ec368c
    • Paolo Bonzini's avatar
      KVM: VMX: cache secondary exec controls · 80154d77
      Paolo Bonzini authored
      Currently, secondary execution controls are divided in three groups:
      
      - static, depending mostly on the module arguments or the processor
        (vmx_secondary_exec_control)
      
      - static, depending on CPUID (vmx_cpuid_update)
      
      - dynamic, depending on nested VMX or local APIC state
      
      Because walking CPUID is expensive, prepare_vmcs02 is using only
      the first group.  This however is unnecessarily complicated.  Just
      cache the static secondary execution controls, and then prepare_vmcs02
      does not need to compute them every time.  Computation of all static
      secondary execution controls is now kept in a single function,
      vmx_compute_secondary_exec_control.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      80154d77
  7. 18 Aug, 2017 5 commits
  8. 11 Aug, 2017 1 commit
  9. 10 Aug, 2017 2 commits
    • Paolo Bonzini's avatar
      kvm: nVMX: Add support for fast unprotection of nested guest page tables · eebed243
      Paolo Bonzini authored
      This is the same as commit 14727754 ("kvm: svm: Add support for
      additional SVM NPF error codes", 2016-11-23), but for Intel processors.
      In this case, the exit qualification field's bit 8 says whether the
      EPT violation occurred while translating the guest's final physical
      address or rather while translating the guest page tables.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      eebed243
    • Wanpeng Li's avatar
      KVM: X86: Fix residual mmio emulation request to userspace · bbeac283
      Wanpeng Li authored
      Reported by syzkaller:
      
      The kvm-intel.unrestricted_guest=0
      
         WARNING: CPU: 5 PID: 1014 at /home/kernel/data/kvm/arch/x86/kvm//x86.c:7227 kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm]
         CPU: 5 PID: 1014 Comm: warn_test Tainted: G        W  OE   4.13.0-rc3+ #8
         RIP: 0010:kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm]
         Call Trace:
          ? put_pid+0x3a/0x50
          ? rcu_read_lock_sched_held+0x79/0x80
          ? kmem_cache_free+0x2f2/0x350
          kvm_vcpu_ioctl+0x340/0x700 [kvm]
          ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
          ? __fget+0xfc/0x210
          do_vfs_ioctl+0xa4/0x6a0
          ? __fget+0x11d/0x210
          SyS_ioctl+0x79/0x90
          entry_SYSCALL_64_fastpath+0x23/0xc2
          ? __this_cpu_preempt_check+0x13/0x20
      
      The syszkaller folks reported a residual mmio emulation request to userspace
      due to vm86 fails to emulate inject real mode interrupt(fails to read CS) and
      incurs a triple fault. The vCPU returns to userspace with vcpu->mmio_needed == true
      and KVM_EXIT_SHUTDOWN exit reason. However, the syszkaller testcase constructs
      several threads to launch the same vCPU, the thread which lauch this vCPU after
      the thread whichs get the vcpu->mmio_needed == true and KVM_EXIT_SHUTDOWN will
      trigger the warning.
      
         #define _GNU_SOURCE
         #include <pthread.h>
         #include <stdio.h>
         #include <stdlib.h>
         #include <string.h>
         #include <sys/wait.h>
         #include <sys/types.h>
         #include <sys/stat.h>
         #include <sys/mman.h>
         #include <fcntl.h>
         #include <unistd.h>
         #include <linux/kvm.h>
         #include <stdio.h>
      
         int kvmcpu;
         struct kvm_run *run;
      
         void* thr(void* arg)
         {
           int res;
           res = ioctl(kvmcpu, KVM_RUN, 0);
           printf("ret1=%d exit_reason=%d suberror=%d\n",
               res, run->exit_reason, run->internal.suberror);
           return 0;
         }
      
         void test()
         {
           int i, kvm, kvmvm;
           pthread_t th[4];
      
           kvm = open("/dev/kvm", O_RDWR);
           kvmvm = ioctl(kvm, KVM_CREATE_VM, 0);
           kvmcpu = ioctl(kvmvm, KVM_CREATE_VCPU, 0);
           run = (struct kvm_run*)mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, kvmcpu, 0);
           srand(getpid());
           for (i = 0; i < 4; i++) {
             pthread_create(&th[i], 0, thr, 0);
             usleep(rand() % 10000);
           }
           for (i = 0; i < 4; i++)
             pthread_join(th[i], 0);
         }
      
         int main()
         {
           for (;;) {
             int pid = fork();
             if (pid < 0)
               exit(1);
             if (pid == 0) {
               test();
               exit(0);
             }
             int status;
             while (waitpid(pid, &status, __WALL) != pid) {}
           }
           return 0;
         }
      
      This patch fixes it by resetting the vcpu->mmio_needed once we receive
      the triple fault to avoid the residue.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Tested-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      bbeac283
  10. 08 Aug, 2017 2 commits
  11. 07 Aug, 2017 8 commits
  12. 03 Aug, 2017 1 commit
    • Wanpeng Li's avatar
      KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit" · 6550c4df
      Wanpeng Li authored
      ------------[ cut here ]------------
       WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
       CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7
       RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
      Call Trace:
        vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
        ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm]
        ? vmx_vcpu_load+0x1be/0x220 [kvm_intel]
        ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
        kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? __fget+0xfc/0x210
        do_vfs_ioctl+0xa4/0x6a0
        ? __fget+0x11d/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x8f/0x750
        ? trace_hardirqs_on_thunk+0x1a/0x1c
        entry_SYSCALL64_slow_path+0x25/0x25
      
      This can be reproduced by booting L1 guest w/ 'noapic' grub parameter, which
      means that tells the kernel to not make use of any IOAPICs that may be present
      in the system.
      
      Actually external_intr variable in nested_vmx_vmexit() is the req_int_win
      variable passed from vcpu_enter_guest() which means that the L0's userspace
      requests an irq window. I observed the scenario (!kvm_cpu_has_interrupt(vcpu) &&
      L0's userspace reqeusts an irq window) is true, so there is no interrupt which
      L1 requires to inject to L2, we should not attempt to emualte "Acknowledge
      interrupt on exit" for the irq window requirement in this scenario.
      
      This patch fixes it by not attempt to emulate "Acknowledge interrupt on exit"
      if there is no L1 requirement to inject an interrupt to L2.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      [Added code comment to make it obvious that the behavior is not correct.
       We should do a userspace exit with open interrupt window instead of the
       nested VM exit.  This patch still improves the behavior, so it was
       accepted as a (temporary) workaround.]
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      6550c4df
  13. 02 Aug, 2017 1 commit