1. 15 Jun, 2020 1 commit
  2. 01 Jun, 2020 1 commit
    • Gustavo A. R. Silva's avatar
      KVM: VMX: Replace zero-length array with flexible-array · f4a9fdd5
      Gustavo A. R. Silva authored
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that, dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      sizeof(flexible-array-member) triggers a warning because flexible array
      members have incomplete type[1]. There are some instances of code in
      which the sizeof operator is being incorrectly/erroneously applied to
      zero-length arrays and the result is zero. Such instances may be hiding
      some bugs. So, this work (flexible-array member conversions) will also
      help to get completely rid of those sorts of issues.
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732
      
       ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: default avatarGustavo A. R. Silva <gustavoars@kernel.org>
      Message-Id: <20200507185618.GA14831@embeddedor>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      f4a9fdd5
  3. 18 Jun, 2019 4 commits
    • Sean Christopherson's avatar
      KVM: VMX: Leave preemption timer running when it's disabled · 804939ea
      Sean Christopherson authored
      
      
      VMWRITEs to the major VMCS controls, pin controls included, are
      deceptively expensive.  CPUs with VMCS caching (Westmere and later) also
      optimize away consistency checks on VM-Entry, i.e. skip consistency
      checks if the relevant fields have not changed since the last successful
      VM-Entry (of the cached VMCS).  Because uops are a precious commodity,
      uCode's dirty VMCS field tracking isn't as precise as software would
      prefer.  Notably, writing any of the major VMCS fields effectively marks
      the entire VMCS dirty, i.e. causes the next VM-Entry to perform all
      consistency checks, which consumes several hundred cycles.
      
      As it pertains to KVM, toggling PIN_BASED_VMX_PREEMPTION_TIMER more than
      doubles the latency of the next VM-Entry (and again when/if the flag is
      toggled back).  In a non-nested scenario, running a "standard" guest
      with the preemption timer enabled, toggling the timer flag is uncommon
      but not rare, e.g. roughly 1 in 10 entries.  Disabling the preemption
      timer can change these numbers due to its use for "immediate exits",
      even when explicitly disabled by userspace.
      
      Nested virtualization in particular is painful, as the timer flag is set
      for the majority of VM-Enters, but prepare_vmcs02() initializes vmcs02's
      pin controls to *clear* the flag since its the timer's final state isn't
      known until vmx_vcpu_run().  I.e. the majority of nested VM-Enters end
      up unnecessarily writing pin controls *twice*.
      
      Rather than toggle the timer flag in pin controls, set the timer value
      itself to the largest allowed value to put it into a "soft disabled"
      state, and ignore any spurious preemption timer exits.
      
      Sadly, the timer is a 32-bit value and so theoretically it can fire
      before the head death of the universe, i.e. spurious exits are possible.
      But because KVM does *not* save the timer value on VM-Exit and because
      the timer runs at a slower rate than the TSC, the maximuma timer value
      is still sufficiently large for KVM's purposes.  E.g. on a modern CPU
      with a timer that runs at 1/32 the frequency of a 2.4ghz constant-rate
      TSC, the timer will fire after ~55 seconds of *uninterrupted* guest
      execution.  In other words, spurious VM-Exits are effectively only
      possible if the host is completely tickless on the logical CPU, the
      guest is not using the preemption timer, and the guest is not generating
      VM-Exits for any other reason.
      
      To be safe from bad/weird hardware, disable the preemption timer if its
      maximum delay is less than ten seconds.  Ten seconds is mostly arbitrary
      and was selected in no small part because it's a nice round number.
      For simplicity and paranoia, fall back to __kvm_request_immediate_exit()
      if the preemption timer is disabled by KVM or userspace.  Previously
      KVM continued to use the preemption timer to force immediate exits even
      when the timer was disabled by userspace.  Now that KVM leaves the timer
      running instead of truly disabling it, allow userspace to kill it
      entirely in the unlikely event the timer (or KVM) malfunctions.
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      804939ea
    • Sean Christopherson's avatar
      KVM: VMX: Drop hv_timer_armed from 'struct loaded_vmcs' · 9d99cc49
      Sean Christopherson authored
      
      
      ... now that it is fully redundant with the pin controls shadow.
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      9d99cc49
    • Sean Christopherson's avatar
      KVM: nVMX: Shadow VMCS controls on a per-VMCS basis · 09e226cf
      Sean Christopherson authored
      
      
      ... to pave the way for not preserving the shadow copies across switches
      between vmcs01 and vmcs02, and eventually to avoid VMWRITEs to vmcs02
      when the desired value is unchanged across nested VM-Enters.
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      09e226cf
    • Sean Christopherson's avatar
      KVM: VMX: Read cached VM-Exit reason to detect external interrupt · 49def500
      Sean Christopherson authored
      
      
      Generic x86 code invokes the kvm_x86_ops external interrupt handler on
      all VM-Exits regardless of the actual exit type.  Use the already-cached
      EXIT_REASON to determine if the VM-Exit was due to an interrupt, thus
      avoiding an extra VMREAD (to query VM_EXIT_INTR_INFO) for all other
      types of VM-Exit.
      
      In addition to avoiding the extra VMREAD, checking the EXIT_REASON
      instead of VM_EXIT_INTR_INFO makes it more obvious that
      vmx_handle_external_intr() is called for all VM-Exits, e.g. someone
      unfamiliar with the flow might wonder under what condition(s)
      VM_EXIT_INTR_INFO does not contain a valid interrupt, which is
      simply not possible since KVM always runs with "ack interrupt on exit".
      
      WARN once if VM_EXIT_INTR_INFO doesn't contain a valid interrupt on
      an EXTERNAL_INTERRUPT VM-Exit, as such a condition would indicate a
      hardware bug.
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      49def500
  4. 12 Feb, 2019 1 commit
    • Sean Christopherson's avatar
      KVM: nVMX: Cache host_rsp on a per-VMCS basis · 5a878160
      Sean Christopherson authored
      
      
      Currently, host_rsp is cached on a per-vCPU basis, i.e. it's stored in
      struct vcpu_vmx.  In non-nested usage the caching is for all intents
      and purposes 100% effective, e.g. only the first VMLAUNCH needs to
      synchronize VMCS.HOST_RSP since the call stack to vmx_vcpu_run() is
      identical each and every time.  But when running a nested guest, KVM
      must invalidate the cache when switching the current VMCS as it can't
      guarantee the new VMCS has the same HOST_RSP as the previous VMCS.  In
      other words, the cache loses almost all of its efficacy when running a
      nested VM.
      
      Move host_rsp to struct vmcs_host_state, which is per-VMCS, so that it
      is cached on a per-VMCS basis and restores its 100% hit rate when
      nested VMs are in play.
      
      Note that the host_rsp cache for vmcs02 essentially "breaks" when
      nested early checks are enabled as nested_vmx_check_vmentry_hw() will
      see a different RSP at the time of its VM-Enter.  While it's possible
      to avoid even that VMCS.HOST_RSP synchronization, e.g. by employing a
      dedicated VM-Exit stack, there is little motivation for doing so as
      the overhead of two VMWRITEs (~55 cycles) is dwarfed by the overhead
      of the extra VMX transition (600+ cycles) and is a proverbial drop in
      the ocean relative to the total cost of a nested transtion (10s of
      thousands of cycles).
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      5a878160
  5. 14 Dec, 2018 3 commits