1. 22 Jun, 2020 2 commits
    • KVM: LAPIC: ensure APIC map is up to date on concurrent update requests · 44d52717
      Paolo Bonzini authored
      
      
      The following race can cause lost map update events:
      
               cpu1                            cpu2
      
                                      apic_map_dirty = true
        ------------------------------------------------------------
                                      kvm_recalculate_apic_map:
                                           pass check
                                               mutex_lock(&kvm->arch.apic_map_lock);
                                               if (!kvm->arch.apic_map_dirty)
                                           and in process of updating map
        -------------------------------------------------------------
          other calls to
             apic_map_dirty = true         might be too late for affected cpu
        -------------------------------------------------------------
                                           apic_map_dirty = false
        -------------------------------------------------------------
          kvm_recalculate_apic_map:
          bail out on
            if (!kvm->arch.apic_map_dirty)
      
      To fix it, record the beginning of an update of the APIC map in
      apic_map_dirty.  If another APIC map change switches apic_map_dirty
      back to DIRTY during the update, kvm_recalculate_apic_map should not
      make it CLEAN, and the other caller will go through the slow path.
      Reported-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
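The fix described above can be sketched in user-space C as a three-state flag. This is a minimal model under stated assumptions, not the kernel code: `struct vm_model`, `mark_map_dirty()`, `recalculate_map()` and the `during_rebuild` hook are hypothetical names, and `UPDATE_IN_PROGRESS` is an assumed name for the "update has begun" state (the message itself only names CLEAN and DIRTY).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* CLEAN and DIRTY appear in the commit message; UPDATE_IN_PROGRESS is an
 * assumed name for "an update has started". */
enum map_state { CLEAN, UPDATE_IN_PROGRESS, DIRTY };

/* Hypothetical stand-in for the per-VM state that holds the APIC map. */
struct vm_model {
    atomic_int map_dirty;      /* one of enum map_state */
    pthread_mutex_t map_lock;  /* serializes rebuilds */
    int rebuilds;              /* counts map rebuilds, for illustration */
};

static void vm_model_init(struct vm_model *vm)
{
    atomic_init(&vm->map_dirty, CLEAN);
    pthread_mutex_init(&vm->map_lock, NULL);
    vm->rebuilds = 0;
}

/* Called by anything that changes APIC state affecting the map. */
static void mark_map_dirty(struct vm_model *vm)
{
    atomic_store(&vm->map_dirty, DIRTY);
}

/* during_rebuild is a test hook that simulates a concurrent change
 * arriving while the map is being rebuilt; it may be NULL. */
static void recalculate_map(struct vm_model *vm,
                            void (*during_rebuild)(struct vm_model *))
{
    int expected;

    /* Fast path: nothing changed since the last rebuild. */
    if (atomic_load(&vm->map_dirty) == CLEAN)
        return;

    pthread_mutex_lock(&vm->map_lock);

    /* Record the beginning of the update: DIRTY -> UPDATE_IN_PROGRESS. */
    expected = DIRTY;
    if (!atomic_compare_exchange_strong(&vm->map_dirty, &expected,
                                        UPDATE_IN_PROGRESS) &&
        expected == CLEAN) {
        /* Another caller already rebuilt the map. */
        pthread_mutex_unlock(&vm->map_lock);
        return;
    }

    vm->rebuilds++;            /* stands in for rebuilding the map */
    if (during_rebuild)
        during_rebuild(vm);

    /* Only go back to CLEAN if nobody marked the map DIRTY meanwhile. */
    expected = UPDATE_IN_PROGRESS;
    atomic_compare_exchange_strong(&vm->map_dirty, &expected, CLEAN);

    pthread_mutex_unlock(&vm->map_lock);
}
```

With a plain boolean, the final store of "clean" would erase a dirty mark set during the rebuild. Here the closing compare-and-exchange fails in that case, the flag stays DIRTY, and the next caller takes the slow path again.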
    • kvm: lapic: fix broken vcpu hotplug · af28dfac
      Igor Mammedov authored
      Guest fails to online hotplugged CPU with error
        smpboot: do_boot_cpu failed(-1) to wakeup CPU#4
      
      This happens because kvm_apic_set_state(), which used to call
      recalculate_apic_map() unconditionally and thereby pulled the hotplugged
      CPU into the APIC map, now updates the map only on state changes.  In this
      case the APIC map is not considered dirty and is not updated.
      
      Fix the issue by forcing unconditional update from kvm_apic_set_state(),
      like it used to be.
      
      Fixes: 4abaffce ("KVM: LAPIC: Recalculate apic map in batch")
      Cc: stable@vger.kernel.org
      Signed-off-by: Igor Mammedov <imammedo@redhat.com>
      Message-Id: <20200622160830.426022-1-imammedo@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 27 May, 2020 1 commit
  3. 15 May, 2020 2 commits
  4. 13 May, 2020 1 commit
    • kvm: Replace vcpu->swait with rcuwait · da4ad88c
      Davidlohr Bueso authored
      The use of any sort of waitqueue (simple or regular) for
      wait/waking vcpus has always been an overkill and semantically
      wrong. Because this is per-vcpu (which is blocked) there is
      only ever a single waiting vcpu, thus no need for any sort of
      queue.
      
      As such, make use of the rcuwait primitive, with the following
      considerations:
      
        - rcuwait already provides the proper barriers that serialize
        concurrent waiter and waker.
      
        - Task wakeup is done in rcu read critical region, with a
        stable task pointer.
      
        - Because there is no concurrency among waiters, we need
        not worry about rcuwait_wait_event() calls corrupting
        the wait->task. As a consequence, this saves the locking
        done in swait when modifying the queue. This also applies
        to per-vcore wait for powerpc kvm-hv.
      
      The x86 tscdeadline_latency test mentioned in 8577370f
      ("KVM: Use simple waitqueue for vcpu->wq") shows that, on avg,
      latency is reduced by around 15-20% with this change.
      
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: kvmarm@lists.cs.columbia.edu
      Cc: linux-mips@vger.kernel.org
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Message-Id: <20200424054837.5138-6-dave@stgolabs.net>
      [Avoid extra logic changes. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 15 Apr, 2020 1 commit
    • KVM: x86: Return updated timer current count register from KVM_GET_LAPIC · 24647e0a
      Peter Shier authored
      
      
      kvm_vcpu_ioctl_get_lapic (implements KVM_GET_LAPIC ioctl) does a bulk copy
      of the LAPIC registers but must take into account that the one-shot and
      periodic timer current count register is computed upon reads and is not
      present in register state. When restoring LAPIC state (e.g. after
      migration), restart timers from their current count values at time of
      save.
      
      Note: When a one-shot timer expires, the code in arch/x86/kvm/lapic.c does
      not zero the value of the LAPIC initial count register (emulating HW
      behavior). If no other timer is run and pending prior to a subsequent
      KVM_GET_LAPIC call, the returned register set will include the expired
      one-shot initial count. On a subsequent KVM_SET_LAPIC call the code will
      see a non-zero initial count and start a new one-shot timer using the
      expired timer's count. This is a prior existing bug and will be addressed
      in a separate patch. Thanks to jmattson@google.com for this find.
      Signed-off-by: Peter Shier <pshier@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <20181010225653.238911-1-pshier@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
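Because the current count is computed on read rather than stored, saving and restoring it amounts to converting between a remaining time and a count. The following is a rough sketch of that arithmetic only. `struct timer_model`, both helpers, and the `BUS_CYCLE_NS` value are assumptions made for illustration and are not taken from arch/x86/kvm/lapic.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed length of one timer tick before division, in nanoseconds. */
#define BUS_CYCLE_NS 1u

/* Hypothetical stand-in for the emulated timer state. */
struct timer_model {
    uint64_t target_expiration_ns; /* absolute time at which the timer fires */
    uint32_t divide_count;         /* divider applied to the bus clock */
};

/* Current count as seen by a register read: the time left until expiry,
 * expressed in divided bus cycles. */
static uint32_t current_count(const struct timer_model *t, uint64_t now_ns)
{
    uint64_t remaining;

    if (t->divide_count == 0 || now_ns >= t->target_expiration_ns)
        return 0;

    remaining = t->target_expiration_ns - now_ns;
    return (uint32_t)(remaining / ((uint64_t)BUS_CYCLE_NS * t->divide_count));
}

/* On restore, re-arm the timer so that it expires after the saved current
 * count rather than after the full initial count. */
static void restart_from_count(struct timer_model *t, uint64_t now_ns,
                               uint32_t saved_count)
{
    t->target_expiration_ns =
        now_ns + (uint64_t)saved_count * BUS_CYCLE_NS * t->divide_count;
}
```

For example, a timer due at 1000 ns with a divider of 2 reads as 300 when sampled at 400 ns. Restoring that count at a later time yields a timer that again reads 300 at the moment of restore.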
  6. 07 Apr, 2020 1 commit
  7. 31 Mar, 2020 1 commit
  8. 26 Mar, 2020 1 commit
  9. 24 Mar, 2020 1 commit
  10. 23 Mar, 2020 1 commit
    • KVM: LAPIC: Mark hrtimer for period or oneshot mode to expire in hard interrupt context · edec6e01
      He Zhe authored
      
      
      apic->lapic_timer.timer was initialized with HRTIMER_MODE_ABS_HARD but
      started later with HRTIMER_MODE_ABS, which may cause the following warning
      on a PREEMPT_RT kernel:
      
      WARNING: CPU: 1 PID: 2957 at kernel/time/hrtimer.c:1129 hrtimer_start_range_ns+0x348/0x3f0
      CPU: 1 PID: 2957 Comm: qemu-system-x86 Not tainted 5.4.23-rt11 #1
      Hardware name: Supermicro SYS-E300-9A-8C/A2SDi-8C-HLN4F, BIOS 1.1a 09/18/2018
      RIP: 0010:hrtimer_start_range_ns+0x348/0x3f0
      Code: 4d b8 0f 94 c1 0f b6 c9 e8 35 f1 ff ff 4c 8b 45
            b0 e9 3b fd ff ff e8 d7 3f fa ff 48 98 4c 03 34
            c5 a0 26 bf 93 e9 a1 fd ff ff <0f> 0b e9 fd fc ff
            ff 65 8b 05 fa b7 90 6d 89 c0 48 0f a3 05 60 91
      RSP: 0018:ffffbc60026ffaf8 EFLAGS: 00010202
      RAX: 0000000000000001 RBX: ffff9d81657d4110 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000006cc7987bcf RDI: ffff9d81657d4110
      RBP: ffffbc60026ffb58 R08: 0000000000000001 R09: 0000000000000010
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000006cc7987bcf
      R13: 0000000000000000 R14: 0000006cc7987bcf R15: ffffbc60026d6a00
      FS: 00007f401daed700(0000) GS:ffff9d81ffa40000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000ffffffff CR3: 0000000fa7574000 CR4: 00000000003426e0
      Call Trace:
      ? kvm_release_pfn_clean+0x22/0x60 [kvm]
      start_sw_timer+0x85/0x230 [kvm]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      kvm_lapic_switch_to_sw_timer+0x72/0x80 [kvm]
      vmx_pre_block+0x1cb/0x260 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_sync_pir_to_irr+0x9e/0x100 [kvm_intel]
      ? kvm_apic_has_interrupt+0x46/0x80 [kvm]
      kvm_arch_vcpu_ioctl_run+0x85b/0x1fa0 [kvm]
      ? _raw_spin_unlock_irqrestore+0x18/0x50
      ? _copy_to_user+0x2c/0x30
      kvm_vcpu_ioctl+0x235/0x660 [kvm]
      ? rt_spin_unlock+0x2c/0x50
      do_vfs_ioctl+0x3e4/0x650
      ? __fget+0x7a/0xa0
      ksys_ioctl+0x67/0x90
      __x64_sys_ioctl+0x1a/0x20
      do_syscall_64+0x4d/0x120
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7f4027cc54a7
      Code: 00 00 90 48 8b 05 e9 59 0c 00 64 c7 00 26 00 00
            00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00
            00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff
            73 01 c3 48 8b 0d b9 59 0c 00 f7 d8 64 89 01 48
      RSP: 002b:00007f401dae9858 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00005558bd029690 RCX: 00007f4027cc54a7
      RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000d
      RBP: 00007f4028b72000 R08: 00005558bc829ad0 R09: 00000000ffffffff
      R10: 00005558bcf90ca0 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 00005558bce1c840
      --[ end trace 0000000000000002 ]--
      Signed-off-by: He Zhe <zhe.he@windriver.com>
      Message-Id: <1584687967-332859-1-git-send-email-zhe.he@windriver.com>
      Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
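The mismatch behind this warning can be illustrated with a small user-space model: a timer remembers whether it was initialized for hard interrupt expiry, and starting it with a mode that disagrees is flagged. This is only a sketch of the idea. `enum timer_mode`, `struct timer_model` and both helpers are hypothetical stand-ins, and the flag values are illustrative rather than copied from the hrtimer headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative mode flags; stand-ins for the HRTIMER_MODE_* constants. */
enum timer_mode {
    MODE_ABS      = 0x0,
    MODE_HARD     = 0x8,
    MODE_ABS_HARD = MODE_ABS | MODE_HARD
};

/* Hypothetical timer that records how it was initialized. */
struct timer_model {
    bool is_hard;
};

static void timer_model_init(struct timer_model *t, enum timer_mode mode)
{
    t->is_hard = (mode & MODE_HARD) != 0;
}

/* Returns true when the start mode agrees with the init mode. A false
 * result corresponds to the situation that triggers the warning. */
static bool timer_model_start_ok(const struct timer_model *t,
                                 enum timer_mode mode)
{
    return ((mode & MODE_HARD) != 0) == t->is_hard;
}
```

In this model, initializing with `MODE_ABS_HARD` and starting with `MODE_ABS` is reported as inconsistent, while starting with `MODE_ABS_HARD` is accepted, which mirrors the direction of the fix named in the commit title.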
  11. 16 Mar, 2020 2 commits
  12. 21 Feb, 2020 2 commits
  13. 12 Feb, 2020 1 commit
  14. 05 Feb, 2020 1 commit
  15. 27 Jan, 2020 3 commits
  16. 21 Jan, 2020 2 commits
  17. 08 Jan, 2020 3 commits
  18. 15 Nov, 2019 3 commits
  19. 22 Oct, 2019 1 commit
  20. 26 Sep, 2019 1 commit
  21. 24 Sep, 2019 1 commit
  22. 11 Sep, 2019 2 commits
    • KVM: x86: Fix INIT signal handling in various CPU states · 4b9852f4
      Liran Alon authored
      Commit cd7764fe ("KVM: x86: latch INITs while in system management mode")
      changed code to latch INIT while vCPU is in SMM and process latched INIT
      when leaving SMM. It left a subtle remark in its commit message that similar
      treatment should also be done while vCPU is in VMX non-root-mode.
      
      However, INIT signals should actually be latched in various vCPU states:
      (*) For both Intel and AMD, INIT signals should be latched while vCPU
      is in SMM.
      (*) For Intel, INIT should also be latched while vCPU is in VMX
      operation and later processed when vCPU leaves VMX operation by
      executing VMXOFF.
      (*) For AMD, INIT should also be latched while vCPU runs with GIF=0
      or in guest-mode with intercept defined on INIT signal.
      
      To fix this:
      1) Add kvm_x86_ops->apic_init_signal_blocked() such that each CPU vendor
      can define the various CPU states in which INIT signals should be
      blocked and modify kvm_apic_accept_events() to use it.
      2) Modify vmx_check_nested_events() to check for pending INIT signal
      while vCPU is in guest-mode. If so, emulate vmexit on
      EXIT_REASON_INIT_SIGNAL. Note that nSVM should have similar behaviour
      but is currently left as a TODO comment to implement in the future
      because nSVM doesn't yet implement svm_check_nested_events().
      
      Note: Currently the KVM nVMX implementation doesn't support VMX wait-for-SIPI
      activity state as specified in MSR_IA32_VMX_MISC bits 6:8 exposed to
      guest (See nested_vmx_setup_ctls_msrs()).
      If and when support for this activity state will be implemented,
      kvm_check_nested_events() would need to avoid emulating vmexit on
      INIT signal in case activity-state is wait-for-SIPI. In addition,
      kvm_apic_accept_events() would need to be modified to avoid discarding
      SIPI in case VMX activity-state is wait-for-SIPI but instead delay
      SIPI processing to vmx_check_nested_events() that would clear
      pending APIC events and emulate vmexit on SIPI.
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Co-developed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
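The per-vendor rules listed in this message can be sketched as a vendor callback combined with the common SMM check. This is a simplified model, not the KVM implementation: `struct vcpu_model`, its fields, and the three functions are hypothetical names chosen to mirror the states the message describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the vCPU state relevant to INIT handling. */
struct vcpu_model {
    bool in_smm;            /* vCPU is in system management mode */
    bool in_vmx_operation;  /* Intel: between VMXON and VMXOFF */
    bool gif;               /* AMD: global interrupt flag */
    bool in_guest_mode;     /* AMD: running a nested guest */
    bool init_intercepted;  /* AMD: INIT intercept is configured */
};

/* Stand-in for the vendor hook described as apic_init_signal_blocked(). */
typedef bool (*init_blocked_fn)(const struct vcpu_model *);

/* Intel rule from the message: INIT is latched while in VMX operation. */
static bool intel_init_blocked(const struct vcpu_model *v)
{
    return v->in_vmx_operation;
}

/* AMD rule from the message: INIT is latched while GIF=0, or in guest-mode
 * with an intercept defined on INIT. */
static bool amd_init_blocked(const struct vcpu_model *v)
{
    return !v->gif || (v->in_guest_mode && v->init_intercepted);
}

/* Common rule: SMM latches INIT on both vendors; otherwise ask the vendor. */
static bool init_is_latched(const struct vcpu_model *v,
                            init_blocked_fn vendor_blocked)
{
    return v->in_smm || vendor_blocked(v);
}
```

Keeping the SMM check in common code and delegating the rest to a callback reflects the split the message proposes, where each CPU vendor defines its own set of blocking states.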
    • KVM: LAPIC: Micro optimize IPI latency · 2b0911d1
      Wanpeng Li authored
      
      
      This patch optimizes the virtual IPI emulation sequence:
      
      write ICR2                     write ICR2
      write ICR                      read ICR2
      read ICR            ==>        send virtual IPI
      read ICR2                      write ICR
      send virtual IPI
      
      It can reduce kvm-unit-tests/vmexit.flat IPI testing latency (from the
      sender sending the IPI to the sender receiving the ACK) from 3319 cycles
      to 3203 cycles on a Skylake server.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  23. 14 Aug, 2019 1 commit
    • kvm: x86: skip populating logical dest map if apic is not sw enabled · b14c876b
      Radim Krcmar authored
      
      
      recalculate_apic_map does not sanitize the LDR, so it is possible that
      multiple bits are set. In that case, a previous valid entry
      can potentially be overwritten by an invalid one.
      
      This condition is hit when booting a 32 bit, >8 CPU, RHEL6 guest and then
      triggering a crash to boot a kdump kernel. This is the sequence of
      events:
      1. Linux boots in bigsmp mode and enables PhysFlat, however, it still
      writes to the LDR which probably will never be used.
      2. However, when booting into kdump, the stale LDR values remain as
      they are not cleared by the guest and there isn't an APIC reset.
      3. kdump boots with 1 cpu, and uses Logical Destination Mode but the
      logical map has been overwritten and points to an inactive vcpu.
      Signed-off-by: Radim Krcmar <rkrcmar@redhat.com>
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
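The overwrite described above, and the effect of skipping APICs that are not software enabled, can be modeled as follows. This is a sketch under assumptions: it covers only flat logical mode with the 8-bit logical ID taken from LDR bits 31:24, picks the slot from the lowest set bit, and uses the hypothetical names `struct apic_model`, `lowest_set_bit()` and `build_logical_map()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOGICAL_SLOTS 8

/* Hypothetical stand-in for one vCPU's local APIC. */
struct apic_model {
    int vcpu_id;
    bool sw_enabled;  /* software enable state of the APIC */
    uint32_t ldr;     /* logical destination register */
};

/* Index of the lowest set bit, or -1 if the mask is empty. */
static int lowest_set_bit(uint32_t mask)
{
    for (int i = 0; i < 32; i++)
        if (mask & (1u << i))
            return i;
    return -1;
}

/* Rebuild the logical destination map, ignoring APICs that are not
 * software enabled so that their stale LDR values cannot claim a slot. */
static void build_logical_map(const struct apic_model *apics, size_t n,
                              const struct apic_model *map[LOGICAL_SLOTS])
{
    for (size_t i = 0; i < LOGICAL_SLOTS; i++)
        map[i] = NULL;

    for (size_t i = 0; i < n; i++) {
        uint32_t mask;
        int slot;

        if (!apics[i].sw_enabled)
            continue;  /* skip: LDR may be stale or have several bits set */

        mask = (apics[i].ldr >> 24) & 0xffu;
        slot = lowest_set_bit(mask);
        if (slot >= 0)
            map[slot] = &apics[i];
    }
}
```

In this model, an inactive vCPU whose stale LDR has bits 0 and 1 set would land in slot 0 and displace the one active CPU. With the software-enable check it is skipped and slot 0 keeps the valid entry.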
  24. 05 Aug, 2019 1 commit
  25. 01 Aug, 2019 1 commit
  26. 20 Jul, 2019 1 commit
    • KVM: LAPIC: Inject timer interrupt via posted interrupt · 0c5f81da
      Wanpeng Li authored
      
      
      Dedicated instances are currently disturbed by unnecessary jitter due
      to the emulated lapic timers firing on the same pCPUs where the
      vCPUs reside.  Unlike ARM, Intel has no hardware virtual timer for the
      guest, so both programming the timer in the guest and the firing of the
      emulated timer incur vmexits.  This patch tries to avoid the vmexit when
      the emulated timer fires, at least in the dedicated instance scenario
      when nohz_full is enabled.
      
      In that case, the emulated timers can be offloaded to the nearest busy
      housekeeping cpus, since APICv has been available in server processors
      for several years. The guest timer interrupt can then be injected via
      posted interrupts, which are delivered by the housekeeping cpu once the
      emulated timer fires.
      
      The host should be tuned so that vCPUs are placed on isolated physical
      processors, with several surplus pCPUs for busy housekeeping.
      If mwait/hlt/pause vmexits are disabled so that the vCPUs stay in non-root
      mode, a ~3% redis performance benefit can be observed on a Skylake server,
      and the number of external interrupt vmexits drops substantially.  Without
      the patch:
      
      VM-EXIT              Samples  Samples%  Time%   Min Time  Max Time  Avg time
      EXTERNAL_INTERRUPT   42916    49.43%    39.30%  0.47us    106.09us  0.71us ( +- 1.09% )
      
      With the patch:

      VM-EXIT              Samples  Samples%  Time%   Min Time  Max Time  Avg time
      EXTERNAL_INTERRUPT   6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +- 4.02% )
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  27. 17 Jul, 2019 1 commit
    • KVM: LAPIC: Make lapic timer unpinned · 4d151bf3
      Wanpeng Li authored
      Commit 61abdbe0 ("kvm: x86: make lapic hrtimer pinned") pinned the
      lapic timer to avoid waiting until the next kvm exit for the guest to
      see KVM_REQ_PENDING_TIMER set. Another solution is to give a kick
      after setting the KVM_REQ_PENDING_TIMER bit. Make the lapic timer
      unpinned; this will be used in follow-up patches.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  28. 15 Jul, 2019 1 commit