  1. Apr 18, 2019
    • KVM: lapic: Allow user to disable adaptive tuning of timer advancement · c3941d9e
      Sean Christopherson authored
      
      The introduction of adaptive tuning of lapic timer advancement did not
      allow for the scenario where userspace would want to disable adaptive
      tuning but still employ timer advancement, e.g. for testing purposes or
      to handle a use case where adaptive tuning is unable to settle on a
      suitable time.  This is especially pertinent now that KVM places a hard
      threshold on the maximum advancement time.
      
      Rework the timer semantics to accept signed values, with a value of '-1'
      being interpreted as "use adaptive tuning with KVM's internal default",
      and any other value being used as an explicit advancement time, e.g. a
      time of '0' effectively disables advancement.
      
      Note, this does not completely restore the original behavior of
      lapic_timer_advance_ns.  Prior to tracking the advancement per vCPU,
      which is necessary to support autotuning, userspace could adjust
      lapic_timer_advance_ns for a *running* vCPU.  With per-vCPU tracking, the
      module params are snapshotted at vCPU creation, i.e. applying a new
      advancement effectively requires restarting a VM.
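
      The semantics described above can be sketched roughly as follows.  This
      is a minimal userspace illustration, not the kernel code: the struct,
      the function name and the 1000ns default are assumptions made for the
      example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed starting point for adaptive tuning, in ns (illustrative value). */
#define DEFAULT_TIMER_ADVANCE_NS 1000u

/* Hypothetical per-vCPU timer state; field names are illustrative. */
struct vcpu_timer {
	uint32_t timer_advance_ns; /* advancement applied to this vCPU */
	bool adjust_done;          /* true once tuning has finished or is disabled */
};

/*
 * Snapshot the module parameter when the vCPU is created.
 *   -1        -> adaptive tuning, starting from the internal default
 *   otherwise -> explicit advancement with tuning disabled (0 = no advancement)
 * Later changes to the parameter do not affect an existing vCPU.
 */
static void vcpu_timer_init(struct vcpu_timer *t, int param_ns)
{
	if (param_ns == -1) {
		t->timer_advance_ns = DEFAULT_TIMER_ADVANCE_NS;
		t->adjust_done = false;
	} else {
		t->timer_advance_ns = (uint32_t)param_ns;
		t->adjust_done = true;
	}
}
```

      Because the value is copied at creation time, a vCPU created with '-1'
      keeps tuning on its own copy even if the parameter is later changed.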
      
      Dynamically updating a running vCPU is possible, e.g. a helper could be
      added to retrieve the desired delay, choosing between the global module
      param and the per-vCPU value depending on whether or not auto-tuning is
      (globally) enabled, but introduces a great deal of complexity.  The
      wrapper itself is not complex, but understanding and documenting the
      effects of dynamically toggling auto-tuning and/or adjusting the timer
      advancement is nigh impossible since the behavior would be dependent on
      KVM's implementation as well as compiler optimizations.  In other words,
      providing stable behavior would require extremely careful consideration
      now and in the future.
      
      Given that the expected use of a manually-tuned timer advancement is to
      "tune once, run many", use the vastly simpler approach of recognizing
      changes to the module params only when creating a new vCPU.
      
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Cc: stable@vger.kernel.org
      Fixes: 3b8a5df6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: lapic: Track lapic timer advance per vCPU · 39497d76
      Sean Christopherson authored
      
      Automatically adjusting the globally-shared timer advancement could
      corrupt the timer, e.g. if multiple vCPUs are concurrently adjusting
      the advancement value.  That could be partially fixed by using a local
      variable for the arithmetic, but it would still be susceptible to a
      race when setting timer_advance_adjust_done.
      
      And because virtual_tsc_khz and tsc_scaling_ratio are per-vCPU, the
      correct calibration for a given vCPU may not apply to all vCPUs.
      
      Furthermore, lapic_timer_advance_ns is marked __read_mostly, which is
      effectively violated when finding a stable advancement takes an extended
      amount of time.
      
      Opportunistically change the definition of lapic_timer_advance_ns to
      a u32 so that it matches the style of struct kvm_timer.  Explicitly
      pass the param to kvm_create_lapic() so that it doesn't have to be
      exposed to lapic.c, thus reducing the probability of unintentionally
      using the global value instead of the per-vCPU value.
      
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Cc: stable@vger.kernel.org
      Fixes: 3b8a5df6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: lapic: Disable timer advancement if adaptive tuning goes haywire · 57bf67e7
      Sean Christopherson authored
      
      To minimize the latency of timer interrupts as observed by the guest,
      KVM adjusts the values it programs into the host timers to account for
      the host's overhead of programming and handling the timer event.  Now
      that the timer advancement is automatically tuned during runtime, it's
      effectively unbounded by default, e.g. if KVM is running as L1 the
      advancement can measure in hundreds of milliseconds.
      
      Disable timer advancement if adaptive tuning yields an advancement of
      more than 5000ns, as large advancements can break reasonable assumptions
      of the guest, e.g. that a timer configured to fire after 1ms won't
      arrive on the next instruction.  Although KVM busy waits to mitigate the
      case of a timer event arriving too early, complications can arise when
      shifting the interrupt too far, e.g. kvm-unit-test's vmx.interrupt test
      will fail when its "host" exits on interrupts as KVM may inject the INTR
      before the guest executes STI+HLT.  Arguably the unit test is "broken"
      in the sense that delaying a timer interrupt by 1ms doesn't technically
      guarantee the interrupt will arrive after STI+HLT, but it's a reasonable
      assumption that KVM should support.
      
      Furthermore, an unbounded advancement also effectively unbounds the time
      spent busy waiting, e.g. if the guest programs a timer with a very large
      delay.
      
      5000ns is a somewhat arbitrary threshold.  When running on bare metal,
      which is the intended use case, timer advancement is expected to be in
      the general vicinity of 1000ns.  5000ns is high enough that false
      positives are unlikely, while not being so high as to negatively affect
      the host's performance/stability.
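
      A minimal sketch of the bail-out described above, assuming a
      hypothetical per-vCPU struct and helper name (only the 5000ns threshold
      comes from this message):

```c
#include <stdbool.h>
#include <stdint.h>

/* Threshold named in the commit message, in ns. */
#define TIMER_ADVANCE_MAX_NS 5000u

/* Hypothetical per-vCPU timer state; field names are illustrative. */
struct vcpu_timer {
	uint32_t timer_advance_ns; /* advancement currently in use */
	bool adjust_done;          /* true once tuning has stopped */
};

/*
 * Apply a value produced by adaptive tuning.  If the tuner wanders past the
 * threshold, disable advancement entirely and stop tuning instead of
 * clamping, since a runaway value suggests tuning cannot settle.
 */
static void apply_tuned_advance(struct vcpu_timer *t, uint32_t tuned_ns)
{
	if (tuned_ns > TIMER_ADVANCE_MAX_NS) {
		t->timer_advance_ns = 0;
		t->adjust_done = true;
		return;
	}
	t->timer_advance_ns = tuned_ns;
}
```

      In this sketch a value of exactly 5000ns is still accepted; only values
      above the threshold switch advancement off.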
      
      Note, a future patch will enable userspace to disable KVM's adaptive
      tuning, which will allow privileged userspace to specify an
      advancement value in excess of this arbitrary threshold in order to
      satisfy an abnormal use case.
      
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Fixes: 3b8a5df6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012 · da66761c
      Vitaly Kuznetsov authored
      
      It was reported that with some special Multi Processor Group configuration,
      e.g.:
       bcdedit.exe /set groupsize 1
       bcdedit.exe /set maxgroup on
       bcdedit.exe /set groupaware on
      for a 16-vCPU guest, WS2012 shows a BSOD on boot when the PV TLB flush
      mechanism is in use.
      
      Tracing kvm_hv_flush_tlb immediately reveals the issue:
      
       kvm_hv_flush_tlb: processor_mask 0x0 address_space 0x0 flags 0x2
      
      The only flag set in this request is HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES,
      however, processor_mask is 0x0 and no HV_FLUSH_ALL_PROCESSORS is specified.
      We don't flush anything and apparently it's not what Windows expects.
      
      TLFS doesn't say anything about such requests and newer Windows versions
      seem to be unaffected. This all feels like a WS2012 bug, which is, however,
      easy to work around in KVM: let's flush everything when we see an empty
      flush request; over-flushing doesn't hurt.
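
      The workaround amounts to treating an empty processor mask as "all
      processors".  A minimal sketch, with a hypothetical helper name and the
      flag bit positions given as I understand them from the TLFS (the 0x2
      value is consistent with the trace above):

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits as understood from the TLFS; treat the values as assumptions. */
#define HV_FLUSH_ALL_PROCESSORS             (1u << 0)
#define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES (1u << 1)

/*
 * Decide whether a flush request should target every vCPU.  An empty
 * processor_mask without HV_FLUSH_ALL_PROCESSORS would otherwise flush
 * nothing, so it is promoted to a full flush.
 */
static bool flush_targets_all_cpus(uint64_t flags, uint64_t processor_mask)
{
	return (flags & HV_FLUSH_ALL_PROCESSORS) || processor_mask == 0;
}
```

      With the request from the trace (flags 0x2, processor_mask 0x0) the
      helper reports a full flush, while a non-empty mask is still honored.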
      
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Consider LAPIC TSC-Deadline timer expired if deadline too short · c09d65d9
      Liran Alon authored
      
      If the guest sets MSR_IA32_TSCDEADLINE to a value such that, in the host
      time domain, the deadline is shorter than lapic_timer_advance_ns, we can
      end up calling hrtimer_start() with an expiration time set in the
      past.
      
      Because lapic_timer.timer is initialized with HRTIMER_MODE_ABS_PINNED, it
      is not allowed to run in softirq and therefore will never expire.
      
      To avoid such a scenario, verify that the deadline expiration time in the
      host time domain is later than (now + lapic_timer_advance_ns).
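
      The check can be illustrated with plain nanosecond arithmetic.  This is
      a sketch under assumptions: the helper name and its parameters are
      hypothetical, and the real code works in TSC and ktime units.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the host expiry for a guest deadline that is delta_ns in the
 * future, pulled in by advance_ns.
 *
 * Returns true and stores the expiry if the timer should be armed.  Returns
 * false if the deadline is not further out than the advancement, in which
 * case the caller should treat the timer as already expired rather than arm
 * a timer whose expiry lies in the past.
 */
static bool compute_host_expiry(uint64_t now_ns, uint64_t delta_ns,
				uint32_t advance_ns, uint64_t *expire_ns)
{
	if (delta_ns <= advance_ns)
		return false;

	*expire_ns = now_ns + delta_ns - advance_ns;
	return true;
}
```

      Comparing delta_ns against advance_ns before subtracting also avoids
      unsigned underflow in the expiry computation.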
      
      A future patch could also add a min_timer_deadline_ns module parameter,
      similar to min_timer_period_us, to avoid races where the time it takes
      to run this logic still results in hrtimer_start() being called with an
      expiration time in the past.
      
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge tag 'kvm-ppc-fixes-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD · 78671ab4
      Paolo Bonzini authored
      
      KVM/PPC fixes for 5.1
      
      - Fix host hang in the HTM assist code for POWER9
      - Take srcu read lock around memslot lookup
  2. Apr 16, 2019
  3. Apr 15, 2019
    • KVM: x86/mmu: Fix an inverted list_empty() check when zapping sptes · cfd32acf
      Sean Christopherson authored
      
      A recently introduced helper for handling zap vs. remote flush
      incorrectly bails early, effectively leaking defunct shadow pages.
      Manifests as a slab BUG when exiting KVM due to the shadow pages
      being alive when their associated cache is destroyed.
      
      ==========================================================================
      BUG kvm_mmu_page_header: Objects remaining in kvm_mmu_page_header on ...
      --------------------------------------------------------------------------
      Disabling lock debugging due to kernel taint
      INFO: Slab 0x00000000fc436387 objects=26 used=23 fp=0x00000000d023caee ...
      CPU: 6 PID: 4315 Comm: rmmod Tainted: G    B             5.1.0-rc2+ #19
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      Call Trace:
       dump_stack+0x46/0x5b
       slab_err+0xad/0xd0
       ? on_each_cpu_mask+0x3c/0x50
       ? ksm_migrate_page+0x60/0x60
       ? on_each_cpu_cond_mask+0x7c/0xa0
       ? __kmalloc+0x1ca/0x1e0
       __kmem_cache_shutdown+0x13a/0x310
       shutdown_cache+0xf/0x130
       kmem_cache_destroy+0x1d5/0x200
       kvm_mmu_module_exit+0xa/0x30 [kvm]
       kvm_arch_exit+0x45/0x60 [kvm]
       kvm_exit+0x6f/0x80 [kvm]
       vmx_exit+0x1a/0x50 [kvm_intel]
       __x64_sys_delete_module+0x153/0x1f0
       ? exit_to_usermode_loop+0x88/0xc0
       do_syscall_64+0x4f/0x100
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
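
      For illustration, the class of bug looks roughly like the sketch below.
      The list helpers mimic the kernel's circular list, but the names and the
      zap helper are hypothetical, and resetting the list stands in for
      actually freeing the pages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_head(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

/*
 * Hypothetical zap helper: bail only when there is nothing to zap.  The
 * buggy form tested !list_is_empty() here, so it returned early exactly when
 * pages were queued and never released them.
 */
static bool zap_invalid_pages(struct list_head *invalid_list)
{
	if (list_is_empty(invalid_list))
		return false;

	list_init(invalid_list); /* stands in for freeing the queued pages */
	return true;
}
```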
      
      Fixes: a2113634 ("KVM: x86/mmu: Split remote_flush+zap case out of kvm_mmu_flush_or_zap()")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. Apr 10, 2019
  5. Apr 09, 2019
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 869e3305
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Off by one and bounds checking fixes in NFC, from Dan Carpenter.
      
       2) There have been many weird regressions in r8169 since we turned ASPM
          support on, some are still not understood nor completely resolved.
          Let's turn this back off for now. From Heiner Kallweit.
      
       3) Signedness fixes for ethtool speed value handling, from Michael
          Zhivich.
      
       4) Handle timestamps properly in macb driver, from Paul Thomas.
      
       5) Two erspan fixes, it's the usual "skb ->data potentially reallocated
          and we're holding a stale protocol header pointer". From Lorenzo
          Bianconi.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        bnxt_en: Reset device on RX buffer errors.
        bnxt_en: Improve RX consumer index validity check.
        net: macb driver, check for SKBTX_HW_TSTAMP
        qlogic: qlcnic: fix use of SPEED_UNKNOWN ethtool constant
        broadcom: tg3: fix use of SPEED_UNKNOWN ethtool constant
        ethtool: avoid signed-unsigned comparison in ethtool_validate_speed()
        net: ip6_gre: fix possible use-after-free in ip6erspan_rcv
        net: ip_gre: fix possible use-after-free in erspan_rcv
        r8169: disable ASPM again
        MAINTAINERS: ieee802154: update documentation file pattern
        net: vrf: Fix ping failed when vrf mtu is set to 0
        selftests: add a tc matchall test case
        nfc: nci: Potential off by one in ->pipes[] array
        NFC: nci: Add some bounds checking in nci_hci_cmd_received()
    • Merge branch 'fixes-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · a556810d
      Linus Torvalds authored
      Pull TPM fixes from James Morris:
       "From Jarkko: These are critical fixes for v5.1. Contains also couple
        of new selftests for v5.1 features (partial reads in /dev/tpm0)"
      
      * 'fixes-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        selftests/tpm2: Open tpm dev in unbuffered mode
        selftests/tpm2: Extend tests to cover partial reads
        KEYS: trusted: fix -Wvarags warning
        tpm: Fix the type of the return value in calc_tpm2_event_size()
        KEYS: trusted: allow trusted.ko to initialize w/o a TPM
        tpm: fix an invalid condition in tpm_common_poll
        tpm: turn on TPM on suspend for TPM 1.x
    • Merge tag 'xtensa-20190408' of git://github.com/jcmvbkbc/linux-xtensa · 10d43397
      Linus Torvalds authored
      Pull xtensa fixes from Max Filippov:
      
       - fix syscall number passed to trace_sys_exit
      
       - fix syscall number initialization in start_thread
      
       - fix level interpretation in the return_address
      
       - fix format string warning in init_pmd
      
      * tag 'xtensa-20190408' of git://github.com/jcmvbkbc/linux-xtensa:
        xtensa: fix format string warning in init_pmd
        xtensa: fix return_address
        xtensa: fix initialization of pt_regs::syscall in start_thread
        xtensa: use actual syscall number in do_syscall_trace_leave
  6. Apr 08, 2019