1. 23 Jun, 2020 1 commit
    • Sean Christopherson's avatar
      KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROL · bf09fb6c
      Sean Christopherson authored
      Remove support for context switching between the guest's and host's
      desired UMWAIT_CONTROL.  Propagating the guest's value to hardware isn't
      required for correct functionality, e.g. KVM intercepts reads and writes
      to the MSR, and the latency effects of the settings controlled by the
      MSR are not architecturally visible.
      As a general rule, KVM should not allow the guest to control power
      management settings unless explicitly enabled by userspace, e.g. see
      KVM_CAP_X86_DISABLE_EXITS.  E.g. Intel's SDM explicitly states that C0.2
      can improve the performance of SMT siblings.  A devious guest could
      disable C0.2 so as to improve the performance of their workloads at the
      detriment to workloads running in the host or on other VMs.
      Wholesale removal of UMWAIT_CONTROL context switching also fixes a race
      condition where updates from the host may cause KVM to enter the guest
      with the incorrect value.  Because updates are are propagated to all
      CPUs via IPI (SMP function callback), the value in hardware may be
      stale with respect to the cached value and KVM could enter the guest
      with the wrong value in hardware.  As above, the guest can't observe the
      bad value, but it's a weird and confusing wart in the implementation.
      Removal also fixes the unnecessary usage of VMX's atomic load/store MSR
      lists.  Using the lists is only necessary for MSRs that are required for
      correct functionality immediately upon VM-Enter/VM-Exit, e.g. EFER on
      old hardware, or for MSRs that need to-the-uop precision, e.g. perf
      related MSRs.  For UMWAIT_CONTROL, the effects are only visible in the
      kernel via TPAUSE/delay(), and KVM doesn't do any form of delay in
      vcpu_vmx_run().  Using the atomic lists is undesirable as they are more
      expensive than direct RDMSR/WRMSR.
      Furthermore, even if giving the guest control of the MSR is legitimate,
      e.g. in pass-through scenarios, it's not clear that the benefits would
      outweigh the overhead.  E.g. saving and restoring an MSR across a VMX
      roundtrip costs ~250 cycles, and if the guest diverged from the host
      that cost would be paid on every run of the guest.  In other words, if
      there is a legitimate use case then it should be enabled by a new
      per-VM capability.
      Note, KVM still needs to emulate MSR_IA32_UMWAIT_CONTROL so that it can
      correctly expose other WAITPKG features to the guest, e.g. TPAUSE,
      Fixes: 6e3ba4ab
       ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL")
      Cc: stable@vger.kernel.org
      Cc: Jingqi Liu <jingqi.liu@intel.com>
      Cc: Tao Xu <tao3.xu@intel.com>
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200623005135.10414-1-sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
  2. 07 May, 2020 2 commits
  3. 17 Feb, 2020 1 commit
  4. 08 Oct, 2019 1 commit
  5. 26 Jun, 2019 1 commit
  6. 06 Mar, 2019 1 commit
    • Thomas Gleixner's avatar
      x86/speculation/mds: Conditionally clear CPU buffers on idle entry · 07f07f55
      Thomas Gleixner authored
      Add a static key which controls the invocation of the CPU buffer clear
      mechanism on idle entry. This is independent of other MDS mitigations
      because the idle entry invocation to mitigate the potential leakage due to
      store buffer repartitioning is only necessary on SMT systems.
      Add the actual invocations to the different halt/mwait variants which
      covers all usage sites. mwaitx is not patched as it's not available on
      Intel CPUs.
      The buffer clear is only invoked before entering the C-State to prevent
      that stale data from the idling CPU is spilled to the Hyper-Thread sibling
      after the Store buffer got repartitioned and all entries are available to
      the non idle sibling.
      When coming out of idle the store buffer is partitioned again so each
      sibling has half of it available. Now CPU which returned from idle could be
      speculatively exposed to contents of the sibling, but the buffers are
      flushed either on exit to user space or on VMENTER.
      When later on conditional buffer clearing is implemented on top of this,
      then there is no action required either because before returning to user
      space the context switch will set the condition flag which causes a flush
      on the return to user path.
      Note, that the buffer clearing on idle is only sensible on CPUs which are
      solely affected by MSBDS and not any other variant of MDS because the other
      MDS variants cannot be mitigated when SMT is enabled, so the buffer
      clearing on idle would be a window dressing exercise.
      This intentionally does not handle the case in the acpi/processor_idle
      driver which uses the legacy IO port interface for C-State transitions for
      two reasons:
       - The acpi/processor_idle driver was replaced by the intel_idle driver
         almost a decade ago. Anything Nehalem upwards supports it and defaults
         to that new driver.
       - The legacy IO port interface is likely to be used on older and therefore
         unaffected CPUs or on systems which do not receive microcode updates
         anymore, so there is no point in adding that.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      Reviewed-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: default avatarFrederic Weisbecker <frederic@kernel.org>
      Reviewed-by: default avatarJon Masters <jcm@redhat.com>
      Tested-by: default avatarJon Masters <jcm@redhat.com>
  7. 02 Nov, 2017 1 commit
    • Greg Kroah-Hartman's avatar
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman authored
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      How this work was done:
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      Further patches will be generated in subsequent months to fix up cases
      where non-standard...
  8. 02 Mar, 2017 1 commit
  9. 20 Jul, 2016 1 commit
  10. 30 Jan, 2016 1 commit
  11. 22 Aug, 2015 1 commit
    • Huang Rui's avatar
      x86/asm: Add MONITORX/MWAITX instruction support · f9675674
      Huang Rui authored
      AMD Carrizo processors (Family 15h, Models 60h-6fh) added a new
      feature called MWAITX (MWAIT with extensions) as an extension to
      This new instruction controls a configurable timer which causes
      the core to exit wait state on timer expiration, in addition to
      "normal" MWAIT condition of reading from a monitored VA.
      Compared to MONITOR/MWAIT, there are minor differences in opcode
      and input parameters:
      MWAITX ECX[1]: enable timer if set
      MWAITX EBX[31:0]: max wait time expressed in SW P0 clocks ==
      TSC. The software P0 frequency is the same as the TSC frequency.
                      MWAIT                           MWAITX
      opcode          0f 01 c9           |            0f 01 fb
      ECX[0]                  value of RFLAGS.IF seen by instruction
      ECX[1]          unused/#GP if set  |            enable timer if set
      ECX[31:2]                     unused/#GP if set
      EAX                           unused (reserve for hint)
      EBX[31:0]       unused             |            max wait time (SW P0 == TSC)
                      MONITOR                         MONITORX
      opcode          0f 01 c8           |            0f 01 fa
      EAX                     (logical) address to monitor
      ECX                     #GP if not zero
      Max timeout = EBX/(TSC frequency)
      Signed-off-by: default avatarHuang Rui <ray.huang@amd.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Aaron Lu <aaron.lu@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andreas Herrmann <herrmann.der.user@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dirk Brandewie <dirk.j.brandewie@intel.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Li <tony.li@amd.com>
      Link: http://lkml.kernel.org/r/1439201994-28067-3-git-send-email-bp@alien8.de
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
  12. 16 Mar, 2015 1 commit
    • Len Brown's avatar
      sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power... · b253149b
      Len Brown authored
      sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power savings and to improve performance
      In Linux-3.9 we removed the mwait_idle() loop:
        69fb3676 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")
      The reasoning was that modern machines should be sufficiently
      happy during the boot process using the default_idle() HALT
      loop, until cpuidle loads and either acpi_idle or intel_idle
      invoke the newer MWAIT-with-hints idle loop.
      But two machines reported problems:
       1. Certain Core2-era machines support MWAIT-C1 and HALT only.
          MWAIT-C1 is preferred for optimal power and performance.
          But if they support just C1, cpuidle never loads and
          so they use the boot-time default idle loop forever.
       2. Some laptops will boot-hang if HALT is used,
          but will boot successfully if MWAIT is used.
          This appears to be a hidden assumption in BIOS SMI,
          that is presumably valid on the proprietary OS
          where the BIOS was validated.
      So here we effectively revert the patch above, restoring
      the mwait_idle() loop.  However, we don't bother restoring
      the idle=mwait cmdline parameter, since it appears to add
      no value.
      Maintainer notes:
        For 3.9, simply revert 69fb3676
        for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in
        context For 3.11, 3.12, 3.13, this patch applies cleanly
      Tested-by: default avatarMike Galbraith <bitbucket@online.de>
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      Acked-by: default avatarMike Galbraith <bitbucket@online.de>
      Cc: <stable@vger.kernel.org> # 3.9+
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ian Malone <ibmalone@gmail.com>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/345254a551eb5a6a866e048d7ab570fd2193aca4.1389763084.git.len.brown@intel.com
      [ Ported to recent kernels. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
  13. 18 Jun, 2014 1 commit
  14. 13 Jan, 2014 1 commit
  15. 19 Dec, 2013 2 commits
  16. 09 Feb, 2013 2 commits
    • Len Brown's avatar
      intel_idle: remove assumption of one C-state per MWAIT flag · e022e7eb
      Len Brown authored
      Remove the assumption that cstate_tables are
      indexed by MWAIT flag values.  Each entry
      identifies itself via its own flags value.
      This change is needed to support multiple states
      that share the same MWAIT flags.
      Note that this can have an effect on what state is described
      by 'N' on cmdline intel_idle.max_cstate=N on some systems.
      intel_idle.max_cstate=0 still disables the driver
      intel_idle.max_cstate=1 still results in just C1(E)
      However, "place holders" in the sparse C-state name-space
      (eg. Atom) have been removed.
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
    • Len Brown's avatar
      intel_idle: remove use and definition of MWAIT_MAX_NUM_CSTATES · 137ecc77
      Len Brown authored
      Cosmetic only.
      They are both 8, so this patch has no functional change.
      The reason to change is that intel_idle will soon be able
      to export more than the 8 "major" states supported by MWAIT.
      When we hit that limit, it is important to know
      where the limit comes from.
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
  17. 17 Sep, 2010 1 commit