1. 19 Jan, 2018 2 commits
  2. 17 Jan, 2018 14 commits
  3. 16 Jan, 2018 6 commits
  4. 14 Jan, 2018 12 commits
    • x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros · 28d437d5
      Tom Lendacky authored

      The PAUSE instruction is currently used in the retpoline and RSB filling
      macros as a speculation trap.  The use of PAUSE was originally suggested
      because it showed a very, very small difference in the amount of
      cycles/time used to execute the retpoline as compared to LFENCE.  On AMD,
      the PAUSE instruction is not a serializing instruction, so the pause/jmp
      loop will consume excess power while it is speculated over, waiting for
      the return to mispredict to the correct target.
      
      The RSB filling macro is applicable to AMD. And if software is unable to
      verify that LFENCE is serializing on AMD (possible when running under a
      hypervisor), the generic retpoline support will be used, so it is also
      applicable to AMD.  Keep the current usage of PAUSE for Intel, but add an
      LFENCE instruction to the speculation trap for AMD.
      
      The same sequence has been adopted by GCC for the GCC generated retpolines.

      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Acked-by: David Woodhouse <dwmw@amazon.co.uk>
      Acked-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: Kees Cook <keescook@google.com>
      Link: https://lkml.kernel.org/r/20180113232730.31060.36287.stgit@tlendack-t1.amdoffice.net
      28d437d5
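
      [ Illustration: a minimal sketch of a retpoline-style thunk using the
        PAUSE+LFENCE speculation trap described above. The symbol name is made
        up and this is not the kernel's actual macro. ]

        /* Indirect jump through %rax; speculation on the 'ret' is captured
         * in the pause/lfence loop instead of reaching a predicted target. */
        asm(
            ".globl example_indirect_thunk_rax\n"
            "example_indirect_thunk_rax:\n"
            "   call 1f\n"            /* push a return address; the RSB predicts it */
            "0: pause\n"              /* speculation lands here and spins harmlessly */
            "   lfence\n"             /* stops the speculative spin (the point of this patch on AMD) */
            "   jmp 0b\n"
            "1: mov %rax, (%rsp)\n"   /* replace the return address with the real target */
            "   ret\n"
        );
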
    • x86/retpoline: Fill RSB on context switch for affected CPUs · c995efd5
      David Woodhouse authored

      On context switch from a shallow call stack to a deeper one, as the CPU
      does 'ret' up the deeper side it may encounter RSB entries (predictions for
      where the 'ret' goes to) which were populated in userspace.
      
      This is problematic if neither SMEP nor KPTI (the latter of which marks
      userspace pages as NX for the kernel) are active, as malicious code in
      userspace may then be executed speculatively.
      
      Overwrite the CPU's return prediction stack with calls which are predicted
      to return to an infinite loop, to "capture" speculation if this
      happens. This is required both for retpoline, and also in conjunction with
      IBRS for !SMEP && !KPTI.
      
      On Skylake+ the problem is slightly different, and an *underflow* of the
      RSB may cause errant branch predictions to occur. So there it's not so much
      overwrite, as *filling* the RSB to attempt to prevent it getting
      empty. This is only a partial solution for Skylake+ since there are many
      other conditions which may result in the RSB becoming empty. The full
      solution on Skylake+ is to use IBRS, which will prevent the problem even
      when the RSB becomes empty. With IBRS, the RSB-stuffing will not be
      required on context switch.
      
      [ tglx: Added missing vendor check and slightly massaged comments and
        changelog ]

      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: thomas.lendacky@amd.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: https://lkml.kernel.org/r/1515779365-9032-1-git-send-email-dwmw@amazon.co.uk
      c995efd5
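
      [ Illustration: a rough sketch of the RSB-stuffing idea described above,
        not the kernel's FILL_RETURN_BUFFER macro; the symbol name and layout
        are made up, and the count of 32 matches the commonly assumed RSB
        depth. ]

        /* Push 32 return-stack entries that all predict a benign capture
         * loop, then drop the 32 real return addresses from the stack. */
        asm(
            ".globl example_fill_rsb\n"
            "example_fill_rsb:\n"
            "   mov $16, %ecx\n"      /* 16 iterations x 2 calls = 32 RSB entries */
            "1: call 2f\n"            /* first call of the pair */
            "3: pause\n"              /* capture loop for any stale 'ret' prediction */
            "   lfence\n"
            "   jmp 3b\n"
            "2: call 4f\n"            /* second call of the pair */
            "5: pause\n"
            "   lfence\n"
            "   jmp 5b\n"
            "4: dec %ecx\n"
            "   jnz 1b\n"
            "   add $(32*8), %rsp\n"  /* discard the 32 pushed return addresses */
            "   ret\n"
        );
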
    • x86/kasan: Panic if there is not enough memory to boot · 0d39e266
      Andrey Ryabinin authored

      Currently KASAN doesn't panic when it doesn't have enough memory
      to boot. Instead, it crashes in some random place:
      
       kernel BUG at arch/x86/mm/physaddr.c:27!
      
       RIP: 0010:__phys_addr+0x268/0x276
       Call Trace:
        kasan_populate_shadow+0x3f2/0x497
        kasan_init+0x12e/0x2b2
        setup_arch+0x2825/0x2a2c
        start_kernel+0xc8/0x15f4
        x86_64_start_reservations+0x2a/0x2c
        x86_64_start_kernel+0x72/0x75
        secondary_startup_64+0xa5/0xb0
      
      Use memblock_virt_alloc_try_nid() for allocations without failure
      fallback. It will panic with an out of memory message.

      Reported-by: kernel test robot <xiaolong.ye@intel.com>
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: kasan-dev@googlegroups.com
      Cc: Alexander Potapenko <glider@google.com>
      Cc: lkp@01.org
      Link: https://lkml.kernel.org/r/20180110153602.18919-1-aryabinin@virtuozzo.com
      0d39e266
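
      [ Illustration: the pattern being switched to, as a sketch rather than
        the exact kasan_init_64.c hunk; the helper name is made up. ]

        #include <linux/bootmem.h>
        #include <linux/memblock.h>

        /* Allocate early shadow memory with the panicking memblock variant so
         * that running out of memory fails with a clear message at the
         * allocation site instead of crashing in a random place later. */
        static void * __init kasan_early_alloc(size_t size, int nid)
        {
                return memblock_virt_alloc_try_nid(size, size,
                                                   __pa(MAX_DMA_ADDRESS),
                                                   MEMBLOCK_ALLOC_ACCESSIBLE,
                                                   nid);
        }
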
    • x86/retpoline: Remove compile time warning · b8b9ce4b
      Thomas Gleixner authored
      Remove the compile time warning when CONFIG_RETPOLINE=y and the compiler
      does not have retpoline support. Linus' rationale for this is:
      
        It's wrong because it will just make people turn off RETPOLINE, and the
        asm updates - and return stack clearing - that are independent of the
        compiler are likely the most important parts because they are likely the
        ones easiest to target.
      
        And it's annoying because most people won't be able to do anything about
        it. The number of people building their own compiler? Very small. So if
        their distro hasn't got a compiler yet (and pretty much nobody does), the
        warning is just annoying crap.
      
        It is already properly reported as part of the sysfs interface. The
        compile-time warning only encourages bad things.
      
      Fixes: 76b04384 ("x86/retpoline: Add initial retpoline support")
      Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: thomas.lendacky@amd.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Link: https://lkml.kernel.org/r/CA+55aFzWgquv4i6Mab6bASqYXg3ErV3XDFEYf=GEcCDQg5uAtw@mail.gmail.com
      b8b9ce4b
    • x86/idt: Mark IDT tables __initconst · 327867fa
      Andi Kleen authored
      const variables must use __initconst, not __initdata.
      
      Fix this up for the IDT tables, which got it consistently wrong.
      
      Fixes: 16bc18d8 ("x86/idt: Move 32-bit idt_descr to C code")
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20171222001821.2157-7-andi@firstfloor.org
      327867fa
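
      [ Illustration: the rule in a nutshell, with made-up variables. ]

        #include <linux/init.h>

        /* const init-time data belongs in __initconst ... */
        static const int example_table[] __initconst = { 1, 2, 3 };

        /* ... while __initdata is for writable init-time data. Mixing the two
         * up can place const data in the wrong section or trigger section
         * type conflicts, depending on compiler and configuration. */
        static int example_scratch[16] __initdata;
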
    • Revert "x86/apic: Remove init_bsp_APIC()" · fc90ccfd
      Ville Syrjälä authored
      This reverts commit b371ae0d. It causes boot hangs on old P3/P4 systems
      when the local APIC is enforced in UP mode.

      Reported-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: yinghai@kernel.org
      Cc: bhe@redhat.com
      Link: https://lkml.kernel.org/r/20171128145350.21560-1-ville.syrjala@linux.intel.com
      fc90ccfd
    • x86/mm/pkeys: Fix fill_sig_info_pkey · beacd6f7
      Eric W. Biederman authored
      SEGV_PKUERR is a signal-specific si_code which happens to have the same
      numeric value as several others: BUS_MCEERR_AR, ILL_ILLTRP, FPE_FLTOVF,
      TRAP_HWBKPT, CLD_TRAPPED, POLL_ERR, SEGV_THREAD_ID. As such it is not safe
      to test only the si_code; the signal number must also be tested to prevent
      a false positive in fill_sig_info_pkey.
      
      This error was found by inspection, and BUS_MCEERR_AR appears to be a real
      candidate for confusion.  So pass in si_signo and check for SIGSEGV to
      verify that it is actually a SEGV_PKUERR.
      
      Fixes: 019132ff ("x86/mm/pkeys: Fill in pkey field in siginfo")
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180112203135.4669-2-ebiederm@xmission.com
      beacd6f7
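
      [ Illustration: the shape of the check, as a sketch; the real helper in
        arch/x86/mm/fault.c also bails out when protection keys are not
        supported by the CPU. ]

        #include <linux/signal.h>
        #include <linux/types.h>

        /* SEGV_PKUERR's numeric value collides with si_codes of other signals,
         * so the signal number has to be checked before si_pkey is filled in. */
        static void fill_sig_info_pkey(int si_signo, int si_code,
                                       siginfo_t *info, u32 *pkey)
        {
                if (si_signo != SIGSEGV || si_code != SEGV_PKUERR)
                        return;

                info->si_pkey = *pkey;
        }
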
    • x86/tsc: Print tsc_khz, when it differs from cpu_khz · 4b5b2127
      Len Brown authored

      If CPU and TSC frequency are the same the printout of the CPU frequency is
      valid for the TSC as well:
      
            tsc: Detected 2900.000 MHz processor
      
      If the TSC frequency is different there is no information in dmesg. Add a
      conditional printout:
      
        tsc: Detected 2904.000 MHz TSC

      Signed-off-by: Len Brown <len.brown@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: peterz@infradead.org
      Link: https://lkml.kernel.org/r/537b342debcd8e8aebc8d631015dcdf9f9ba8a26.1513920414.git.len.brown@intel.com
      4b5b2127
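
      [ Illustration: a sketch of the conditional printout; the helper name and
        exact format string are made up. ]

        #include <linux/kernel.h>

        /* Print the TSC rate only when it differs from the CPU rate; otherwise
         * the existing "Detected ... MHz processor" line already covers it. */
        static void report_tsc_khz(unsigned long cpu_khz, unsigned long tsc_khz)
        {
                if (tsc_khz == cpu_khz)
                        return;

                pr_info("Detected %lu.%03lu MHz TSC\n",
                        tsc_khz / 1000, tsc_khz % 1000);
        }
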
    • x86/tsc: Fix erroneous TSC rate on Skylake Xeon · b5112030
      Len Brown authored
      The INTEL_FAM6_SKYLAKE_X hardcoded crystal_khz value of 25MHZ is
      problematic:
      
       - SKX workstations (with same model # as server variants) use a 24 MHz
         crystal.  This results in a -4.0% time drift rate on SKX workstations.
      
       - SKX servers subject the crystal to an EMI reduction circuit that reduces its
         actual frequency by (approximately) -0.25%.  This results in -1 second per
         10 minute time drift as compared to network time.
      
      This issue can also trigger a timer and power problem, on configurations
      that use the LAPIC timer (versus the TSC deadline timer).  Clock ticks
      scheduled with the LAPIC timer arrive a few usec before the time they are
      expected (according to the slow TSC).  This causes Linux to poll-idle when
      it should be in an idle power-saving state.  The idle and clock code do not
      gracefully recover from this error, sometimes resulting in significant
      polling and measurable power impact.
      
      Stop using native_calibrate_tsc() for INTEL_FAM6_SKYLAKE_X.
      native_calibrate_tsc() will return 0, boot will run with tsc_khz = cpu_khz,
      and the TSC refined calibration will update tsc_khz to correct for the
      difference.
      
      [ tglx: Sanitized change log ]
      
      Fixes: 6baf3d61 ("x86/tsc: Add additional Intel CPU models to the crystal quirk list")
      Signed-off-by: Len Brown <len.brown@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: peterz@infradead.org
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/ff6dcea166e8ff8f2f6a03c17beab2cb436aa779.1513920414.git.len.brown@intel.com
      b5112030
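
      [ Illustration: a sketch of the quirk-table idea after the change, with a
        made-up helper; only the GOLDMONT entry from the real table is shown,
        the rest is elided. ]

        #include <asm/intel-family.h>

        /* With no hardcoded crystal value for Skylake-X, native_calibrate_tsc()
         * returns 0 for it, boot runs with tsc_khz = cpu_khz, and the refined
         * calibration later corrects the difference. */
        static unsigned int crystal_khz_for_model(unsigned int model)
        {
                switch (model) {
                case INTEL_FAM6_ATOM_GOLDMONT:
                        return 19200;           /* 19.2 MHz crystal */
                /* INTEL_FAM6_SKYLAKE_X deliberately not listed: workstation and
                 * server variants use different crystals. */
                default:
                        return 0;               /* unknown: defer to refined calibration */
                }
        }
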
    • x86/tsc: Future-proof native_calibrate_tsc() · da4ae6c4
      Len Brown authored
      If the crystal frequency cannot be determined via CPUID(15).crystal_khz or
      the built-in table then native_calibrate_tsc() will still set the
      X86_FEATURE_TSC_KNOWN_FREQ flag which prevents the refined TSC calibration.
      
      As a consequence such systems use cpu_khz for the TSC frequency which is
      incorrect when cpu_khz != tsc_khz resulting in time drift.
      
      Return early when the crystal frequency cannot be retrieved without setting
      the X86_FEATURE_TSC_KNOWN_FREQ flag. This ensures that the refined TSC
      calibration is invoked.
      
      [ tglx: Steam-blastered changelog. Sigh ]
      
      Fixes: 4ca4df0b ("x86/tsc: Mark TSC frequency determined by CPUID as known")
      Signed-off-by: Len Brown <len.brown@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: peterz@infradead.org
      Cc: Bin Gao <bin.gao@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/0fe2503aa7d7fc69137141fc705541a78101d2b9.1513920414.git.len.brown@intel.com
      da4ae6c4
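
      [ Illustration: the control flow inside native_calibrate_tsc() after the
        fix, sketched rather than quoted; this is a fragment, not a complete
        function. ]

        /* Bail out before claiming a known TSC frequency when the crystal
         * frequency could not be determined from CPUID(15) or the table. */
        if (crystal_khz == 0)
                return 0;               /* refined TSC calibration will run later */

        /* Only reached with a trusted crystal value. */
        setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
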
    • x86,perf: Disable intel_bts when PTI · 99a9dc98
      Peter Zijlstra authored
      The intel_bts driver does not use the 'normal' BTS buffer which is exposed
      through the cpu_entry_area but instead uses the memory allocated for the
      perf AUX buffer.
      
      This obviously comes apart when using PTI, because then the kernel mapping,
      which includes that AUX buffer memory, disappears.  Fixing this requires
      exposing a mapping which is visible in all contexts, and that's not trivial.

      As a quick fix, disable this driver when PTI is enabled to prevent
      malfunction.
      
      Fixes: 385ce0ea ("x86/mm/pti: Add Kconfig")
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Reported-by: Robert Święcki <robert@swiecki.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: greg@kroah.com
      Cc: hughd@google.com
      Cc: luto@amacapital.net
      Cc: Vince Weaver <vince@deater.net>
      Cc: torvalds@linux-foundation.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180114102713.GB6166@worktop.programming.kicks-ass.net
      99a9dc98
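
      [ Illustration: the quick fix boils down to a check like this in the
        driver init path; the exact warning text is made up. ]

        /* Refuse to register intel_bts when page table isolation is enabled,
         * since the perf AUX buffer used for BTS output is not mapped in the
         * user page tables. */
        if (boot_cpu_has(X86_FEATURE_PTI)) {
                pr_warn("BTS: not supported when page table isolation is enabled\n");
                return -ENODEV;
        }
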
    • x86/pti: Fix !PCID and sanitize defines · f10ee3dc
      Thomas Gleixner authored
      The switch to the user space page tables in the low level ASM code
      unconditionally sets bit 12 and bit 11 of CR3. Bit 12 is switching the base
      address of the page directory to the user part, bit 11 is switching the
      PCID to the PCID associated with the user page tables.
      
      This fails on a machine which lacks PCID support because bit 11 is set in
      CR3. Bit 11 is reserved when PCID is inactive.
      
      While the Intel SDM claims that the reserved bits are ignored when PCID is
      disabled, the AMD APM states that they should be cleared.
      
      This went unnoticed as the AMD APM was not checked when the code was
      developed and reviewed, and test systems with Intel CPUs never failed to
      boot. The report is against a CentOS 6 host where the guest fails to boot,
      so it's not yet clear whether this is a virt issue or can happen on real
      hardware too, but that's irrelevant as the AMD APM clearly asks for clearing
      the reserved bits.
      
      Make sure that on non-PCID machines bit 11 is not set by the page table
      switching code.
      
      Andy suggested to rename the related bits and masks so they are clearly
      describing what they should be used for, which is done as well for clarity.
      
      That split could have been done with alternatives but the macro hell is
      horrible and ugly. This can be done on top if someone cares to remove the
      extra orq. For now it's a straightforward fix.
      
      Fixes: 6fd166aa ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
      Reported-by: Laura Abbott <labbott@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable <stable@vger.kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801140009150.2371@nanos
      f10ee3dc
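
      [ Illustration: a C rendering of what the fixed ASM switch does; the real
        code is an assembly macro, and the constant and function names below
        are local to this example. ]

        #include <linux/types.h>

        #define EXAMPLE_USER_PGTABLE_BIT  12    /* selects the user half of the PGD pair */
        #define EXAMPLE_USER_PCID_BIT     11    /* selects the user PCID */

        /* The page-table bit is always flipped on the kernel->user switch, but
         * the PCID bit may only be set when the CPU supports PCID, because it
         * is a reserved CR3 bit otherwise. */
        static unsigned long build_user_cr3(unsigned long kernel_cr3, bool have_pcid)
        {
                unsigned long cr3 = kernel_cr3 | (1UL << EXAMPLE_USER_PGTABLE_BIT);

                if (have_pcid)
                        cr3 |= 1UL << EXAMPLE_USER_PCID_BIT;

                return cr3;
        }
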
  5. 12 Jan, 2018 3 commits
  6. 11 Jan, 2018 3 commits