1. 06 Apr, 2018 6 commits
    • Randy Dunlap's avatar
      headers: untangle kmemleak.h from mm.h · 514c6032
      Randy Dunlap authored
      Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
      reason.  It looks like it's only a convenience, so remove kmemleak.h
      from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that
      don't already #include it.  Also remove <linux/kmemleak.h> from source
      files that do not use it.
      This is tested on i386 allmodconfig and x86_64 allmodconfig.  It would
      be good to run it through the 0day bot for other $ARCHes.  I have
      neither the horsepower nor the storage space for the other $ARCHes.
      Update: This patch has been extensively build-tested by both the 0day
      bot & kisskb/ozlabs build farms.  Both of them reported 2 build failures
      for which patches are included here (in v2).
      [ slab.h is the second most used header file after module.h; kernel.h is
        right there with slab.h. There could be some minor error in the
        counting due to some #includes having comments after them and I didn't
        combine all of those. ]
      [akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr]
      Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org
      Link: http://kisskb.ellerman.id.au/kisskb/head/13396/Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>	[2 build failures]
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>	[2 build failures]
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Wei Yongjun <weiyongjun1@huawei.com>
      Cc: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: John Johansen <john.johansen@canonical.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Huang Ying's avatar
      mm: fix races between swapoff and flush dcache · cb9f753a
      Huang Ying authored
      Thanks to commit 4b3ef9da ("mm/swap: split swap cache into 64MB
      trunks"), after swapoff the address_space associated with the swap
      device will be freed.  So page_mapping() users which may touch the
      address_space need some kind of mechanism to prevent the address_space
      from being freed during accessing.
      The dcache flushing functions (flush_dcache_page(), etc) in architecture
      specific code may access the address_space of swap device for anonymous
      pages in swap cache via page_mapping() function.  But in some cases
      there are no mechanisms to prevent the swap device from being swapoff,
      for example,
        CPU1					CPU2
        __get_user_pages()			swapoff()
            mapping = page_mapping()
              ...				  exit_swap_address_space()
              ...				    kvfree(spaces)
      The address space may be accessed after being freed.
      But from cachetlb.txt and Russell King, flush_dcache_page() only care
      about file cache pages, for anonymous pages, flush_anon_page() should be
      used.  The implementation of flush_dcache_page() in all architectures
      follows this too.  They will check whether page_mapping() is NULL and
      whether mapping_mapped() is true to determine whether to flush the
      dcache immediately.  And they will use interval tree (mapping->i_mmap)
      to find all user space mappings.  While mapping_mapped() and
      mapping->i_mmap isn't used by anonymous pages in swap cache at all.
      So, to fix the race between swapoff and flush dcache, __page_mapping()
      is add to return the address_space for file cache pages and NULL
      otherwise.  All page_mapping() invoking in flush dcache functions are
      replaced with page_mapping_file().
      [akpm@linux-foundation.org: simplify page_mapping_file(), per Mike]
      Link: http://lkml.kernel.org/r/20180305083634.15174-1-ying.huang@intel.comSigned-off-by: default avatar"Huang, Ying" <ying.huang@intel.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Dan Williams's avatar
      mm, powerpc: use vma_kernel_pagesize() in vma_mmu_pagesize() · 09135cc5
      Dan Williams authored
      Patch series "mm, smaps: MMUPageSize for device-dax", v3.
      Similar to commit 31383c68 ("mm, hugetlbfs: introduce ->split() to
      vm_operations_struct") here is another occasion where we want
      special-case hugetlbfs/hstate enabling to also apply to device-dax.
      This prompts the question what other hstate conversions we might do
      beyond ->split() and ->pagesize(), but this appears to be the last of
      the usages of hstate_vma() in generic/non-hugetlbfs specific code paths.
      This patch (of 3):
      The current powerpc definition of vma_mmu_pagesize() open codes looking
      up the page size via hstate.  It is identical to the generic
      vma_kernel_pagesize() implementation.
      Now, vma_kernel_pagesize() is growing support for determining the page
      size of Device-DAX vmas in addition to the existing Hugetlbfs page size
      Ideally, if the powerpc vma_mmu_pagesize() used vma_kernel_pagesize() it
      would automatically benefit from any new vma-type support that is added
      to vma_kernel_pagesize().  However, the powerpc vma_mmu_pagesize() is
      prevented from calling vma_kernel_pagesize() due to a circular header
      dependency that requires vma_mmu_pagesize() to be defined before
      including <linux/hugetlb.h>.
      Break this circular dependency by defining the default vma_mmu_pagesize()
      as a __weak symbol to be overridden by the powerpc version.
      Link: http://lkml.kernel.org/r/151996254179.27922.2213728278535578744.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Jane Chu <jane.chu@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Pavel Tatashin's avatar
      x86/mm/memory_hotplug: determine block size based on the end of boot memory · 078eb6aa
      Pavel Tatashin authored
      Memory sections are combined into "memory block" chunks.  These chunks
      are the units upon which memory can be added and removed.
      On x86, the new memory may be added after the end of the boot memory,
      therefore, if block size does not align with end of boot memory, memory
      hot-plugging/hot-removing can be broken.
      Memory sections are combined into "memory block" chunks.  These chunks
      are the units upon which memory can be added and removed.
      On x86 the new memory may be added after the end of the boot memory,
      therefore, if block size does not align with end of boot memory, memory
      hotplugging/hotremoving can be broken.
      Currently, whenever machine is booted with more than 64G the block size
      is unconditionally increased to 2G from the base 128M.  This is done in
      order to reduce number of memory device files in sysfs:
      We must use the largest allowed block size that aligns to the next
      address to be able to hotplug the next block of memory.
      So, when memory is larger or equal to 64G, we check the end address and
      find the largest block size that is still power of two but smaller or
      equal to 2G.
      Before, the fix:
      Run qemu with:
      -m 64G,slots=2,maxmem=66G -object memory-backend-ram,id=mem1,size=2G
      (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
      Block size [0x80000000] unaligned hotplug range: start 0x1040000000,
      							size 0x80000000
      acpi PNP0C80:00: add_memory failed
      acpi PNP0C80:00: acpi_memory_enable_device() error
      acpi PNP0C80:00: Enumeration failure
      With the fix memory is added successfully as the block size is set to
      1G, and therefore aligns with start address 0x1040000000.
      [pasha.tatashin@oracle.com: v4]
        Link: http://lkml.kernel.org/r/20180215165920.8570-3-pasha.tatashin@oracle.com
      Link: http://lkml.kernel.org/r/20180213193159.14606-3-pasha.tatashin@oracle.comSigned-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Anshuman Khandual's avatar
      mm/migrate: rename migration reason MR_CMA to MR_CONTIG_RANGE · 31025351
      Anshuman Khandual authored
      alloc_contig_range() initiates compaction and eventual migration for the
      purpose of either CMA or HugeTLB allocations.  At present, the reason
      code remains the same MR_CMA for either of these cases.  Let's make it
      MR_CONTIG_RANGE which will appropriately reflect the reason code in both
      these cases.
      Link: http://lkml.kernel.org/r/20180202091518.18798-1-khandual@linux.vnet.ibm.comSigned-off-by: default avatarAnshuman Khandual <khandual@linux.vnet.ibm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Huacai Chen's avatar
      zboot: fix stack protector in compressed boot phase · 7bbaf27d
      Huacai Chen authored
      Calling __stack_chk_guard_setup() in decompress_kernel() is too late
      that stack checking always fails for decompress_kernel() itself.  So
      remove __stack_chk_guard_setup() and initialize __stack_chk_guard before
      we call decompress_kernel().
      Original code comes from ARM but also used for MIPS and SH, so fix them
      together.  If without this fix, compressed booting of these archs will
      fail because stack checking is enabled by default (>=4.16).
      Link: http://lkml.kernel.org/r/1522226933-29317-1-git-send-email-chenhc@lemote.com
      Fixes: 8779657d ("stackprotector: Introduce CONFIG_CC_STACKPROTECTOR_STRONG")
      Signed-off-by: default avatarHuacai Chen <chenhc@lemote.com>
      Acked-by: default avatarJames Hogan <jhogan@kernel.org>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Acked-by: default avatarRich Felker <dalias@libc.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  2. 05 Apr, 2018 15 commits
  3. 04 Apr, 2018 9 commits
  4. 03 Apr, 2018 10 commits
    • Michael Ellerman's avatar
      powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop() · d0b791c0
      Michael Ellerman authored
      Commit 3d4fbffd ("powerpc/64s/idle: POWER9 implement a separate
      idle stop function for hotplug") that added power9_offline_stop() was
      written before commit 7672691a ("powerpc/powernv: Provide a way to
      force a core into SMT4 mode").
      When merging the former I failed to notice that it caused us to skip
      the force-SMT4 logic for offline CPUs. The result is that offlined
      CPUs will not correctly participate in the force-SMT4 logic, which
      presumably will result in badness (not tested).
      Reconcile the two commits by making power9_offline_stop() a pre-cursor
      to power9_idle_stop(), so that they share the force-SMT4 logic.
      This is based on an original commit from Nick, all breakage is my own.
      Fixes: 3d4fbffd ("powerpc/64s/idle: POWER9 implement a separate idle stop function for hotplug")
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
    • Palmer Dabbelt's avatar
      Palmer Dabbelt authored
      The device tree code looks for CONFIG_CMDLINE_FORCE, but we were using
      CONFIG_CMDLINE_OVERRIDE.  It looks like this was just a hold over from
      before our device tree conversion -- in fact, we'd already removed the
      support for CONFIG_CMDLINE_OVERRIDE from our arch-specific code so it
      didn't even work any more.
      Thanks to Mortiz and Trung for finding the original bug, and for Michael
      for suggeting a better fix.
      CC: Trung Tran <trung.tran@ettus.com>
      CC: Michael J Clark <mjc@sifive.com>
      Reviewed-by: default avatarMoritz Fischer <mdf@kernel.org>
      Signed-off-by: default avatarPalmer Dabbelt <palmer@sifive.com>
    • David S. Miller's avatar
      sparc64: Make atomic_xchg() an inline function rather than a macro. · d13864b6
      David S. Miller authored
      This avoids a lot of -Wunused warnings such as:
      kernel/debug/debug_core.c: In function ‘kgdb_cpu_enter’:
      ./arch/sparc/include/asm/cmpxchg_64.h:55:22: warning: value computed is not used [-Wunused-value]
       #define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
      ./arch/sparc/include/asm/atomic_64.h:86:30: note: in expansion of macro ‘xchg’
       #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
      kernel/debug/debug_core.c:508:4: note: in expansion of macro ‘atomic_xchg’
          atomic_xchg(&kgdb_active, cpu);
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Nicholas Piggin's avatar
      powerpc/powernv: Always stop secondaries before reboot/shutdown · f2748bdf
      Nicholas Piggin authored
      Currently powernv reboot and shutdown requests just leave secondaries
      to do their own things. This is undesirable because they can trigger
      any number of watchdogs while waiting for reboot, but also we don't
      know what else they might be doing -- they might be causing trouble,
      trampling memory, etc.
      The opal scheduled flash update code already ran into watchdog problems
      due to flashing taking a long time, and it was fixed with 2196c6f1
      ("powerpc/powernv: Return secondary CPUs to firmware before FW update"),
      which returns secondaries to opal. It's been found that regular reboots
      can take over 10 seconds, which can result in the hard lockup watchdog
        reboot: Restarting system
        [  360.038896709,5] OPAL: Reboot request...
        Watchdog CPU:0 Hard LOCKUP
        Watchdog CPU:44 detected Hard LOCKUP other CPUS:16
        Watchdog CPU:16 Hard LOCKUP
        watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0]
      This patch removes the special case for flash update, and calls
      smp_send_stop in all cases before calling reboot/shutdown.
      smp_send_stop could return CPUs to OPAL, the main reason not to is
      that the request could come from a NMI that interrupts OPAL code,
      so re-entry to OPAL can cause a number of problems. Putting
      secondaries into simple spin loops improves the chances of a
      successful reboot.
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Reviewed-by: default avatarVasant Hegde <hegdevasant@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    • Nicholas Piggin's avatar
      powerpc: hard disable irqs in smp_send_stop loop · 855bfe0d
      Nicholas Piggin authored
      The hard lockup watchdog can fire under local_irq_disable
      on platforms with irq soft masking.
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    • Nicholas Piggin's avatar
      powerpc: use NMI IPI for smp_send_stop · 6bed3237
      Nicholas Piggin authored
      Use the NMI IPI rather than smp_call_function for smp_send_stop.
      Have stopped CPUs hard disable interrupts rather than just soft
      This function is used in crash/panic/shutdown paths to bring other
      CPUs down as quickly and reliably as possible, and minimizing their
      potential to cause trouble.
      Avoiding the Linux smp_call_function infrastructure and (if supported)
      using true NMI IPIs makes this more robust.
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    • Nicholas Piggin's avatar
      powerpc/powernv: Fix SMT4 forcing idle code · a2b5e056
      Nicholas Piggin authored
      The PSSCR value is not stored to PACA_REQ_PSSCR if the CPU does not
      have the XER[SO] bug.
      Fix this by storing up-front, outside the workaround code. The initial
      test is not required because it is a slow path.
      The workaround is made to depend on CONFIG_KVM_BOOK3S_HV_POSSIBLE, to
      match pnv_power9_force_smt4_catch() where it is used. Drop the comment
      on pnv_power9_force_smt4_catch() as it's no longer true.
      Fixes: 7672691a ("powerpc/powernv: Provide a way to force a core into SMT4 mode")
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    • Mauricio Faria de Oliveira's avatar
      powerpc/pseries: Restore default security feature flags on setup · 6232774f
      Mauricio Faria de Oliveira authored
      After migration the security feature flags might have changed (e.g.,
      destination system with unpatched firmware), but some flags are not
      set/clear again in init_cpu_char_feature_flags() because it assumes
      the security flags to be the defaults.
      Additionally, if the H_GET_CPU_CHARACTERISTICS hypercall fails then
      init_cpu_char_feature_flags() does not run again, which potentially
      might leave the system in an insecure or sub-optimal configuration.
      So, just restore the security feature flags to the defaults assumed
      by init_cpu_char_feature_flags() so it can set/clear them correctly,
      and to ensure safe settings are in place in case the hypercall fail.
      Fixes: f636c147 ("powerpc/pseries: Set or clear security feature flags")
      Depends-on: 19887d6a28e2 ("powerpc: Move default security feature flags")
      Signed-off-by: default avatarMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    • Mauricio Faria de Oliveira's avatar
      powerpc: Move default security feature flags · e7347a86
      Mauricio Faria de Oliveira authored
      This moves the definition of the default security feature flags
      (i.e., enabled by default) closer to the security feature flags.
      This can be used to restore current flags to the default flags.
      Signed-off-by: default avatarMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    • Nicholas Piggin's avatar
      powerpc: Don't write to DABR on >= Power8 if DAWR is disabled · 252988cb
      Nicholas Piggin authored
      flush_thread() calls __set_breakpoint() via set_debug_reg_defaults()
      without checking ppc_breakpoint_available(). On Power8 or later CPUs
      which have the DAWR feature disabled that will cause a write to the
      DABR which is incorrect as those CPUs don't have a DABR.
      Fix it two ways, by checking ppc_breakpoint_available() in
      set_debug_reg_defaults(), and also by reworking __set_breakpoint() to
      only write to DABR on Power7 or earlier.
      Fixes: 96541531 ("powerpc: Disable DAWR in the base POWER9 CPU features")
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      [mpe: Rework the logic in __set_breakpoint()]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>