1. 22 Mar, 2018 11 commits
  2. 21 Mar, 2018 29 commits
    • Stephen Rothwell · 55108726
    • firmware: explicitly include vmalloc.h · ef534d35
      Stephen Rothwell authored
      After some other include file changes, fixes:
      
      drivers/base/firmware_loader/fallback.c: In function 'map_fw_priv_pages':
      drivers/base/firmware_loader/fallback.c:232:2: error: implicit declaration of function 'vunmap'; did you mean 'kunmap'? [-Werror=implicit-function-declaration]
        vunmap(fw_priv->data);
        ^~~~~~
        kunmap
      drivers/base/firmware_loader/fallback.c:233:18: error: implicit declaration of function 'vmap'; did you mean 'kmap'? [-Werror=implicit-function-declaration]
        fw_priv->data = vmap(fw_priv->pages, fw_priv->nr_pages, 0,
                        ^~~~
                        kmap
      drivers/base/firmware_loader/fallback.c:233:16: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
        fw_priv->data = vmap(fw_priv->pages, fw_priv->nr_pages, 0,
                      ^
      drivers/base/firmware_loader/fallback.c: In function 'firmware_loading_store':
      drivers/base/firmware_loader/fallback.c:274:4: error: implicit declaration of function 'vfree'; did you mean 'kvfree'? [-Werror=implicit-function-declaration]
          vfree(fw_priv->pages);
          ^~~~~
          kvfree
      drivers/base/firmware_loader/fallback.c: In function 'fw_realloc_pages':
      drivers/base/firmware_loader/fallback.c:405:15: error: implicit declaration of function 'vmalloc'; did you mean 'kvmalloc'? [-Werror=implicit-function-declaration]
         new_pages = vmalloc(new_array_size * sizeof(void *));
                     ^~~~~~~
                     kvmalloc
      drivers/base/firmware_loader/fallback.c:405:13: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
         new_pages = vmalloc(new_array_size * sizeof(void *));
                   ^
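
      The fix itself is not quoted above, but the errors point at a missing
      declaration of the vmalloc family; a minimal sketch of what the fix
      presumably amounts to (the exact placement of the include is an
      assumption):

        /* drivers/base/firmware_loader/fallback.c */
        #include <linux/vmalloc.h>  /* declares vmap(), vunmap(), vmalloc(), vfree() */
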
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • Merge branch 'akpm/master' · fa8e9955
      Stephen Rothwell authored
    • sparc64: NG4 memset 32 bits overflow · 83edb2ba
      Pavel Tatashin authored
      Early in boot, Linux patches memset and memcpy to branch to
      platform-optimized versions of these routines.  The NG4 (Niagara 4)
      versions are currently used on all platforms starting from T4.
      Recently, M7-optimized routines were added to UEK4, but they are not
      in mainline yet.  So, even with the M7-optimized routines, NG4 will
      still be used on T4, T5, M5, and M6 processors.
      
      While investigating how to improve the initialization time of
      dentry_hashtable, which is 8G long on an M6 ldom with 7T of main
      memory, I noticed that memset() does not reset all the memory in this
      array.  After studying the code, I realized that the NG4memset()
      branches use the %icc register instead of %xcc for the comparison, so
      if the length value does not fit in 32 bits, which is true for an 8G
      array, these routines fail to work properly.
      
      The fix is to replace all %icc with %xcc in these routines.  (An
      alternative is to use %ncc, but that would be misleading, as the code
      already contains sparcv9-only instructions and cannot be compiled for
      32-bit.)
      
      It is important to fix this bug because even an older T4-4 can have 2T
      of memory, and the kernel contains data structures that grow with
      memory size and can exceed 4G.  The failure of memset() is silent and
      the resulting corruption is hard to detect.
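
      A hedged userspace illustration of the bug class (this is plain C, not
      the sparc assembly itself; names are illustrative):

        #include <stdio.h>

        /* A branch on %icc compares only the low 32 bits of a 64-bit
         * register, so any length that is a multiple of 4G looks like 0. */
        static int icc_style_is_zero(unsigned long len)
        {
                return (unsigned int)len == 0;
        }

        int main(void)
        {
                unsigned long len = 8UL << 30;  /* 8G, as with dentry_hashtable */
                /* Prints 1: a 32-bit compare would terminate the loop at
                 * once, leaving the buffer untouched. */
                printf("%d\n", icc_style_is_zero(len));
                return 0;
        }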
      
      Link: http://lkml.kernel.org/r/1488432825-92126-2-git-send-email-pasha.tatashin@oracle.com
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: Babu Moger <babu.moger@oracle.com>
      Cc: Babu Moger <babu.moger@amd.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • drivers/media/platform/sti/delta/delta-ipc.c: fix read buffer overflow · 93c6f636
      Andi Kleen authored
      The single caller passes a string to delta_ipc_open, which copies it
      with a fixed size larger than the string.  So it copies some random
      data after the original string in the ro segment.
      
      If the string was at the end of a page it may fault.
      
      Just copy the string with a normal strcpy after clearing the field.
      
      Found by an LTO build (which errors out)
      because the compiler inlines the functions, can resolve
      the string sizes, and triggers the compile-time checks in memcpy.
      
      In function `memcpy',
          inlined from `delta_ipc_open.constprop' at linux/drivers/media/platform/sti/delta/delta-ipc.c:178:0,
          inlined from `delta_mjpeg_ipc_open' at linux/drivers/media/platform/sti/delta/delta-mjpeg-dec.c:227:0,
          inlined from `delta_mjpeg_decode' at linux/drivers/media/platform/sti/delta/delta-mjpeg-dec.c:403:0:
      /home/andi/lsrc/linux/include/linux/string.h:337:0: error: call to `__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
          __read_overflow2();
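
      A hedged sketch of the before/after pattern described in the fix (the
      structure and field names are illustrative, not the actual delta-ipc
      types):

        #include <string.h>

        #define MSG_NAME_LEN 32                 /* illustrative fixed size */

        struct ipc_open_msg { char name[MSG_NAME_LEN]; };

        static void set_name(struct ipc_open_msg *msg, const char *name)
        {
                /* Before: reads MSG_NAME_LEN bytes from a shorter string:
                 *   memcpy(msg->name, name, MSG_NAME_LEN);
                 * After: clear the field, then copy only the string. */
                memset(msg->name, 0, sizeof(msg->name));
                strcpy(msg->name, name);        /* callers pass short literals */
        }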
      
      Link: http://lkml.kernel.org/r/20171222001212.1850-1-andi@firstfloor.org
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Hugues FRUCHET <hugues.fruchet@st.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • headers-untangle-kmemleakh-from-mmh-fix · fcdf579f
      Andrew Morton authored
      security/keys/big_key.c needs vmalloc.h, per sfr
      
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • headers: untangle kmemleak.h from mm.h · 394bd586
      Randy Dunlap authored
      Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
      reason.  It looks like it's only a convenience, so remove kmemleak.h from
      slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that don't
      already #include it.  Also remove <linux/kmemleak.h> from source files
      that do not use it.
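
      The mechanical pattern, sketched (the file and call are illustrative):
      a source file that calls the kmemleak API and previously got the header
      for free via slab.h now includes it directly.

        #include <linux/slab.h>
        #include <linux/kmemleak.h>     /* now explicit, no longer via slab.h */

        static void keep_object(const void *obj)
        {
                kmemleak_not_leak(obj); /* needs <linux/kmemleak.h> */
        }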
      
      This is tested on i386 allmodconfig and x86_64 allmodconfig.  It would be
      good to run it through the 0day bot for other $ARCHes.  I have neither the
      horsepower nor the storage space for the other $ARCHes.
      
      Update: This patch has been extensively build-tested by both the 0day bot
      & kisskb/ozlabs build farms.  Both of them reported 2 build failures for
      which patches are included here (in v2).
      
      [slab.h is the second most used header file after module.h; kernel.h is
      right there with slab.h.  There could be some minor error in the counting
      due to some #includes having comments after them and I didn't combine all
      of those.]
      
      Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org
      Link: http://kisskb.ellerman.id.au/kisskb/head/13396/
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>	[2 build failures]
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>	[2 build failures]
      Cc: Wei Yongjun <weiyongjun1@huawei.com>
      Cc: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: John Johansen <john.johansen@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • fs-fsnotify-account-fsnotify-metadata-to-kmemcg-fix · 4969e6c4
      Andrew Morton authored
      fix CONFIG_MEMCG=n build
      Reported-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Amir Goldstein <amir73il@gmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • fs: fsnotify: account fsnotify metadata to kmemcg · 1f212573
      Shakeel Butt authored
      A lot of memory can be consumed by the events generated for huge or
      unlimited queues if there is no listener or only a slow one.  This can
      cause system-level memory pressure or OOMs.  So, it is better to
      account the fsnotify kmem caches to the memcg of the listener.
      
      There are seven fsnotify kmem caches, and among them, allocations from
      dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and
      inotify_inode_mark_cachep happen in the context of a syscall from the
      listener.  So, SLAB_ACCOUNT is enough for these caches.
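
      A hedged sketch of what that looks like at cache-creation time (the
      exact flag combinations in the patch are an assumption):

        dnotify_struct_cache = KMEM_CACHE(dnotify_struct,
                                          SLAB_PANIC | SLAB_ACCOUNT);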
      
      The objects from fsnotify_mark_connector_cachep are not accounted, as
      they are small compared to the notification marks or events, and it is
      unclear to whom the connector should be accounted, since it is shared
      by all events attached to the inode.
      
      The allocations from the event caches happen in the context of the event
      producer.  For such caches we will need to remote charge the allocations
      to the listener's memcg.  Thus we save the memcg reference in the
      fsnotify_group structure of the listener.
      
      This patch also rearranges the members of fsnotify_group, filling the
      holes so that its size stays the same, at least for 64-bit builds, even
      with the additional member.
      
      Link: http://lkml.kernel.org/r/20180305182951.34462-3-shakeelb@google.com
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Cc: Amir Goldstein <amir73il@gmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • mm: memcg: remote memcg charging for kmem allocations · c4e09540
      Shakeel Butt authored
      Patch series "Directed kmem charging", v4.
      
      This patchset introduces memcg variants of the memory allocation
      functions.  The caller can explicitly pass the memcg to charge for kmem
      allocations.  Currently, for __GFP_ACCOUNT memory allocation requests,
      the kernel extracts the memcg of the current task to charge for the
      kmem allocation.  This patch series introduces kmem allocation
      functions where the caller can pass a pointer to the remote memcg.  The
      remote memcg will be charged for the allocation instead of the memcg of
      the caller.  However, the caller must have a reference to the remote
      memcg.
      
      This patch (of 2):
      
      Introduce the memcg variants for kmalloc[_node] and
      kmem_cache_alloc[_node].  For kmem_cache_alloc, the kernel switches the
      root kmem cache with the memcg-specific kmem cache for __GFP_ACCOUNT
      allocations to charge those allocations to the memcg.  However, the
      memcg to charge is extracted from the current task_struct.  This patch
      introduces variants of the kmem cache allocation functions where the
      memcg can be provided explicitly by the caller instead of deducing the
      memcg from the current task.
      
      Underneath, kmalloc allocations are served from the kmem caches unless
      the size of the allocation request is larger than KMALLOC_MAX_CACHE_SIZE,
      in which case the kmem caches are bypassed and the request is routed
      directly to the page allocator.  So, for __GFP_ACCOUNT kmalloc
      allocations, the memcg of the current task is charged.  This patch
      introduces memcg variants of the kmalloc functions to allow callers to
      provide the memcg for charging.
      
      These functions are useful for use-cases where the allocations should be
      charged to the memcg different from the memcg of the caller.  One such
      concrete use-case is the allocations for fsnotify event objects where the
      objects should be charged to the listener instead of the producer.
      
      One requirement to call these functions is that the caller must have the
      reference to the memcg.  Using kmalloc_memcg and kmem_cache_alloc_memcg
      implicitly assumes that the caller is requesting a __GFP_ACCOUNT
      allocation.
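
      A hedged usage sketch of the new variants (the signatures are inferred
      from this description, not verified against the patch):

        /* charge the listener's memcg rather than the producer's */
        event = kmem_cache_alloc_memcg(event_cachep, GFP_KERNEL,
                                       group->memcg);
        buf = kmalloc_memcg(size, GFP_KERNEL, group->memcg);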
      
      Link: http://lkml.kernel.org/r/20180305182951.34462-2-shakeelb@google.com
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Amir Goldstein <amir73il@gmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • linux/const.h: refactor _BITUL and _BITULL a bit · d4ebeb6b
      Masahiro Yamada authored
      Minor cleanups made possible by _UL and _ULL.
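
      A sketch of what the refactored macros presumably look like after
      expressing them via the new shorthands:

        #define _BITUL(x)       (_UL(1) << (x))
        #define _BITULL(x)      (_ULL(1) << (x))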
      
      Link: http://lkml.kernel.org/r/1519301715-31798-5-git-send-email-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Russell King <rmk+kernel@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • linux/const.h: move UL() macro to include/linux/const.h · 19eca450
      Masahiro Yamada authored
      ARM, ARM64 and UniCore32 duplicate the definition of UL():
      
        #define UL(x) _AC(x, UL)
      
      This is not actually arch-specific, so it will be useful to move it to a
      common header.  Currently, we only have the uapi variant for
      linux/const.h, so I am creating include/linux/const.h.
      
      I also added _UL(), _ULL() and ULL() because _AC() is mostly used in
      the form either _AC(..., UL) or _AC(..., ULL).  I expect they will be
      replaced in follow-up cleanups.  The underscore-prefixed ones should
      be used for exported headers.
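
      A sketch of the resulting definitions as described (the exact file
      layout is inferred from the text):

        /* include/uapi/linux/const.h */
        #define _UL(x)          (_AC(x, UL))
        #define _ULL(x)         (_AC(x, ULL))

        /* include/linux/const.h */
        #define UL(x)           (_UL(x))
        #define ULL(x)          (_ULL(x))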
      
      Link: http://lkml.kernel.org/r/1519301715-31798-4-git-send-email-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • linux/const.h: prefix include guard of uapi/linux/const.h with _UAPI · 3d0a7277
      Masahiro Yamada authored
      Patch series "linux/const.h: cleanups of macros such as UL(), _BITUL(),
      BIT() etc", v3.
      
      ARM, ARM64, UniCore32 define UL() as a shorthand of _AC(..., UL).  More
      architectures may introduce it in the future.
      
      UL() is arch-agnostic and useful, so let's move it to
      include/linux/const.h.

      Currently, <asm/memory.h> must be included to use UL().  It pulls in
      more bloat just to define some bit macros.
      
      I posted V2 one year ago.
      
      The previous posts are:
      https://patchwork.kernel.org/patch/9498273/
      https://patchwork.kernel.org/patch/9498275/
      https://patchwork.kernel.org/patch/9498269/
      https://patchwork.kernel.org/patch/9498271/
      
      At that time, what blocked this series was a comment from
      David Howells:
        You need to be very careful doing this.  Some userspace stuff
        depends on the guard macro names on the kernel header files.
      
      (https://patchwork.kernel.org/patch/9498275/)
      
      Looking at the code more closely, I noticed this is not a problem.

      See the following line:
      https://github.com/torvalds/linux/blob/v4.16-rc2/scripts/headers_install.sh#L40
      
      scripts/headers_install.sh rips off _UAPI prefix from guard macro names.
      
      I ran "make headers_install" and confirmed the result is what I expect.
      
      So, we can prefix the include guard of include/uapi/linux/const.h,
      and add a new include/linux/const.h.
      
      This patch (of 4):
      
      I am going to add include/linux/const.h for the kernel space.
      
      Add _UAPI to the include guard of include/uapi/linux/const.h to
      prepare for that.
      
      Please note that the guard name of the exported header will be kept
      as-is, so this commit has no impact on userspace even if some userspace
      stuff depends on the guard macro names.
      
      scripts/headers_install.sh processes exported headers by SED, and
      rips off "_UAPI" from guard macro names.
      
        #ifndef _UAPI_LINUX_CONST_H
        #define _UAPI_LINUX_CONST_H
      
      will be turned into
      
        #ifndef _LINUX_CONST_H
        #define _LINUX_CONST_H
      
      Link: http://lkml.kernel.org/r/1519301715-31798-2-git-send-email-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • xen, mm: allow deferred page initialization for xen pv domains · 30c00983
      Pavel Tatashin authored
      Juergen Gross noticed that f7f99100 ("mm: stop zeroing memory during
      allocation in vmemmap") broke XEN PV domains when deferred struct page
      initialization is enabled.
      
      This is because Xen's PagePinned flag gets erased from struct pages
      when they are initialized later in boot.
      
      Juergen fixed this problem by disabling deferred pages on Xen PV
      domains.  It is desirable, however, to have this feature available, as
      it reduces boot time.  This fix re-enables the feature for PV domains
      and fixes the problem in the following way:
      
      The fix is to delay setting PagePinned flag until struct pages for all
      allocated memory are initialized, i.e.  until after free_all_bootmem().
      
      A new x86_init.hyper op, init_after_bootmem(), is called to let Xen
      know that the boot allocator is done, and hence that struct pages for
      all the allocated memory are now initialized.  If deferred page
      initialization is enabled, the rest of the struct pages will be
      initialized later in boot, once page_alloc_init_late() is called.

      xen_after_bootmem() walks the page table's pages and marks them pinned.
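
      A hedged sketch of the hook wiring described above (the exact call
      sites are an assumption based on this text):

        /* xen side: register the callback on the new op */
        x86_init.hyper.init_after_bootmem = xen_after_bootmem;

        /* generic side: notify the hypervisor once the boot allocator is done */
        free_all_bootmem();
        x86_init.hyper.init_after_bootmem();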
      
      Link: http://lkml.kernel.org/r/20180226160112.24724-2-pasha.tatashin@oracle.com
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Tested-by: Juergen Gross <jgross@suse.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: Jinbum Park <jinb.park7@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Jia Zhang <zhang.jia@linux.alibaba.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • elf: enforce MAP_FIXED on overlaying elf segments · 9f86d8a6
      Michal Hocko authored
      Anshuman has reported that with "fs, elf: drop MAP_FIXED usage from
      elf_map" applied, some ELF binaries in his environment fail to start with
      
       [   23.423642] 9148 (sed): Uhuuh, elf segment at 0000000010030000 requested but the memory is mapped already
       [   23.423706] requested [10030000, 10040000] mapped [10030000, 10040000] 100073 anon
      
      The reason is that the above binary has overlapping elf segments:
        LOAD           0x0000000000000000 0x0000000010000000 0x0000000010000000
                       0x0000000000013a8c 0x0000000000013a8c  R E    10000
        LOAD           0x000000000001fd40 0x000000001002fd40 0x000000001002fd40
                       0x00000000000002c0 0x00000000000005e8  RW     10000
        LOAD           0x0000000000020328 0x0000000010030328 0x0000000010030328
                       0x0000000000000384 0x00000000000094a0  RW     10000
      
      That binary has two RW LOAD segments; the first crosses a page border
      into the second:
      
      0x1002fd40 (LOAD2-vaddr) + 0x5e8 (LOAD2-memlen) == 0x10030328 (LOAD3-vaddr)
      
      Handle this situation by enforcing MAP_FIXED when we establish a temporary
      brk VMA to handle overlapping segments.  All other mappings will still use
      MAP_FIXED_NOREPLACE.
      
      Link: http://lkml.kernel.org/r/20180213100440.GM3443@dhcp22.suse.cz
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Andrei Vagin <avagin@openvz.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Joel Stanley <joel@jms.id.au>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Mark Brown <broonie@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • fs, elf: drop MAP_FIXED usage from elf_map · daebc11b
      Michal Hocko authored
      Both load_elf_interp and load_elf_binary rely on elf_map to map
      segments at a controlled address, and they use MAP_FIXED to enforce
      that.  This is, however, a dangerous thing, prone to silent data
      corruption that can even be exploitable.  Let's take CVE-2017-1000253
      as an example.  At the time (before eab09532 ("binfmt_elf: use
      ELF_ET_DYN_BASE only for PIE")), ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2,
      which is not that far away from the stack top on the 32b (legacy)
      memory layout (only 1GB away).  Therefore we could end up mapping over
      the existing stack with some luck.
      
      The issue has been fixed since then (a87938b2 ("fs/binfmt_elf.c: fix
      bug in loading of PIE binaries")), ELF_ET_DYN_BASE was moved much
      further from the stack (eab09532, and later by c715b72c ("mm:
      revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")), and excessive
      stack consumption early during execve was fully stopped by da029c11
      ("exec: Limit arg stack to at most 75% of _STK_LIM").  So we should be
      safe and any attack should be impractical.  On the other hand, this is
      just too subtle an assumption, so it can break quite easily and be hard
      to spot.
      
      I believe that the MAP_FIXED usage in load_elf_binary (et al.) is
      still fundamentally dangerous.  Moreover, it shouldn't even be needed.
      We are at the early process stage, so there shouldn't be any unrelated
      mappings (except for the stack and the loader), and mmap for a given
      address should succeed even without MAP_FIXED.  Something is terribly
      wrong if this is not the case, and we should rather fail than silently
      corrupt the underlying mapping.
      
      Address this issue by changing MAP_FIXED to the newly added
      MAP_FIXED_NOREPLACE.  This will mean that mmap will fail if there is an
      existing mapping clashing with the requested one without clobbering it.
      
      [mhocko@suse.com: fix build]
      [akpm@linux-foundation.org: coding-style fixes]
      [avagin@openvz.org: don't use the same value for MAP_FIXED_NOREPLACE and MAP_SYNC]
        Link: http://lkml.kernel.org/r/20171218184916.24445-1-avagin@openvz.org
      Link: http://lkml.kernel.org/r/20171213092550.2774-3-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrei Vagin <avagin@openvz.org>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Joel Stanley <joel@jms.id.au>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • mm: introduce MAP_FIXED_NOREPLACE · 3be92363
      Michal Hocko authored
      Patch series "mm: introduce MAP_FIXED_NOREPLACE", v2.
      
      This started as a follow-up discussion [3][4] to the runtime failure
      caused by the hardening patch [5], which removes MAP_FIXED from the elf
      loader because MAP_FIXED is inherently dangerous: it might silently
      clobber an existing underlying mapping (e.g.  the stack).  The reason
      for the failure is that some architectures enforce an alignment for the
      given address hint when MAP_FIXED is not used (e.g.  for shared or
      file-backed mappings).
      
      One way around this would be excluding those archs which do alignment
      tricks from the hardening [6].  The patch is really trivial, but it was
      objected, rightfully so, that this screams for a more generic solution.
      We basically want a non-destructive MAP_FIXED.
      
      The first patch introduces MAP_FIXED_NOREPLACE, which enforces the
      given address but, unlike MAP_FIXED, fails with EEXIST if the given
      range conflicts with an existing one.  The flag is introduced as a
      completely new one rather than a MAP_FIXED extension because of
      backward compatibility.  We really want a never-clobber semantic even
      on older kernels which do not recognize the flag.  Unfortunately mmap
      sucks wrt.  flags evaluation because we do not EINVAL on unknown flags.
      On those kernels we would simply use the traditional hint based
      semantic, so the caller can still get a different address (which sucks)
      but at least not silently corrupt an existing mapping.  I do not see a
      good way around that, short of not exposing the new semantic to the
      userspace at all.
      
      It seems there are users who would like to have something like that.
      Jemalloc was mentioned by Michael Ellerman [7].
      
      Florian Weimer has mentioned the following:
      : glibc ld.so currently maps DSOs without hints.  This means that the kernel
      : will map them right next to each other, and the offsets between them are completely
      : predictable.  We would like to change that and supply a random address in a
      : window of the address space.  If there is a conflict, we do not want the
      : kernel to pick a non-random address. Instead, we would try again with a
      : random address.
      
      John Hubbard mentioned a CUDA example:
      : a) Searches /proc/<pid>/maps for a "suitable" region of available
      : VA space.  "Suitable" generally means it has to have a base address
      : within a certain limited range (a particular device model might
      : have odd limitations, for example), it has to be large enough, and
      : alignment has to be large enough (again, various devices may have
      : constraints that lead us to do this).
      :
      : This is of course subject to races with other threads in the process.
      :
      : Let's say it finds a region starting at va.
      :
      : b) Next it does:
      :     p = mmap(va, ...)
      :
      : *without* setting MAP_FIXED, of course (so va is just a hint), to
      : attempt to safely reserve that region. If p != va, then in most cases,
      : this is a failure (almost certainly due to another thread getting a
      : mapping from that region before we did), and so this layer now has to
      : call munmap(), before returning a "failure: retry" to upper layers.
      :
      :     IMPROVEMENT: --> if instead, we could call this:
      :
      :             p = mmap(va, ... MAP_FIXED_NOREPLACE ...)
      :
      :         , then we could skip the munmap() call upon failure. This
      :         is a small thing, but it is useful here. (Thanks to Piotr
      :         Jaroszynski and Mark Hairgrove for helping me get that detail
      :         exactly right, btw.)
      :
      : c) After that, CUDA suballocates from p, via:
      :
      :      q = mmap(sub_region_start, ... MAP_FIXED ...)
      :
      : Interestingly enough, "freeing" is also done via MAP_FIXED, and
      : setting PROT_NONE to the subregion. Anyway, I just included (c) for
      : general interest.
      
      Atomic address range probing in the multithreaded programs in general
      sounds like an interesting thing to me.
      
      The second patch simply replaces the MAP_FIXED use in the elf loader
      with MAP_FIXED_NOREPLACE.  I believe other places which rely on
      MAP_FIXED should follow.  Actually, real MAP_FIXED usages should be
      documented properly, and they should be more of an exception.
      
      [1] http://lkml.kernel.org/r/20171116101900.13621-1-mhocko@kernel.org
      [2] http://lkml.kernel.org/r/20171129144219.22867-1-mhocko@kernel.org
      [3] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
      [4] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
      [5] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
      [6] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
      [7] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au
      
      This patch (of 2):
      
      MAP_FIXED is used quite often to enforce mapping at a particular range.
      The main problem of this flag is, however, that it is inherently
      dangerous because it unmaps existing mappings covered by the requested
      range.  This can cause silent memory corruptions, some of them even
      with serious security implications.  While the current semantic might
      be really desirable in many cases, there are others which would want to
      enforce the given range but would rather see a failure than a silent
      memory corruption on a clashing range.  Please note that there is no
      guarantee that a given range is obeyed by the mmap even when it is free
      - e.g.  arch specific code is allowed to apply an alignment.
      
      Introduce a new MAP_FIXED_NOREPLACE flag for mmap to achieve this
      behavior.  It has the same semantic as MAP_FIXED wrt.  the given address
      request, with a single exception: it fails with EEXIST if the requested
      address is already covered by an existing mapping.  We still rely on
      get_unmapped_area to handle all the arch-specific MAP_FIXED treatment
      and check for a conflicting vma after it returns.
      
      The flag is introduced as a completely new one rather than a MAP_FIXED
      extension because of the backward compatibility.  We really want a
      never-clobber semantic even on older kernels which do not recognize the
      flag.  Unfortunately mmap sucks wrt.  flags evaluation because we do not
      EINVAL on unknown flags.  On those kernels we would simply use the
      traditional hint based semantic so the caller can still get a different
      address (which sucks) but at least not silently corrupt an existing
      mapping.  I do not see a good way around that.
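
      A hedged userspace usage sketch of the semantics described above (the
      fallback #define for old headers is an assumption):

        #include <errno.h>
        #include <stddef.h>
        #include <sys/mman.h>

        #ifndef MAP_FIXED_NOREPLACE
        #define MAP_FIXED_NOREPLACE 0x100000    /* if libc headers predate it */
        #endif

        static void *map_exactly(void *hint, size_t len)
        {
                void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                               -1, 0);
                if (p == MAP_FAILED)
                        return NULL;            /* EEXIST: range already occupied */
                /* Older kernels ignore the unknown flag and fall back to the
                 * hint semantic, so verify we really got the address. */
                if (p != hint) {
                        munmap(p, len);
                        return NULL;
                }
                return p;
        }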
      
      [mpe@ellerman.id.au: fix whitespace]
      [fail on clashing range with EEXIST as per Florian Weimer]
      [set MAP_FIXED before round_hint_to_min as per Khalid Aziz]
      Link: http://lkml.kernel.org/r/20171213092550.2774-2-mhocko@kernel.org
      Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Russell King - ARM Linux <linux@armlinux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Joel Stanley <joel@jms.id.au>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Jason Evans <jasone@google.com>
      Cc: David Goldblatt <davidtgoldblatt@gmail.com>
      Cc: Edward Tomasz Napierała <trasz@FreeBSD.org>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • kexec_file, x86: move re-factored code to generic side · 5fcd2e4a
      AKASHI Takahiro authored
      In the previous patches, the commonly-used routines exclude_mem_range()
      and prepare_elf64_headers() were carved out.  Now place them in kexec
      common code.  A "crash_" prefix is added to each of their names to
      avoid possible name collisions.
      
      Link: http://lkml.kernel.org/r/20180306102303.9063-8-takahiro.akashi@linaro.org
      Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Acked-by: Dave Young <dyoung@redhat.com>
      Tested-by: Dave Young <dyoung@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • x86: kexec_file: clean up prepare_elf64_headers() · 7f21eae4
      AKASHI Takahiro authored
      Removing the bufp variable in prepare_elf64_headers() makes the code
      simpler and more understandable.
      
      Link: http://lkml.kernel.org/r/20180306102303.9063-7-takahiro.akashi@linaro.org
      Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Acked-by: Dave Young <dyoung@redhat.com>
      Tested-by: Dave Young <dyoung@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer · ddc7dc55
      AKASHI Takahiro authored
      While CRASH_MAX_RANGES (== 16) seems to be good enough, a fixed-size
      array is not a good idea in general.

      In this patch, the size of the crash_mem buffer is calculated as
      before, but the buffer is now dynamically allocated.  This change also
      allows removing the crash_elf_data structure.
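
      A hedged sketch of the fixed-array-to-dynamic change (the struct layout
      and the counting helper are assumptions based on this description):

        struct crash_mem_range { u64 start, end; };
        struct crash_mem {
                unsigned int max_nr_ranges;
                unsigned int nr_ranges;
                struct crash_mem_range ranges[0];
        };

        /* count the ranges first, then size the buffer to fit exactly */
        nr_ranges = count_crash_ranges();       /* hypothetical helper */
        cmem = vzalloc(sizeof(struct crash_mem) +
                       nr_ranges * sizeof(struct crash_mem_range));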
      
      Link: http://lkml.kernel.org/r/20180306102303.9063-6-takahiro.akashi@linaro.org
      Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Acked-by: Dave Young <dyoung@redhat.com>
      Tested-by: Dave Young <dyoung@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • x86: kexec_file: remove X86_64 dependency from prepare_elf64_headers() · 8c8e99ce
      AKASHI Takahiro authored
      The code guarded by CONFIG_X86_64 is necessary on some architectures which
      have a dedicated kernel mapping outside of linear memory mapping.  (arm64
      is among those.)
      
      In this patch, an additional argument, kernel_map, is added to
      enable/disable that code, removing the #ifdef.
      
      Link: http://lkml.kernel.org/r/20180306102303.9063-5-takahiro.akashi@linaro.org
      Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Acked-by: Dave Young <dyoung@redhat.com>
      Tested-by: Dave Young <dyoung@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • x86: kexec_file: purge system-ram walking from prepare_elf64_headers() · 10444715
      AKASHI Takahiro authored
      While prepare_elf64_headers() in x86 looks pretty generic for other
      architectures' use, it contains some code which tries to list crash memory
      regions by walking through system resources, which is not always
      architecture agnostic.  To make this function more generic, the related
      code should be purged.
      
      In this patch, prepare_elf64_headers() simply scans the crash_mem
      buffer passed in and adds all the listed regions to the elf header as
      PT_LOAD segments.  So the walk_system_ram_res(prepare_elf64_headers_callback)
      call has been moved up, before prepare_elf64_headers(), and the
      callback, prepare_elf64_headers_callback(), is now responsible for
      filling up the crash_mem buffer.

      Meanwhile, exclude_elf_header_ranges(), which used to be called every
      time in this callback, was rather redundant and is now called only
      once, in prepare_elf_headers().
      
      Link: http://lkml.kernel.org/r/20180306102303.9063-4-takahiro.akashi@linaro.org
      Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Acked-by: Dave Young <dyoung@redhat.com>
      Tested-by: Dave Young <dyoung@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • kexec_file,x86,powerpc: factor out kexec_file_ops functions · 20675450
      AKASHI Takahiro authored
      As arch_kexec_kernel_image_{probe,load}(),
      arch_kimage_file_post_load_cleanup() and arch_kexec_kernel_verify_sig()
      are almost duplicated among architectures, they can be unified via an
      architecture-defined kexec_file_ops array.  So let's factor them out.
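
      A hedged sketch of the factored-out shape (the loader table and the
      probe loop are inferred from this description):

        /* per-arch: a NULL-terminated table of supported image loaders */
        const struct kexec_file_ops * const kexec_file_loaders[] = {
                &kexec_bzImage64_ops,
                NULL
        };

        /* generic side: probe each loader until one accepts the image */
        for (i = 0; kexec_file_loaders[i]; i++)
                if (!kexec_file_loaders[i]->probe(buf, buf_len))
                        return kexec_file_loaders[i];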
      
      Link: http://lkml.kernel.org/r/20180306102303.9063-3-takahiro.akashi@linaro.org
      Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Acked-by: Dave Young <dyoung@redhat.com>
      Tested-by: Dave Young <dyoung@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • kexec_file: make use of purgatory optional · 53ae7910
      AKASHI Takahiro authored
      Patch series "kexec_file, x86, powerpc: refactoring for other
      architecutres", v2.
      
      This is a preparatory patchset for adding kexec_file support on arm64.
      
      It was originally included in an arm64 patch set [1], but Philipp is
      also working on kexec_file support for s390 [2], and some changes are
      now conflicting.
      
      So these common parts were extracted and put into a separate patch set for
      better integration.  What's more, my original patch#4 was split into a few
      small chunks for easier review after Dave's comment.
      
      As such, the resulting code is basically identical to my original, and
      the only *visible* differences are:
        * renamings of _kexec_kernel_image_probe() and
          _kimage_file_post_load_cleanup()
        * a change to one of the argument types of prepare_elf64_headers()
      
      Those, unfortunately, require a couple of trivial changes on the rest
      (#1, #6 to #13) of my arm64 kexec_file patch set[1].
      
      Patch #1 makes the use of purgatory optional, which is particularly
      useful for arm64.

      Patch #2 commonalizes arch_kexec_kernel_{image_probe, image_load,
      verify_sig}() and arch_kimage_file_post_load_cleanup() across
      architectures.

      Patches #3-#7 are intended to generalize prepare_elf64_headers(), along
      with exclude_mem_range(), so that they can best be re-used.
      
      [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-February/561182.html
      [2] http://lkml.iu.edu//hypermail/linux/kernel/1802.1/02596.html
      
      This patch (of 7):
      
      On arm64, the crash dump kernel's usable memory is protected by
      *unmapping* it from the kernel virtual space, unlike other
      architectures where the region is just made read-only.  It is highly
      unlikely that the region is accidentally corrupted, and this
      observation justifies dropping the digest-check code from purgatory as
      well.  The resulting code is quite simple, as it doesn't require the
      somewhat ugly re-linking/relocation stuff, i.e.
      arch_kexec_apply_relocations_add().
      
      Please see:
         http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545428.html
      All that the purgatory does is shuffle arguments and jump into the new
      kernel, while we still need to have some space for a hash value
      (purgatory_sha256_digest) which is never checked against.

      As such, it doesn't make sense to have trampoline code between the old
      kernel and the new kernel on arm64.
      
      This patch introduces a new configuration, ARCH_HAS_KEXEC_PURGATORY, and
      allows related code to be compiled in only if necessary.
      
      Link: http://lkml.kernel.org/r/20180306102303.9063-2-takahiro.akashi@linaro.org
      Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Acked-by: Dave Young <dyoung@redhat.com>
      Tested-by: Dave Young <dyoung@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • ipc/mqueue: add missing error code in init_mqueue_fs() · b0a42cd3
      Dan Carpenter authored
      We should propagate the error code here, but we accidentally return
      success.
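
      A hedged sketch of the bug pattern being fixed (the surrounding code is
      illustrative, not the actual init_mqueue_fs() body):

        m = make_internal_mount();              /* illustrative call */
        if (IS_ERR(m)) {
                /* before: error stayed 0, so the caller saw success */
                error = PTR_ERR(m);             /* the missing assignment */
                goto out_filesystem;
        }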
      
      Link: http://lkml.kernel.org/r/20180109092919.ndrvscdllrmzz6jo@mwanda
      Fixes: 946086abeddf ("mqueue: switch to on-demand creation of internal mount")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    • Merge branch 'akpm-current/current' · 4694c435
      Stephen Rothwell authored
    • Stephen Rothwell · 6b742018
    • Stephen Rothwell · d486d425