1. 29 Mar, 2020 4 commits
    • Roman Gushchin's avatar
      mm: fork: fix kernel_stack memcg stats for various stack implementations · 8380ce47
      Roman Gushchin authored
      Depending on CONFIG_VMAP_STACK and the THREAD_SIZE / PAGE_SIZE ratio the
      space for task stacks can be allocated using __vmalloc_node_range(),
      alloc_pages_node() and kmem_cache_alloc_node().
      
      In the first and the second cases page->mem_cgroup pointer is set, but
      in the third it's not: memcg membership of a slab page should be
      determined using the memcg_from_slab_page() function, which looks at
      page->slab_cache->memcg_params.memcg .  In this case, using
      mod_memcg_page_state() (as in account_kernel_stack()) is incorrect:
      page->mem_cgroup pointer is NULL even for pages charged to a non-root
      memory cgroup.
      
      It can lead to kernel_stack per-memcg counters permanently showing 0 on
      some architectures (depending on the configuration).
      
      In order to fix it, let's introduce a mod_memcg_obj_state() helper,
      which takes a pointer to a kernel object as a first argument, uses
      mem_cgroup_from_obj() to get a RCU-protected memcg pointer and calls
      mod_memcg_state().  It allows to handle all possible configurations
      (CONFIG_VMAP_STACK and various THREAD_SIZE/PAGE_SIZE values) without
      spilling any memcg/kmem specifics into fork.c .
      
      Note: This is a special version of the patch created for stable
      backports.  It contains code from the following two patches:
        - mm: memcg/slab: introduce mem_cgroup_from_obj()
        - mm: fork: fix kernel_stack memcg stats for various stack implementations
      
      [guro@fb.com: introduce mem_cgroup_from_obj()]
        Link: http://lkml.kernel.org/r/20200324004221.GA36662@carbon.dhcp.thefacebook.com
      Fixes: 4d96ba35 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages")
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarShakeel Butt <shakeelb@google.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Bharata B Rao <bharata@linux.ibm.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200303233550.251375-1-guro@fb.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8380ce47
    • Mina Almasry's avatar
      hugetlb_cgroup: fix illegal access to memory · 726b7bbe
      Mina Almasry authored
      This appears to be a mistake in commit faced7e0 ("mm: hugetlb
      controller for cgroups v2").
      
      Essentially that commit does a hugetlb_cgroup_from_counter assuming that
      page_counter_try_charge has initialized counter.
      
      But if that has failed then it seems will not initialize counter, so
      hugetlb_cgroup_from_counter(counter) ends up pointing to random memory,
      causing kasan to complain.
      
      The solution is to simply use 'h_cg', instead of
      hugetlb_cgroup_from_counter(counter), since that is a reference to the
      hugetlb_cgroup anyway.  After this change kasan ceases to complain.
      
      Fixes: faced7e0 ("mm: hugetlb controller for cgroups v2")
      Reported-by: syzbot+cac0c4e204952cf449b1@syzkaller.appspotmail.com
      Signed-off-by: default avatarMina Almasry <almasrymina@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarGiuseppe Scrivano <gscrivan@redhat.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Link: http://lkml.kernel.org/r/20200313223920.124230-1-almasrymina@google.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      726b7bbe
    • David Hildenbrand's avatar
      drivers/base/memory.c: indicate all memory blocks as removable · 53cdc1cb
      David Hildenbrand authored
      We see multiple issues with the implementation/interface to compute
      whether a memory block can be offlined (exposed via
      /sys/devices/system/memory/memoryX/removable) and would like to simplify
      it (remove the implementation).
      
      1. It runs basically lockless. While this might be good for performance,
         we see possible races with memory offlining that will require at
         least some sort of locking to fix.
      
      2. Nowadays, more false positives are possible. No arch-specific checks
         are performed that validate if memory offlining will not be denied
         right away (and such check will require locking). For example, arm64
         won't allow to offline any memory block that was added during boot -
         which will imply a very high error rate. Other archs have other
         constraints.
      
      3. The interface is inherently racy. E.g., if a memory block is detected
         to be removable (and was not a false positive at that time), there is
         still no guarantee that offlinin...
      53cdc1cb
    • Naohiro Aota's avatar
      mm/swapfile.c: move inode_lock out of claim_swapfile · d795a90e
      Naohiro Aota authored
      claim_swapfile() currently keeps the inode locked when it is successful,
      or the file is already swapfile (with -EBUSY).  And, on the other error
      cases, it does not lock the inode.
      
      This inconsistency of the lock state and return value is quite confusing
      and actually causing a bad unlock balance as below in the "bad_swap"
      section of __do_sys_swapon().
      
      This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE
      check out of claim_swapfile().  The inode is unlocked in
      "bad_swap_unlock_inode" section, so that the inode is ensured to be
      unlocked at "bad_swap".  Thus, error handling codes after the locking now
      jumps to "bad_swap_unlock_inode" instead of "bad_swap".
      
          =====================================
          WARNING: bad unlock balance detected!
          5.5.0-rc7+ #176 Not tainted
          -------------------------------------
          swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at: __do_sys_swapon+0x94b/0x3550
          but there are no more locks to release!
      
          other info that might help us debug this:
          no locks held by swapon/4294.
      
          stack backtrace:
          CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
          Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
          Call Trace:
           dump_stack+0xa1/0xea
           print_unlock_imbalance_bug.cold+0x114/0x123
           lock_release+0x562/0xed0
           up_write+0x2d/0x490
           __do_sys_swapon+0x94b/0x3550
           __x64_sys_swapon+0x54/0x80
           do_syscall_64+0xa4/0x4b0
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
          RIP: 0033:0x7f15da0a0dc7
      
      Fixes: 1638045c ("mm: set S_SWAPFILE on blockdev swap devices")
      Signed-off-by: default avatarNaohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: default avatarQais Youef <qais.yousef@arm.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d795a90e
  2. 28 Mar, 2020 1 commit
  3. 27 Mar, 2020 14 commits
  4. 26 Mar, 2020 13 commits
  5. 25 Mar, 2020 8 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 1b649e0b
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix deadlock in bpf_send_signal() from Yonghong Song.
      
       2) Fix off by one in kTLS offload of mlx5, from Tariq Toukan.
      
       3) Add missing locking in iwlwifi mvm code, from Avraham Stern.
      
       4) Fix MSG_WAITALL handling in rxrpc, from David Howells.
      
       5) Need to hold RTNL mutex in tcindex_partial_destroy_work(), from Cong
          Wang.
      
       6) Fix producer race condition in AF_PACKET, from Willem de Bruijn.
      
       7) cls_route removes the wrong filter during change operations, from
          Cong Wang.
      
       8) Reject unrecognized request flags in ethtool netlink code, from
          Michal Kubecek.
      
       9) Need to keep MAC in reset until PHY is up in bcmgenet driver, from
          Doug Berger.
      
      10) Don't leak ct zone template in act_ct during replace, from Paul
          Blakey.
      
      11) Fix flushing of offloaded netfilter flowtable flows, also from Paul
          Blakey.
      
      12) Fix throughput drop during tx backpressure in cxgb4, from Rahul
          Lakkireddy...
      1b649e0b
    • Linus Torvalds's avatar
      Merge tag 'gpio-v5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 1dfb642b
      Linus Torvalds authored
      Pull GPIO fixes from Linus Walleij:
      
       - One core quirk by myself to fix the .irq_disable() semantics when the
         gpiolib core takes over this callback.
      
       - The rest is an elaborate series of four patches fixing Intel laptop
         ACPI wakeup quirks.
      
      * tag 'gpio-v5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 CHT + AXP288 model
        gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 BYT + AXP288 model
        gpiolib: acpi: Rework honor_wakeup option into an ignore_wake option
        gpiolib: acpi: Correct comment for HP x2 10 honor_wakeup quirk
        gpiolib: Fix irq_disable() semantics
      1dfb642b
    • David S. Miller's avatar
      Merge tag 'wireless-drivers-2020-03-25' of... · 2910594f
      David S. Miller authored
      Merge tag 'wireless-drivers-2020-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for v5.6
      
      Fourth, and last, set of fixes for v5.6. Just two important fixes to
      iwlwifi regressions.
      
      iwlwifi
      
      * fix GEO_TX_POWER_LIMIT command on certain devices which caused
        firmware to crash during initialisation
      
      * add back device ids for three devices which were accidentally
        removed
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2910594f
    • Pablo Neira Ayuso's avatar
      net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build · 2c64605b
      Pablo Neira Ayuso authored
      net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’:
          net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’
            pkt->skb->tc_redirected = 1;
                    ^~
          net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’
            pkt->skb->tc_from_ingress = 1;
                    ^~
      
      To avoid a direct dependency with tc actions from netfilter, wrap the
      redirect bits around CONFIG_NET_REDIRECT and move helpers to
      include/linux/skbuff.h. Turn on this toggle from the ifb driver, the
      only existing client of these bits in the tree.
      
      This patch adds skb_set_redirected() that sets on the redirected bit
      on the skbuff, it specifies if the packet was redirect from ingress
      and resets the timestamp (timestamp reset was originally missing in the
      netfilter bugfix).
      
      Fixes: bcfabee1 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress")
      Reported-by: noreply@ellerman.id.au
      Reported-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2c64605b
    • Guilherme G. Piccoli's avatar
      net: ena: Add PCI shutdown handler to allow safe kexec · 428c4913
      Guilherme G. Piccoli authored
      Currently ENA only provides the PCI remove() handler, used during rmmod
      for example. This is not called on shutdown/kexec path; we are potentially
      creating a failure scenario on kexec:
      
      (a) Kexec is triggered, no shutdown() / remove() handler is called for ENA;
      instead pci_device_shutdown() clears the master bit of the PCI device,
      stopping all DMA transactions;
      
      (b) Kexec reboot happens and the device gets enabled again, likely having
      its FW with that DMA transaction buffered; then it may trigger the (now
      invalid) memory operation in the new kernel, corrupting kernel memory area.
      
      This patch aims to prevent this, by implementing a shutdown() handler
      quite similar to the remove() one - the difference being the handling
      of the netdev, which is unregistered on remove(), but following the
      convention observed in other drivers, it's only detached on shutdown().
      
      This prevents an odd issue in AWS Nitro instances, in which after the 2nd
      kexec the next one will fail with an initrd corruption, caused by a wild
      DMA write to invalid kernel memory. The lspci output for the adapter
      present in my instance is:
      
      00:05.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network
      Adapter (ENA) [1d0f:ec20]
      Suggested-by: default avatarGavin Shan <gshan@redhat.com>
      Signed-off-by: default avatarGuilherme G. Piccoli <gpiccoli@canonical.com>
      Acked-by: default avatarSameeh Jubran <sameehj@amazon.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      428c4913
    • Hangbin Liu's avatar
      selftests/net/forwarding: define libs as TEST_PROGS_EXTENDED · c085dbfb
      Hangbin Liu authored
      The lib files should not be defined as TEST_PROGS, or we will run them
      in run_kselftest.sh.
      
      Also remove ethtool_lib.sh exec permission.
      
      Fixes: 81573b18 ("selftests/net/forwarding: add Makefile to install tests")
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c085dbfb
    • Hangbin Liu's avatar
      selftests/net: add missing tests to Makefile · 919a23e9
      Hangbin Liu authored
      Find some tests are missed in Makefile by running:
      for file in $(ls *.sh); do grep -q $file Makefile || echo $file; done
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      919a23e9
    • Linus Torvalds's avatar
      Merge tag 'zonefs-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs · e2cf67f6
      Linus Torvalds authored
      Pull zonefs fix from Damien Le Moal:
       "A single fix from me to correctly handle the size of read-only zone
        files"
      
      * tag 'zonefs-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
        zonfs: Fix handling of read-only zones
      e2cf67f6