  1. Apr 07, 2020
    • drivers/misc/lkdtm/bugs.c: add arithmetic overflow and array bounds checks · ae2e1aad
      Kees Cook authored
      
      Adds LKDTM tests for arithmetic overflow (both signed and unsigned), as
      well as array bounds checking.
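
      The tests deliberately perform operations that overflow or index past
      the end of an array. As a minimal userspace sketch of the conditions
      being exercised (not the LKDTM code itself), the hypothetical helpers
      below detect signed and unsigned addition overflow with the GCC/Clang
      builtin __builtin_add_overflow(), which is assumed to be available, and
      check an index against an array length before it is used.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Signed overflow is undefined behavior in C, so test before adding. */
static bool signed_add_overflows(int a, int b)
{
	int result;

	return __builtin_add_overflow(a, b, &result);
}

/* Unsigned overflow is well defined but silently wraps around. */
static bool unsigned_add_overflows(unsigned int a, unsigned int b)
{
	unsigned int result;

	return __builtin_add_overflow(a, b, &result);
}

/* Array bounds: only indexes in [0, len) may be dereferenced. */
static bool index_in_bounds(size_t index, size_t len)
{
	return index < len;
}
```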
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Elena Petrova <lenaptr@google.com>
      Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
      Link: http://lkml.kernel.org/r/20200227193516.32566-4-keescook@chromium.org
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: allow to specify a default online_type · 5f47adf7
      David Hildenbrand authored
      
      For now, distributions implement advanced udev rules to essentially
      - Don't online any hotplugged memory (s390x)
      - Online all memory to ZONE_NORMAL (e.g., most virt environments like
        hyperv)
      - Online all memory to ZONE_MOVABLE in case the zone imbalance is taken
        care of (e.g., bare metal, special virt environments)
      
      In summary: all memory is usually onlined the same way; however, the
      kernel always has to ask user space, which every time comes up with the same answer.
      E.g., Hyper-V always waits for a memory block to get onlined before
      continuing, otherwise it might end up adding memory faster than
      onlining it, which can result in strange OOM situations.  This waiting
      slows down adding of a bigger amount of memory.
      
      Let's allow specifying a default online_type, not just "online" and
      "offline".  This allows distributions to configure the default online_type
      once at boot and be done with it.
      
      We can now specify "offline", "online", "online_movable" and
      "online_kernel" via
      - "memhp_default_state=" on the kernel cmdline
      - /sys/devices/system/memory/auto_online_blocks
      just like we are able to specify for a single memory block via
      /sys/devices/system/memory/memoryX/state
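
      To make the accepted values concrete, here is a minimal sketch, assuming
      a plain space-separated command line without quoting, that extracts the
      memhp_default_state= value and accepts only the four strings listed
      above. The helper name is hypothetical; this is not the kernel's own
      boot-parameter parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The four values the text above lists as accepted. */
static const char *const accepted_states[] = {
	"offline", "online", "online_kernel", "online_movable",
};

/*
 * Hypothetical helper: find "memhp_default_state=<value>" as a whole token
 * in cmdline and copy <value> into out. Returns true only if the parameter
 * is present and its value is one of the accepted states.
 */
static bool parse_memhp_default_state(const char *cmdline, char *out, size_t outlen)
{
	static const char key[] = "memhp_default_state=";
	const size_t keylen = sizeof(key) - 1;
	const char *p = cmdline;

	while ((p = strstr(p, key)) != NULL) {
		/* Only match at the start of a space-separated token. */
		if (p == cmdline || p[-1] == ' ') {
			const char *val = p + keylen;
			size_t len = strcspn(val, " ");

			if (len == 0 || len >= outlen)
				return false;
			memcpy(out, val, len);
			out[len] = '\0';
			for (size_t i = 0; i < sizeof(accepted_states) / sizeof(accepted_states[0]); i++)
				if (strcmp(out, accepted_states[i]) == 0)
					return true;
			return false;
		}
		p += keylen; /* skip a match embedded in another token */
	}
	return false;
}
```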
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Cc: Yumei Huang <yuhuang@redhat.com>
      Link: http://lkml.kernel.org/r/20200317104942.11178-9-david@redhat.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: convert memhp_auto_online to store an online_type · 862919e5
      David Hildenbrand authored
      
      ...  and rename it to memhp_default_online_type.  This is a preparation
      for more detailed default online behavior.
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Cc: Yumei Huang <yuhuang@redhat.com>
      Link: http://lkml.kernel.org/r/20200317104942.11178-8-david@redhat.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hv_balloon: don't check for memhp_auto_online manually · bc58ebd5
      David Hildenbrand authored
      
      We get the MEM_ONLINE notifier call if memory is added right from the
      kernel via add_memory() or later from user space.
      
      Let's get rid of the "ha_waiting" flag - the wait event has an inbuilt
      mechanism (->done) for that.  Initialize the wait event only once and
      reinitialize before adding memory.  Unconditionally call complete() and
      wait_for_completion_timeout().
      
      If there are no waiters, complete() will only increment ->done - which
      will be reset by reinit_completion().  If complete() has already been
      called, wait_for_completion_timeout() will not wait.
      
      There is still the chance for a small race between concurrent
      reinit_completion() and complete().  If complete() wins, we would not
      wait, which is tolerable (and the race exists in current code as well).
      
      Note: We only wait for "some" memory to get onlined, which seems to be
            good enough for now.
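
      The ->done bookkeeping described above can be modeled in a few lines.
      This is a single-threaded sketch with hypothetical names: it omits the
      wait queue and locking of the real struct completion, and
      model_try_wait() returns false where wait_for_completion_timeout()
      would actually block until complete() or the timeout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct completion: only the counter. */
struct model_completion {
	unsigned int done; /* complete() calls not yet consumed by a waiter */
};

static void model_reinit_completion(struct model_completion *c)
{
	c->done = 0; /* discard completions left over from an earlier round */
}

static void model_complete(struct model_completion *c)
{
	c->done++; /* with no waiter, this only increments ->done */
}

/*
 * true: complete() already ran, so the waiter continues immediately and
 * consumes one completion. false: a real waiter would block here.
 */
static bool model_try_wait(struct model_completion *c)
{
	if (c->done > 0) {
		c->done--;
		return true;
	}
	return false;
}
```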
      
      [akpm@linux-foundation.org: register_memory_notifier() after init_completion(), per David]
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Yumei Huang <yuhuang@redhat.com>
      Link: http://lkml.kernel.org/r/20200317104942.11178-6-david@redhat.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/base/memory: store mapping between MMOP_* and string in an array · 4dc8207b
      David Hildenbrand authored
      
      Let's use a simple array which we can reuse soon.  While at it, move the
      string->mmop conversion out of the device hotplug lock.
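
      A sketch of such a table, assuming the MMOP_* ordering shown here
      (MMOP_OFFLINE as 0) and using hypothetical names for the array and the
      lookup helper: one designated-initializer array serves the
      MMOP_* -> string direction by indexing and the string -> MMOP_*
      direction by a linear search.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Ordering assumed for illustration. */
enum mmop {
	MMOP_OFFLINE = 0,
	MMOP_ONLINE,
	MMOP_ONLINE_KERNEL,
	MMOP_ONLINE_MOVABLE,
};

/* Hypothetical name: one table for both directions of the conversion. */
static const char *const online_type_str[] = {
	[MMOP_OFFLINE] = "offline",
	[MMOP_ONLINE] = "online",
	[MMOP_ONLINE_KERNEL] = "online_kernel",
	[MMOP_ONLINE_MOVABLE] = "online_movable",
};

#define NR_ONLINE_TYPES (sizeof(online_type_str) / sizeof(online_type_str[0]))

/* string -> MMOP_*; returns -1 for unknown strings. */
static int online_type_from_str(const char *str)
{
	for (size_t i = 0; i < NR_ONLINE_TYPES; i++)
		if (strcmp(str, online_type_str[i]) == 0)
			return (int)i;
	return -1;
}
```

      A real sysfs store handler would also have to accept the trailing
      newline that echo appends; this sketch compares exact strings only.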
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Cc: Yumei Huang <yuhuang@redhat.com>
      Link: http://lkml.kernel.org/r/20200317104942.11178-4-david@redhat.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/base/memory: map MMOP_OFFLINE to 0 · efc978ad
      David Hildenbrand authored
      
      Historically, we used the value -1.  Just treat 0 as the special case now.
      Clarify a comment (which was wrong: when we come via device_online() the
      first time, the online_type would have been 0 / MEM_ONLINE).  The default
      is now always MMOP_OFFLINE.  This removes the last user of the manual
      "-1", which didn't use the enum value.
      
      This is a preparation to use the online_type as an array index.
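
      Why 0 matters can be illustrated with a zero-filling allocator: with
      MMOP_OFFLINE as 0, a freshly allocated block already reports "offline",
      and the online type can index a string table directly (an index of -1
      could not). The struct and helpers below are hypothetical stand-ins;
      calloc() plays the role that a zeroing allocator such as the kernel's
      kzalloc() would play.

```c
#include <assert.h>
#include <stdlib.h>

enum mmop {
	MMOP_OFFLINE = 0, /* previously -1; now the zero value */
	MMOP_ONLINE,
	MMOP_ONLINE_KERNEL,
	MMOP_ONLINE_MOVABLE,
};

/* Hypothetical stand-in for a memory block device. */
struct demo_memory_block {
	unsigned long start_section_nr;
	int online_type;
};

static const char *const state_names[] = {
	[MMOP_OFFLINE] = "offline",
	[MMOP_ONLINE] = "online",
	[MMOP_ONLINE_KERNEL] = "online_kernel",
	[MMOP_ONLINE_MOVABLE] = "online_movable",
};

/* Zero-filled allocation: online_type starts out as MMOP_OFFLINE. */
static struct demo_memory_block *demo_alloc_block(unsigned long start)
{
	struct demo_memory_block *mem = calloc(1, sizeof(*mem));

	if (mem)
		mem->start_section_nr = start;
	return mem;
}

/* Valid as an array index only because no value is negative. */
static const char *demo_state_name(const struct demo_memory_block *mem)
{
	return state_names[mem->online_type];
}
```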
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Cc: Yumei Huang <yuhuang@redhat.com>
      Link: http://lkml.kernel.org/r/20200317104942.11178-3-david@redhat.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/base/memory: rename MMOP_ONLINE_KEEP to MMOP_ONLINE · 956f8b44
      David Hildenbrand authored
      Patch series "mm/memory_hotplug: allow to specify a default online_type", v3.
      
      Distributions nowadays use udev rules ([1] [2]) to specify if and how to
      online hotplugged memory.  The rules seem to get more complex with many
      special cases.  Due to the various special cases,
      CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used.  All memory hotplug
      is handled via udev rules.
      
      Every time we hotplug memory, the udev rule will come to the same
      conclusion.  Hyper-V in particular (and soon virtio-mem as well) adds a lot of
      memory in separate memory blocks and waits for memory to get onlined by
      user space before continuing to add more memory blocks (to not add memory
      faster than it is getting onlined).  This of course slows down the whole
      memory hotplug process.
      
      To make the job of distributions easier and to avoid udev rules that get
      more and more complicated, let's extend the mechanism provided by
      - /sys/devices/system/memory/auto_online_blocks
      - "memhp_default_state=" on the kernel cmdline
      so that "online_movable" and "online_kernel" can be specified as well.
      
      === Example /usr/libexec/config-memhotplug ===
      
      #!/bin/bash
      
      VIRT=`systemd-detect-virt --vm`
      ARCH=`uname -p`
      
      sense_virtio_mem() {
        if [ -d "/sys/bus/virtio/drivers/virtio_mem/" ]; then
          DEVICES=`find /sys/bus/virtio/drivers/virtio_mem/ -maxdepth 1 -type l | wc -l`
          if [ $DEVICES != "0" ]; then
              return 0
          fi
        fi
        return 1
      }
      
      if [ ! -e "/sys/devices/system/memory/auto_online_blocks" ]; then
        echo "Memory hotplug configuration support missing in the kernel"
        exit 1
      fi
      
      if grep "memhp_default_state=" /proc/cmdline > /dev/null; then
        echo "Memory hotplug configuration overridden in kernel cmdline (memhp_default_state=)"
        exit 1
      fi
      
      if [ $VIRT == "microsoft" ]; then
        echo "Detected Hyper-V on $ARCH"
        # Hyper-V wants all memory in ZONE_NORMAL
        ONLINE_TYPE="online_kernel"
      elif sense_virtio_mem; then
        echo "Detected virtio-mem on $ARCH"
        # virtio-mem wants all memory in ZONE_NORMAL
        ONLINE_TYPE="online_kernel"
      elif [ $ARCH == "s390x" ] || [ $ARCH == "s390" ]; then
        echo "Detected $ARCH"
        # standby memory should not be onlined automatically
        ONLINE_TYPE="offline"
      elif [ $ARCH == "ppc64" ] || [ $ARCH == "ppc64le" ]; then
        echo "Detected" $ARCH
        # PPC64 onlines all hotplugged memory right from the kernel
        ONLINE_TYPE="offline"
      elif [ $VIRT == "none" ]; then
        echo "Detected bare-metal on $ARCH"
        # Bare metal users expect hotplugged memory to be unpluggable. We assume
        # that ZONE imbalances on such enterprise servers cannot happen and are
        # properly documented
        ONLINE_TYPE="online_movable"
      else
        # TODO: Hypervisors that want to unplug DIMMs and can guarantee that ZONE
        # imbalances won't happen
        echo "Detected $VIRT on $ARCH"
        # Usually, ballooning is used in virtual environments, so memory should go to
        # ZONE_NORMAL. However, sometimes "movable_node" is relevant.
        ONLINE_TYPE="online"
      fi
      
      echo "Selected online_type:" $ONLINE_TYPE
      
      # Configure what to do with memory that will be hotplugged in the future
      echo $ONLINE_TYPE 2>/dev/null > /sys/devices/system/memory/auto_online_blocks
      if [ $? != "0" ]; then
        echo "Memory hotplug cannot be configured (e.g., old kernel or missing permissions)"
        # A backup udev rule should handle old kernels if necessary
        exit 1
      fi
      
      # Process all already plugged blocks (e.g., DIMMs, but also Hyper-V or virtio-mem)
      if [ $ONLINE_TYPE != "offline" ]; then
        for MEMORY in /sys/devices/system/memory/memory*; do
          STATE=`cat $MEMORY/state`
          if [ $STATE == "offline" ]; then
              echo $ONLINE_TYPE > $MEMORY/state
          fi
        done
      fi
      
      === Example /usr/lib/systemd/system/config-memhotplug.service ===
      
      [Unit]
      Description=Configure memory hotplug behavior
      DefaultDependencies=no
      Conflicts=shutdown.target
      Before=sysinit.target shutdown.target
      After=systemd-modules-load.service
      ConditionPathExists=|/sys/devices/system/memory/auto_online_blocks
      
      [Service]
      ExecStart=/usr/libexec/config-memhotplug
      Type=oneshot
      TimeoutSec=0
      RemainAfterExit=yes
      
      [Install]
      WantedBy=sysinit.target
      
      === Example modification to the 40-redhat.rules [2] ===
      
      : diff --git a/40-redhat.rules b/40-redhat.rules-new
      : index 2c690e5..168fd03 100644
      : --- a/40-redhat.rules
      : +++ b/40-redhat.rules-new
      : @@ -6,6 +6,9 @@ SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}
      :  # Memory hotadd request
      :  SUBSYSTEM!="memory", GOTO="memory_hotplug_end"
      :  ACTION!="add", GOTO="memory_hotplug_end"
      : +# memory hotplug behavior configured
      : +PROGRAM=="grep online /sys/devices/system/memory/auto_online_blocks", GOTO="memory_hotplug_end"
      : +
      :  PROGRAM="/bin/uname -p", RESULT=="s390*", GOTO="memory_hotplug_end"
      :
      :  ENV{.state}="online"
      
      ===
      
      [1] https://github.com/lnykryn/systemd-rhel/pull/281
      [2] https://github.com/lnykryn/systemd-rhel/blob/staging/rules/40-redhat.rules
      
      
      
      This patch (of 8):
      
      The name is misleading and it's not really clear what is "kept".  Let's
      just name it like the online_type name we expose to user space ("online").
      
      Add some documentation to the types.
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Yumei Huang <yuhuang@redhat.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Link: http://lkml.kernel.org/r/20200319131221.14044-1-david@redhat.com
      Link: http://lkml.kernel.org/r/20200317104942.11178-2-david@redhat.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/base/memory.c: drop pages_correctly_probed() · fada9ae3
      David Hildenbrand authored
      
      pages_correctly_probed() is a leftover from ancient times.  It dates back
      to commit 3947be19 ("[PATCH] memory hotplug: sysfs and add/remove
      functions"), where PG_reserved checks were added as a safety net:
      
      	/*
      	 * The probe routines leave the pages reserved, just
      	 * as the bootmem code does.  Make sure they're still
      	 * that way.
      	 */
      
      The checks were refactored quite a bit over the years, especially in
      commit b77eab70 ("mm/memory_hotplug: optimize probe routine"), where
      checks for present, valid, and online sections were added.
      
      Hotplugged memory is added via add_memory(), which will create the full
      memmap for the hotplugged memory, and mark all sections valid and present.
      
      Only full memory blocks are onlined/offlined, so we also cannot have an
      inconsistency in that regard (especially, memory blocks with some sections
      being online and some being offline).
      
      1. Boot memory always starts online.  Since commit c5e79ef5
         ("mm/memory_hotplug.c: don't allow to online/offline memory blocks with
         holes") we disallow offlining any memory with holes.  Therefore, we
         never online memory with holes.  Present and validity checks are
         superfluous.
      
      2. Only complete memory blocks are onlined/offlined (and especially,
         the state - online or offline - is stored for whole memory blocks).
         Besides the core, only arch/powerpc/platforms/powernv/memtrace.c
         manually calls offline_pages() and fiddles with memory block states.
         But it also only offlines complete memory blocks.
      
      3. To make any of these conditions trigger, something would have to be
         terribly messed up in the core (e.g., onlining/offlining only some
         sections of a memory block).
      
      4. Memory unplug properly makes sure that all sysfs attributes were
         removed (and therefore, that all threads left the sysfs handlers).  We
         don't have to worry about zombie devices at this point.
      
      5. The valid_section_nr(section_nr) check is actually dead code, as it
         would never have been reached due to the WARN_ON_ONCE(!pfn_valid(pfn)).
      
      No wonder we haven't seen any of these errors in a long time (or even
      ever, according to my search).  Let's just get rid of them.  Now, all
      checks that could hinder onlining and offlining are completely
      contained in online_pages()/offline_pages().
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Link: http://lkml.kernel.org/r/20200127110424.5757-3-david@redhat.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/base/memory.c: drop section_count · 68c3a6ac
      David Hildenbrand authored
      
      Patch series "mm: drop superfluous section checks when onlining/offlining".
      
      Let's drop some superfluous section checks on the onlining/offlining path.
      
      This patch (of 3):
      
      Since commit c5e79ef5 ("mm/memory_hotplug.c: don't allow to
      online/offline memory blocks with holes") we have a generic check in
      offline_pages() that disallows offlining memory blocks with holes.
      
      Memory blocks with missing sections are just another variant of this type
      of block.  We can stop checking (and especially storing) present
      sections.  A proper error message now explains why offlining failed.
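
      A minimal sketch of such a hole check, assuming the block is compared
      against a list of non-overlapping System RAM pfn ranges: count the pfns
      of the block that are backed by RAM and refuse to offline if the count
      falls short of the block size. The types and function names are
      hypothetical; this is not the offline_pages() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one System RAM range, [start_pfn, end_pfn). */
struct ram_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Count pfns in [start, end) covered by the non-overlapping RAM ranges. */
static unsigned long count_ram_pages(const struct ram_range *ranges, size_t nr,
				     unsigned long start, unsigned long end)
{
	unsigned long count = 0;

	for (size_t i = 0; i < nr; i++) {
		unsigned long lo = ranges[i].start_pfn > start ? ranges[i].start_pfn : start;
		unsigned long hi = ranges[i].end_pfn < end ? ranges[i].end_pfn : end;

		if (lo < hi)
			count += hi - lo;
	}
	return count;
}

/* A block has holes if not every pfn in it is backed by RAM. */
static bool block_has_holes(const struct ram_range *ranges, size_t nr,
			    unsigned long start, unsigned long end)
{
	return count_ram_pages(ranges, nr, start, end) != end - start;
}
```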
      
      section_count was initially introduced in commit 07681215 ("Driver
      core: Add section count to memory_block struct") in order to detect when
      it is okay to remove a memory block.  It was used in commit 26bbe7ef
      ("drivers/base/memory.c: prohibit offlining of memory blocks with missing
      sections") to disallow offlining memory blocks with missing sections.  As
      we refactored creation/removal of memory devices and have a proper check
      for holes in place, we can drop the section_count.
      
      This also removes a leftover comment regarding the mem_sysfs_mutex, which
      was removed in commit 848e19ad ("drivers/base/memory.c: drop the
      mem_sysfs_mutex").
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Link: http://lkml.kernel.org/r/20200127110424.5757-2-david@redhat.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • virtio-balloon: switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM · da10329c
      David Hildenbrand authored
      Commit 71994620 ("virtio_balloon: replace oom notifier with shrinker")
      changed the behavior when deflation happens automatically.  Instead of
      deflating when called by the OOM handler, the shrinker is used.
      
      However, the balloon is not simply some other slab cache that should be
      shrunk when under memory pressure.  The shrinker does not have a concept
      of priorities yet, so this behavior cannot be configured.  Eventually, once
      that is in place, we might want to switch back after doing proper testing.
      
      There was a report that this results in undesired side effects when
      inflating the balloon to shrink the page cache. [1]
      	"When inflating the balloon against page cache (i.e. no free memory
      	 remains) vmscan.c will both shrink page cache, but also invoke the
      	 shrinkers -- including the balloon's shrinker. So the balloon
      	 driver allocates memory which requires reclaim, vmscan gets this
      	 memory by shrinking the balloon, and then the driver adds the
      	 memory back to the balloon. Basically a busy no-op."
      
      The name "deflate on OOM" makes it pretty clear when deflation should
      happen: after other approaches to reclaim memory have failed, not while
      reclaiming.  This allows minimizing the footprint of a guest: memory
      will only be taken out of the balloon when really needed.
      
      Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because
      this has no such side effects. Always register the shrinker with
      VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free
      pages that are still to be processed by the guest. The hypervisor takes
      care of identifying and resolving possible races between processing a
      hinting request and the guest reusing a page.
      
      In contrast to pre commit 71994620 ("virtio_balloon: replace oom
      notifier with shrinker"), don't add a module parameter to configure the
      number of pages to deflate on OOM. Can be re-added if really needed.
      Also, leak_balloon() returns the number of 4k pages; convert it properly
      in virtio_balloon_oom_notify().
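
      The conversion is a division by the number of 4k balloon pages per
      guest page. A sketch, assuming 4k balloon accounting (the uapi header
      defines VIRTIO_BALLOON_PFN_SHIFT as 12) and using hypothetical helper
      names:

```c
#include <assert.h>

/* virtio-balloon accounts in 4 KiB units (shift of 12). */
#define BALLOON_PFN_SHIFT 12

/* Number of 4 KiB balloon pages that make up one guest page. */
static unsigned long balloon_pages_per_page(unsigned int page_shift)
{
	return 1UL << (page_shift - BALLOON_PFN_SHIFT);
}

/* Convert a count of 4 KiB balloon pages into guest pages freed. */
static unsigned long balloon_to_guest_pages(unsigned long nr_4k, unsigned int page_shift)
{
	return nr_4k / balloon_pages_per_page(page_shift);
}
```

      With 4k guest pages the factor is 1 and the conversion is a no-op; it
      only matters with larger pages, e.g. 64k pages, where 16 balloon pages
      make up one guest page.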
      
      Testing done by Tyler for future reference:
        Test setup: VM with 16 CPU, 64GB RAM. Running Debian 10. We have a 42
        GB file full of random bytes that we continually cat to /dev/null.
        This fills the page cache as the file is read. Meanwhile, we trigger
        the balloon to inflate, with a target size of 53 GB. This setup causes
        the balloon inflation to pressure the page cache as the page cache is
        also trying to grow. Afterwards we shrink the balloon back to zero (so
        total deflate == total inflate).
      
        Without this patch (kernel 4.19.0-5):
        Inflation never reaches the target until we stop the "cat file >
        /dev/null" process. Total inflation time was 542 seconds. The longest
        period that made no net forward progress was 315 seconds.
          Result of "grep balloon /proc/vmstat" after the test:
          balloon_inflate 154828377
          balloon_deflate 154828377
      
        With this patch (kernel 5.6.0-rc4+):
        Total inflation duration was 63 seconds. No deflate-queue activity
        occurs when pressuring the page-cache.
          Result of "grep balloon /proc/vmstat" after the test:
          balloon_inflate 12968539
          balloon_deflate 12968539
      
        Conclusion: This patch fixes the issue.  In the test it reduced
        inflate/deflate activity by 12x, and reduced inflation time by 8.6x.
        But more importantly, if we hadn't killed the "cat file > /dev/null"
        process then, without the patch, the inflation process would never reach
        the target.
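
      The ratios quoted in the conclusion follow from the vmstat counters and
      timings above (inflate and deflate counts are equal in both runs, so one
      ratio covers both):

```latex
\frac{154\,828\,377}{12\,968\,539} \approx 11.94 \approx 12\times
\qquad
\frac{542\ \mathrm{s}}{63\ \mathrm{s}} \approx 8.60 \approx 8.6\times
```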
      
      [1] https://www.spinics.net/lists/linux-virtualization/msg40863.html
      
      Link: http://lkml.kernel.org/r/20200311135523.18512-2-david@redhat.com
      
      
      Fixes: 71994620 ("virtio_balloon: replace oom notifier with shrinker")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reported-by: Tyler Sanderson <tysand@google.com>
      Tested-by: Tyler Sanderson <tysand@google.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Cc: Wei Wang <wei.w.wang@intel.com>
      Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • virtio-balloon: add support for providing free page reports to host · b0c504f1
      Alexander Duyck authored
      
      Add support for the page reporting feature provided by virtio-balloon.
      Reporting differs from the regular balloon functionality in that it is
      much less durable than a standard memory balloon.  Instead of creating a
      list of pages that cannot be accessed, the pages are only inaccessible
      while they are being indicated to the virtio interface.  Once the
      interface has acknowledged them, they are placed back into their respective
      free lists and are once again accessible by the guest system.

      Unlike a standard balloon, we don't inflate and deflate the pages.  Instead,
      we perform the reporting, and once the reporting is completed it is
      assumed that the page has been dropped from the guest and will be faulted
      back in the next time the page is accessed.
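
      A toy model of the difference, with hypothetical types and an explicit
      flag standing in for the device's acknowledgement: pages leave the free
      list only while a report is in flight and always return to it, and they
      are marked as reported only when the host acknowledged them (an
      assumption about error handling that the text above does not spell out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one free guest page. */
struct demo_page {
	bool on_free_list;
	bool reported; /* host may have dropped it; next access faults it back in */
};

/* Report a batch of free pages; returns how many ended up reported. */
static size_t demo_report_batch(struct demo_page *const *batch, size_t n, bool host_acked)
{
	size_t reported = 0;

	/* Inaccessible only while the report is in flight. */
	for (size_t i = 0; i < n; i++)
		batch[i]->on_free_list = false;

	/* Unlike ballooning, every page goes back to its free list. */
	for (size_t i = 0; i < n; i++) {
		batch[i]->on_free_list = true;
		if (host_acked) {
			batch[i]->reported = true;
			reported++;
		}
	}
	return reported;
}
```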
      
      Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Nitesh Narayan Lal <nitesh@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pagupta@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Wang <wei.w.wang@intel.com>
      Cc: Yang Zhang <yang.zhang.wz@gmail.com>
      Cc: wei qi <weiqi4@huawei.com>
      Link: http://lkml.kernel.org/r/20200211224657.29318.68624.stgit@localhost.localdomain
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • virtio-balloon: pull page poisoning config out of free page hinting · d74b78fa
      Alexander Duyck authored
      
      Until now, the page poisoning setting was only enabled if free page
      hinting was enabled.  However, we will need the page poisoning tracking
      logic for free page reporting as well.  As such, pull it out and make it a
      separate bit of config in the probe function.

      In addition, we need to add support for the more recent init_on_free
      feature, which expects a behavior similar to page poisoning in that we
      expect the page to be pre-zeroed.
      
      Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Nitesh Narayan Lal <nitesh@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pagupta@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Wang <wei.w.wang@intel.com>
      Cc: Yang Zhang <yang.zhang.wz@gmail.com>
      Cc: wei qi <weiqi4@huawei.com>
      Link: http://lkml.kernel.org/r/20200211224646.29318.695.stgit@localhost.localdomain
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d74b78fa
  2. Apr 06, 2020
  3. Apr 05, 2020
  4. Apr 04, 2020
    • Hans de Goede
      platform/x86: intel_int0002_vgpio: Use acpi_register_wakeup_handler() · 767191db
      Hans de Goede authored
      
      The Power Management Events (PMEs) the INT0002 driver listens for get
      signalled by the Power Management Controller (PMC) using the same IRQ
      as used for the ACPI SCI.
      
      Since commit fdde0ff8 ("ACPI: PM: s2idle: Prevent spurious SCIs from
      waking up the system"), an SCI that fires without a wakeup cause
      recognized by the ACPI sleep code no longer wakes up the system.
      
      This breaks PMEs / wakeups signalled to the INT0002 driver: the system
      now never leaves s2idle_loop().
      
      Use acpi_register_wakeup_handler() to register a function which checks
      the GPE0a_STS register for a PME and triggers a wakeup when a PME has
      been signalled.
      
      Fixes: fdde0ff8 ("ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system")
      Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      767191db
    • Hans de Goede
      ACPI: PM: Add acpi_[un]register_wakeup_handler() · ddfd9dcf
      Hans de Goede authored
      
      Since commit fdde0ff8 ("ACPI: PM: s2idle: Prevent spurious SCIs from
      waking up the system"), an SCI that fires without a wakeup cause
      recognized by the ACPI sleep code no longer wakes up the system.
      
      This works as intended, but it is a problem for devices where the SCI
      is shared with another device which is also a wakeup source.
      
      In the past, these SCIs, spurious from the point of view of the ACPI
      sleep code, would still cause a wakeup, so a wakeup event from the
      device sharing the interrupt would actually wake up the system. This
      no longer works.
      
      This is a problem on e.g. Bay Trail-T and Cherry Trail devices, where
      some peripherals (typically the XHCI controller) can signal a
      Power Management Event (PME) to the Power Management Controller (PMC)
      to wake up the system; this uses the same interrupt as the SCI.
      These wakeups are handled through a special INT0002 ACPI device, which
      checks GPE0a_STS for such events and takes care of acking the PME so
      that the shared interrupt stops triggering.
      
      The change to the ACPI sleep code to ignore the spurious SCI causes
      the system to no longer wake up on these PME events. To make things
      worse, this means that the INT0002 device driver's interrupt handler
      no longer runs, so the PME does not get cleared and the system hangs.
      Trying to wake up the system after such a PME, e.g. through the power
      button, no longer works.
      
      Add an acpi_register_wakeup_handler() function which registers
      a handler to be called from acpi_s2idle_wake(); when the handler
      returns true, acpi_s2idle_wake() returns true as well.
      
      The INT0002 driver will use this mechanism to check the GPE0a_STS
      register from acpi_s2idle_wake() and to tell the system to wake up
      if a PME is signaled in the register.
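      The register-and-poll mechanism described above can be sketched in
      user-space C. This is a minimal simulation under stated assumptions,
      not the kernel code: the fixed-size table, register_wakeup_handler(),
      s2idle_wake(), int0002_check_wake() and the PME_B0_STS bit position
      are hypothetical stand-ins for acpi_register_wakeup_handler(),
      acpi_s2idle_wake() and the GPE0a_STS check done by the INT0002 driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's list of wakeup handlers. */
#define MAX_WAKEUP_HANDLERS 4

struct wakeup_handler {
	bool (*wakeup)(void *context);
	void *context;
};

static struct wakeup_handler handlers[MAX_WAKEUP_HANDLERS];
static size_t nr_handlers;

/* Register a callback to be polled when an SCI fires during s2idle. */
int register_wakeup_handler(bool (*wakeup)(void *context), void *context)
{
	if (nr_handlers == MAX_WAKEUP_HANDLERS)
		return -1;
	handlers[nr_handlers].wakeup = wakeup;
	handlers[nr_handlers].context = context;
	nr_handlers++;
	return 0;
}

/*
 * Called for an SCI the ACPI sleep code does not recognize itself:
 * wake the system if any registered handler claims the event.
 */
bool s2idle_wake(void)
{
	for (size_t i = 0; i < nr_handlers; i++)
		if (handlers[i].wakeup(handlers[i].context))
			return true;
	return false;
}

/* INT0002-style handler: test a PME bit in a (simulated) status word. */
#define PME_B0_STS (1u << 13) /* assumed bit position, for illustration */

bool int0002_check_wake(void *context)
{
	const uint32_t *gpe0a_sts = context;

	return (*gpe0a_sts & PME_B0_STS) != 0;
}
```

      A spurious SCI then wakes the system only when some registered handler
      claims it, which is how a PME signalled through the shared interrupt
      becomes a wakeup cause again.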
      
      Fixes: fdde0ff8 ("ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system")
      Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      ddfd9dcf
    • Qian Cai
      x86: ACPI: fix CPU hotplug deadlock · 696ac2e3
      Qian Cai authored
      
      Similar to commit 0266d81e ("acpi/processor: Prevent cpu hotplug
      deadlock") except this is for acpi_processor_ffh_cstate_probe():
      
      "The problem is that the work is scheduled on the current CPU from the
      hotplug thread associated with that CPU.
      
      It's not required to invoke these functions via the workqueue because
      the hotplug thread runs on the target CPU already.
      
      Check whether current is a per cpu thread pinned on the target CPU and
      invoke the function directly to avoid the workqueue."
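      The quoted fix boils down to choosing between a direct call and a
      queued call. A minimal user-space sketch, assuming hypothetical names
      throughout: current_cpu and current_is_pinned model the "current is a
      per-CPU thread pinned on the target CPU" check, queue_and_wait() stands
      in for work_on_cpu() (which queues the work and then flushes it, the
      step lockdep flags in the report below), and call_on_cpu_sketch() and
      probe_cstate() are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical model of where the calling thread runs. */
static int current_cpu;
static bool current_is_pinned;

/* Counts how often the workqueue path would have been taken. */
static int queued_calls;

/* Stand-in for work_on_cpu(): queue fn on @cpu and wait for it. */
static long queue_and_wait(int cpu, long (*fn)(void *), void *arg)
{
	(void)cpu;
	queued_calls++;
	return fn(arg); /* a real workqueue would run this on @cpu */
}

/*
 * If the caller already runs pinned on the target CPU (as the CPU
 * hotplug thread does), invoke fn directly: queuing work and waiting
 * for it while holding hotplug locks is what closes the lock cycle.
 */
long call_on_cpu_sketch(int cpu, long (*fn)(void *), void *arg)
{
	if (current_is_pinned && current_cpu == cpu)
		return fn(arg);
	return queue_and_wait(cpu, fn, arg);
}

/* Example per-CPU probe callback. */
long probe_cstate(void *arg)
{
	return *(const long *)arg * 2;
}
```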
      
       WARNING: possible circular locking dependency detected
       ------------------------------------------------------
       cpuhp/1/15 is trying to acquire lock:
       ffffc90003447a28 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: __flush_work+0x4c6/0x630
      
       but task is already holding lock:
       ffffffffafa1c0e8 (cpuidle_lock){+.+.}-{3:3}, at: cpuidle_pause_and_lock+0x17/0x20
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (cpu_hotplug_lock){++++}-{0:0}:
       cpus_read_lock+0x3e/0xc0
       irq_calc_affinity_vectors+0x5f/0x91
       __pci_enable_msix_range+0x10f/0x9a0
       pci_alloc_irq_vectors_affinity+0x13e/0x1f0
       pci_alloc_irq_vectors_affinity at drivers/pci/msi.c:1208
       pqi_ctrl_init+0x72f/0x1618 [smartpqi]
       pqi_pci_probe.cold.63+0x882/0x892 [smartpqi]
       local_pci_probe+0x7a/0xc0
       work_for_cpu_fn+0x2e/0x50
       process_one_work+0x57e/0xb90
       worker_thread+0x363/0x5b0
       kthread+0x1f4/0x220
       ret_from_fork+0x27/0x50
      
       -> #0 ((work_completion)(&wfc.work)){+.+.}-{0:0}:
       __lock_acquire+0x2244/0x32a0
       lock_acquire+0x1a2/0x680
       __flush_work+0x4e6/0x630
       work_on_cpu+0x114/0x160
       acpi_processor_ffh_cstate_probe+0x129/0x250
       acpi_processor_evaluate_cst+0x4c8/0x580
       acpi_processor_get_power_info+0x86/0x740
       acpi_processor_hotplug+0xc3/0x140
       acpi_soft_cpu_online+0x102/0x1d0
       cpuhp_invoke_callback+0x197/0x1120
       cpuhp_thread_fun+0x252/0x2f0
       smpboot_thread_fn+0x255/0x440
       kthread+0x1f4/0x220
       ret_from_fork+0x27/0x50
      
       other info that might help us debug this:
      
       Chain exists of:
       (work_completion)(&wfc.work) --> cpuhp_state-up --> cpuidle_lock
      
       Possible unsafe locking scenario:
      
       CPU0                    CPU1
       ----                    ----
       lock(cpuidle_lock);
                               lock(cpuhp_state-up);
                               lock(cpuidle_lock);
       lock((work_completion)(&wfc.work));
      
       *** DEADLOCK ***
      
       3 locks held by cpuhp/1/15:
       #0: ffffffffaf51ab10 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x69/0x2f0
       #1: ffffffffaf51ad40 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x69/0x2f0
       #2: ffffffffafa1c0e8 (cpuidle_lock){+.+.}-{3:3}, at: cpuidle_pause_and_lock+0x17/0x20
      
       Call Trace:
       dump_stack+0xa0/0xea
       print_circular_bug.cold.52+0x147/0x14c
       check_noncircular+0x295/0x2d0
       __lock_acquire+0x2244/0x32a0
       lock_acquire+0x1a2/0x680
       __flush_work+0x4e6/0x630
       work_on_cpu+0x114/0x160
       acpi_processor_ffh_cstate_probe+0x129/0x250
       acpi_processor_evaluate_cst+0x4c8/0x580
       acpi_processor_get_power_info+0x86/0x740
       acpi_processor_hotplug+0xc3/0x140
       acpi_soft_cpu_online+0x102/0x1d0
       cpuhp_invoke_callback+0x197/0x1120
       cpuhp_thread_fun+0x252/0x2f0
       smpboot_thread_fn+0x255/0x440
       kthread+0x1f4/0x220
       ret_from_fork+0x27/0x50
      
      Signed-off-by: Qian Cai <cai@lca.pw>
      Tested-by: Borislav Petkov <bp@suse.de>
      [ rjw: Subject ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      696ac2e3
  5. Apr 03, 2020
    • Chuanhong Guo
      net: dsa: mt7530: fix null pointer dereferencing in port5 setup · 0452800f
      Chuanhong Guo authored
      
      The second GMAC of the MediaTek SoC Ethernet may not be connected to a
      PHY, so a phy-handle isn't always available.
      Unfortunately, the mt7530 DSA driver assumes that the second GMAC is
      always connected to switch port 5 and sets up the mt7530 according to
      the PHY address of the second GMAC node, causing a null pointer
      dereference when phy-handle isn't defined in the DTS.
      This commit fixes the setup code by checking the return value of
      of_parse_phandle() before using it.
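      The check amounts to treating a missing phandle as "no PHY on port 5"
      instead of dereferencing it. A minimal sketch, assuming a heavily
      simplified, hypothetical device-tree model: struct dt_node,
      parse_phy_handle() and port5_phy_addr() are illustrative names, with
      parse_phy_handle() standing in for of_parse_phandle(np, "phy-handle", 0).

```c
#include <stddef.h>

/* Hypothetical, heavily simplified device-tree node. */
struct dt_node {
	const struct dt_node *phy_handle; /* NULL if "phy-handle" is absent */
	int reg;                          /* PHY address on the MDIO bus */
};

/* Stand-in for of_parse_phandle(): may return NULL. */
static const struct dt_node *parse_phy_handle(const struct dt_node *mac)
{
	return mac->phy_handle;
}

/*
 * Return the PHY address used to configure port 5, or -1 when the
 * second GMAC has no PHY: check the phandle before dereferencing it.
 */
int port5_phy_addr(const struct dt_node *gmac1)
{
	const struct dt_node *phy = parse_phy_handle(gmac1);

	if (!phy)
		return -1; /* no phy-handle in the DTS: leave port 5 alone */
	return phy->reg;
}
```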
      
      Fixes: 38f790a8 ("net: dsa: mt7530: Add support for port 5")
      Signed-off-by: Chuanhong Guo <gch981213@gmail.com>
      Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Tested-by: René van Dorst <opensource@vdorst.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0452800f
    • Oleksij Rempel
      net: phy: micrel: kszphy_resume(): add delay after genphy_resume() before accessing PHY registers · 6110dff7
      Oleksij Rempel authored
      
      After the power-down bit is cleared, the chip internally triggers a
      global reset. According to the KSZ9031 documentation, we have to wait at
      least 1ms for the reset to finish.
      
      If the chip is accessed during reset, reads return 0xffff and writes
      are ignored. Depending on the system performance and MDIO bus speed,
      we may or may not run into this issue.
      
      This bug was discovered on an iMX6QP system with a KSZ9031 PHY and an
      attached PHY interrupt line. If the IRQ was used, the link status
      update was lost. In polling mode, the link status update was always
      correct.
      
      The investigation showed that during a read-modify-write access, the
      read returned 0xffff (while the chip was still in reset) and the
      corresponding write hit the chip _after_ the reset and, due to the
      0xffff, triggered another reset through an undocumented bit (register
      0x1f, bit 1), resulting in the next write being lost to the new reset
      cycle.
      
      This patch fixes the issue by adding a 1...2 ms sleep after the
      genphy_resume().
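      The failure mode and the fix can be modelled with a mock PHY and a
      simulated clock. This is a hypothetical sketch, not driver code:
      struct mock_phy, phy_read(), phy_write(), phy_resume() and
      resume_and_set_bit() are illustrative names, the 1 ms reset window
      follows the KSZ9031 figure above, and the assumption that the write
      lands 1 ms after the read is chosen only to reproduce the described
      ordering. The undocumented reset bit is not modelled; sleep_us plays
      the role of the added 1...2 ms sleep.

```c
#include <stdint.h>

/* Mock PHY: any access before reset_done_us sees a chip in reset. */
struct mock_phy {
	uint64_t now_us;        /* simulated clock */
	uint64_t reset_done_us; /* end of the internal global reset */
	uint16_t reg;           /* a single control register */
};

static uint16_t phy_read(const struct mock_phy *p)
{
	return p->now_us < p->reset_done_us ? 0xffff : p->reg;
}

static void phy_write(struct mock_phy *p, uint16_t val)
{
	if (p->now_us >= p->reset_done_us)
		p->reg = val; /* writes during reset are ignored */
}

/* Clearing power-down starts a reset lasting at least 1 ms. */
static void phy_resume(struct mock_phy *p)
{
	p->reset_done_us = p->now_us + 1000;
}

/* Resume, optionally sleep, then set @bit with a read-modify-write. */
uint16_t resume_and_set_bit(struct mock_phy *p, uint16_t bit, uint64_t sleep_us)
{
	phy_resume(p);
	p->now_us += sleep_us;
	uint16_t v = phy_read(p);
	p->now_us += 1000; /* assumed: the write lands after the reset ended */
	phy_write(p, v | bit);
	return p->reg;
}
```

      Without the sleep, the stale 0xffff read is written back once the
      reset is over; with a 1 ms sleep, the read sees the real register value.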
      
      Fixes: 836384d2 ("net: phy: micrel: Add specific suspend")
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6110dff7
    • Jisheng Zhang
      net: stmmac: dwmac1000: fix out-of-bounds mac address reg setting · 3e1221ac
      Jisheng Zhang authored
      
      Commit 9463c445 ("net: stmmac: dwmac1000: Clear unused address
      entries") cleared the unused MAC address entries, but introduced an
      out-of-bounds MAC address register programming bug: after setting
      the secondary unicast MAC addresses, the "reg" value has reached
      netdev_uc_count() + 1, so we should only clear address entries
      if (addr < perfect_addr_number).
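      The bound can be illustrated with an array standing in for the
      perfect-filter registers. A minimal sketch, assuming a hypothetical
      set_uc_filters() that writes to memory instead of MMIO: entry 0 holds
      the primary address, secondary addresses fill entries 1..n, and the
      clearing loop must stop before perfect_addr_number.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Program secondary unicast addresses into entries 1..n_uc and zero the
 * remaining entries. @regs has exactly @perfect_addr_number entries, so
 * the clearing loop uses reg < perfect_addr_number; "<=" would write one
 * entry past the end.
 */
void set_uc_filters(uint64_t *regs, size_t perfect_addr_number,
		    const uint64_t *uc, size_t n_uc)
{
	size_t reg = 1; /* entry 0 holds the primary MAC address */

	for (size_t i = 0; i < n_uc && reg < perfect_addr_number; i++)
		regs[reg++] = uc[i];

	/* Clear unused entries, staying inside the register array. */
	while (reg < perfect_addr_number)
		regs[reg++] = 0;
}
```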
      
      Fixes: 9463c445 ("net: stmmac: dwmac1000: Clear unused address entries")
      Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3e1221ac
    • Nathan Chancellor
      remoteproc/omap: Fix set_load call in omap_rproc_request_timer · e6d05acd
      Nathan Chancellor authored
      When building arm allyesconfig:
      
      drivers/remoteproc/omap_remoteproc.c:174:44: error: too many arguments
      to function call, expected 2, have 3
              timer->timer_ops->set_load(timer->odt, 0, 0);
              ~~~~~~~~~~~~~~~~~~~~~~~~~~                ^
      1 error generated.
      
      This is due to commit 02e6d546 ("clocksource/drivers/timer-ti-dm:
      Enable autoreload in set_pwm") in the clockevents tree interacting with
      commit e28edc57 ("remoteproc/omap: Request a timer(s) for remoteproc
      usage") from the rpmsg tree.
      
      This should have been fixed during the merge of the remoteproc tree
      since it happened after the clockevents tree merge; however, it does not
      look like my email was noticed by either maintainer and I did not pay
      attention when the pull was sent since I was on CC.
      
      Fixes: c6570114 ("Merge tag 'rproc-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc")
      Link: https://lore.kernel.org/lkml/20200327185055.GA22438@ubuntu-m2-xlarge-x86/
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Acked-by: Suman Anna <s-anna@ti.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e6d05acd
    • Mikulas Patocka
      dm integrity: fix logic bug in integrity tag testing · 8267d8fb
      Mikulas Patocka authored
      
      If all the bytes are equal to DISCARD_FILLER, we want to accept the
      buffer. If any of the bytes are different, we must do thorough
      tag-by-tag checking.
      
      The condition was inverted.
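      The intended logic, in its non-inverted form, can be sketched as
      follows. This is an illustration under assumptions, not the dm-integrity
      code: the DISCARD_FILLER value, all_bytes_are() and tags_ok() are
      hypothetical, and the thorough tag-by-tag check is simplified to a
      single memcmp() of the whole tag area.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DISCARD_FILLER 0xf6 /* assumed filler byte value */

/* True if every byte of @buf equals @c. */
static bool all_bytes_are(const unsigned char *buf, size_t len, unsigned char c)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != c)
			return false;
	return true;
}

/*
 * Accept a tag area consisting entirely of DISCARD_FILLER (a discarded
 * range); if any byte differs, fall back to the thorough comparison.
 */
bool tags_ok(const unsigned char *stored, const unsigned char *expected, size_t len)
{
	if (all_bytes_are(stored, len, DISCARD_FILLER))
		return true;                           /* discarded: accept */
	return memcmp(stored, expected, len) == 0; /* thorough check */
}
```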
      
      Fixes: 84597a44 ("dm integrity: add optional discard support")
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      8267d8fb
    • Mike Snitzer
      Revert "dm: always call blk_queue_split() in dm_process_bio()" · 120c9257
      Mike Snitzer authored
      
      This reverts commit effd58c9.
      
      blk_queue_split() is causing excessive IO splitting because
      blk_max_size_offset() depends on the 'chunk_sectors' limit being set,
      and if it isn't (as is the case for DM targets!) it falls back to
      splitting on a 'max_sectors' boundary regardless of offset.
      
      "Fix" this by going back to _not_ using blk_queue_split() in
      dm_process_bio() for normal IO (reads and writes).  The long-term fix
      is still TBD, but it should focus on training blk_max_size_offset() to
      call into a DM-provided hook (to call DM's max_io_len()).
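      The fallback described above can be reproduced in a few lines. A
      minimal sketch of the limit computation, assuming a hypothetical
      max_size_offset() that takes the limits as plain parameters instead of
      a request queue, and a power-of-two chunk_sectors (required by the mask
      arithmetic). With chunk_sectors unset and an assumed effective
      max_sectors of 256, an IO starting at sector 224 is cut at 480, the
      first split in the "before" trace below; with a 256-sector (128K) chunk
      it would be cut at the 256 boundary instead.

```c
/*
 * Largest IO (in 512-byte sectors) that may start at @offset.
 * Without a chunk_sectors limit the result ignores @offset entirely.
 */
unsigned int max_size_offset(unsigned int max_sectors,
			     unsigned int chunk_sectors,
			     unsigned long long offset)
{
	unsigned int to_boundary;

	if (!chunk_sectors)
		return max_sectors;
	to_boundary = chunk_sectors - (unsigned int)(offset & (chunk_sectors - 1));
	return to_boundary < max_sectors ? to_boundary : max_sectors;
}
```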
      
      Test results from a simple misaligned IO test on a 4-way dm-striped
      device with a chunk size of 128K and a stripe size of 512K:
      
      xfs_io -d -c 'pread -b 2m 224s 4072s' /dev/mapper/stripe_dev
      
      before this revert:
      
      253,0   21        1     0.000000000  2206  Q   R 224 + 4072 [xfs_io]
      253,0   21        2     0.000008267  2206  X   R 224 / 480 [xfs_io]
      253,0   21        3     0.000010530  2206  X   R 224 / 256 [xfs_io]
      253,0   21        4     0.000027022  2206  X   R 480 / 736 [xfs_io]
      253,0   21        5     0.000028751  2206  X   R 480 / 512 [xfs_io]
      253,0   21        6     0.000033323  2206  X   R 736 / 992 [xfs_io]
      253,0   21        7     0.000035130  2206  X   R 736 / 768 [xfs_io]
      253,0   21        8     0.000039146  2206  X   R 992 / 1248 [xfs_io]
      253,0   21        9     0.000040734  2206  X   R 992 / 1024 [xfs_io]
      253,0   21       10     0.000044694  2206  X   R 1248 / 1504 [xfs_io]
      253,0   21       11     0.000046422  2206  X   R 1248 / 1280 [xfs_io]
      253,0   21       12     0.000050376  2206  X   R 1504 / 1760 [xfs_io]
      253,0   21       13     0.000051974  2206  X   R 1504 / 1536 [xfs_io]
      253,0   21       14     0.000055881  2206  X   R 1760 / 2016 [xfs_io]
      253,0   21       15     0.000057462  2206  X   R 1760 / 1792 [xfs_io]
      253,0   21       16     0.000060999  2206  X   R 2016 / 2272 [xfs_io]
      253,0   21       17     0.000062489  2206  X   R 2016 / 2048 [xfs_io]
      253,0   21       18     0.000066133  2206  X   R 2272 / 2528 [xfs_io]
      253,0   21       19     0.000067507  2206  X   R 2272 / 2304 [xfs_io]
      253,0   21       20     0.000071136  2206  X   R 2528 / 2784 [xfs_io]
      253,0   21       21     0.000072764  2206  X   R 2528 / 2560 [xfs_io]
      253,0   21       22     0.000076185  2206  X   R 2784 / 3040 [xfs_io]
      253,0   21       23     0.000077486  2206  X   R 2784 / 2816 [xfs_io]
      253,0   21       24     0.000080885  2206  X   R 3040 / 3296 [xfs_io]
      253,0   21       25     0.000082316  2206  X   R 3040 / 3072 [xfs_io]
      253,0   21       26     0.000085788  2206  X   R 3296 / 3552 [xfs_io]
      253,0   21       27     0.000087096  2206  X   R 3296 / 3328 [xfs_io]
      253,0   21       28     0.000093469  2206  X   R 3552 / 3808 [xfs_io]
      253,0   21       29     0.000095186  2206  X   R 3552 / 3584 [xfs_io]
      253,0   21       30     0.000099228  2206  X   R 3808 / 4064 [xfs_io]
      253,0   21       31     0.000101062  2206  X   R 3808 / 3840 [xfs_io]
      253,0   21       32     0.000104956  2206  X   R 4064 / 4096 [xfs_io]
      253,0   21       33     0.001138823     0  C   R 4096 + 200 [0]
      
      after this revert:
      
      253,0   18        1     0.000000000  4430  Q   R 224 + 3896 [xfs_io]
      253,0   18        2     0.000018359  4430  X   R 224 / 256 [xfs_io]
      253,0   18        3     0.000028898  4430  X   R 256 / 512 [xfs_io]
      253,0   18        4     0.000033535  4430  X   R 512 / 768 [xfs_io]
      253,0   18        5     0.000065684  4430  X   R 768 / 1024 [xfs_io]
      253,0   18        6     0.000091695  4430  X   R 1024 / 1280 [xfs_io]
      253,0   18        7     0.000098494  4430  X   R 1280 / 1536 [xfs_io]
      253,0   18        8     0.000114069  4430  X   R 1536 / 1792 [xfs_io]
      253,0   18        9     0.000129483  4430  X   R 1792 / 2048 [xfs_io]
      253,0   18       10     0.000136759  4430  X   R 2048 / 2304 [xfs_io]
      253,0   18       11     0.000152412  4430  X   R 2304 / 2560 [xfs_io]
      253,0   18       12     0.000160758  4430  X   R 2560 / 2816 [xfs_io]
      253,0   18       13     0.000183385  4430  X   R 2816 / 3072 [xfs_io]
      253,0   18       14     0.000190797  4430  X   R 3072 / 3328 [xfs_io]
      253,0   18       15     0.000197667  4430  X   R 3328 / 3584 [xfs_io]
      253,0   18       16     0.000218751  4430  X   R 3584 / 3840 [xfs_io]
      253,0   18       17     0.000226005  4430  X   R 3840 / 4096 [xfs_io]
      253,0   18       18     0.000250404  4430  Q   R 4120 + 176 [xfs_io]
      253,0   18       19     0.000847708     0  C   R 4096 + 24 [0]
      253,0   18       20     0.000855783     0  C   R 4120 + 176 [0]
      
      Fixes: effd58c9 ("dm: always call blk_queue_split() in dm_process_bio()")
      Cc: stable@vger.kernel.org
      Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
      Tested-by: Barry Marson <bmarson@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      120c9257
    • Mike Snitzer
      dm integrity: fix ppc64le warning · e7fc1e57
      Mike Snitzer authored
      
      Otherwise:
      
      In file included from drivers/md/dm-integrity.c:13:
      drivers/md/dm-integrity.c: In function 'dm_integrity_status':
      drivers/md/dm-integrity.c:3061:10: error: format '%llu' expects
      argument of type 'long long unsigned int', but argument 4 has type
      'long int' [-Werror=format=]
         DMEMIT("%llu %llu",
                ^~~~~~~~~~~
          atomic64_read(&ic->number_of_mismatches),
          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      ./include/linux/device-mapper.h:550:46: note: in definition of macro 'DMEMIT'
            0 : scnprintf(result + sz, maxlen - sz, x))
                                                    ^
      cc1: all warnings being treated as errors
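      The warning arises because a 64-bit signed counter is "long" on some
      64-bit targets (as on ppc64le here) and "long long" on others, so
      passing it straight to %llu is a type mismatch. The portable pattern
      is an explicit cast per argument. A minimal user-space sketch, with the
      hypothetical helper format_counters() standing in for the DMEMIT() call:

```c
#include <stdint.h>
#include <stdio.h>

/* Cast each 64-bit counter so it always matches the %llu specifier. */
int format_counters(char *buf, size_t len, int64_t a, int64_t b)
{
	return snprintf(buf, len, "%llu %llu",
			(unsigned long long)a,
			(unsigned long long)b);
}
```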
      
      Fixes: 7649194a ("dm integrity: remove sector type casts")
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      e7fc1e57
    • Colin Ian King
      rtc: ds1307: check for failed memory allocation on wdt · 1821b79d
      Colin Ian King authored
      
      Currently a failed memory allocation will lead to a null pointer
      dereference of the pointer wdt.  Fix this by checking for a failed
      allocation and just returning.
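      The pattern is the usual check-before-use on an allocation result. A
      minimal user-space sketch, assuming hypothetical names: struct wdt,
      setup_wdt() and failing_alloc() are illustrative, and the allocator is
      passed in so that the failure path can be exercised; in the driver the
      allocation comes from the kernel allocator.

```c
#include <stdlib.h>

/* Hypothetical, simplified watchdog descriptor. */
struct wdt {
	unsigned int timeout;
};

/*
 * Allocate and initialise the optional watchdog. On allocation failure,
 * skip watchdog setup instead of dereferencing NULL.
 */
struct wdt *setup_wdt(void *(*alloc)(size_t), unsigned int timeout)
{
	struct wdt *wdt = alloc(sizeof(*wdt));

	if (!wdt)
		return NULL; /* bail out: no NULL dereference */
	wdt->timeout = timeout;
	return wdt;
}

/* Allocator that always fails, for exercising the error path. */
void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```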
      
      Addresses-Coverity: ("Dereference null return")
      Fixes: fd90d48d ("rtc: ds1307: add support for watchdog timer on ds1388")
      
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Link: https://lore.kernel.org/r/20200403110437.57420-1-colin.king@canonical.com
      Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      1821b79d