1. 08 Oct, 2016 1 commit
  2. 26 Jul, 2016 1 commit
  3. 15 Mar, 2016 1 commit
    • memory-hotplug: add automatic onlining policy for the newly added memory · 31bc3858
      Vitaly Kuznetsov authored
      Currently, all newly added memory blocks remain in the 'offline' state
      unless someone onlines them.  Some Linux distributions carry special udev
      rules like:
        SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"
      to make this happen automatically.  This is not a great solution for
      virtual machines where memory hotplug is used to relieve high memory
      pressure: such onlining is slow, and the userspace process doing it
      (udev) may be killed by the OOM killer because it will probably need
      to allocate some memory itself.
      Introduce a default policy for newly added memory blocks in the
      /sys/devices/system/memory/auto_online_blocks file with two possible
      values: "offline", which preserves the current behavior, and "online",
      which causes all newly added memory blocks to go online as soon as
      they're added.  The default is "offline".
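As a rough illustration of how userspace might inspect or change this policy, here is a minimal Python sketch. The file path and the two values come from the commit message above; the helper names `read_policy` and `set_policy` are hypothetical, and writing to the real sysfs file would normally require root.

```python
from pathlib import Path

# Path and values are taken from the commit message; helper names are hypothetical.
AUTO_ONLINE_BLOCKS = Path("/sys/devices/system/memory/auto_online_blocks")
VALID_POLICIES = ("offline", "online")


def read_policy(path=AUTO_ONLINE_BLOCKS):
    """Return the current policy string with surrounding whitespace stripped."""
    return Path(path).read_text().strip()


def set_policy(policy, path=AUTO_ONLINE_BLOCKS):
    """Write a new policy after checking it is one of the two documented values."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"policy must be one of {VALID_POLICIES}, got {policy!r}")
    Path(path).write_text(policy + "\n")
```

The `path` parameter exists only so the helpers can be pointed at an ordinary file when experimenting without touching sysfs.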
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 16 Jan, 2016 1 commit
  5. 15 Jan, 2016 3 commits
    • drivers/base/memory.c: fix kernel warning during memory hotplug on ppc64 · cb5490a5
      John Allen authored
      Fix a bug where a kernel warning is triggered when performing a memory
      hotplug on ppc64.  This warning may also occur on any architecture that
      uses the memory_probe_store interface.
        WARNING: at drivers/base/memory.c:200
        CPU: 9 PID: 13042 Comm: systemd-udevd Not tainted 4.4.0-rc4-00113-g0bd0f1e6-dirty #7
        NIP [c00000000055e034] pages_correctly_reserved+0x134/0x1b0
        LR [c00000000055e7f8] memory_subsys_online+0x68/0x140
        Call Trace:
      The warning is triggered because there is a udev rule that automatically
      tries to online memory after it has been added.  The udev rule varies
      from distro to distro, but will generally look something like:
        SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"
      On any architecture that uses memory_probe_store to reserve memory, the
      udev rule will be triggered after the first section of the block is
      reserved and will subsequently attempt to online the entire block,
      interrupting the memory reservation process and causing the warning.
      This patch modifies memory_probe_store to add a block of memory with a
      single call to add_memory as opposed to looping through and adding each
      section individually.  A single call to add_memory is protected by the
      mem_hotplug mutex which will prevent the udev rule from onlining memory
      until the reservation of the entire block is complete.
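The race described above can be modeled in userspace. The following Python sketch is only an illustration, not kernel code: `Block`, `add_per_section`, and `add_whole_block` are hypothetical names, `SECTIONS_PER_BLOCK = 4` is an arbitrary value, and a `threading.Lock` stands in for the mem_hotplug mutex. The `on_event` callback plays the role of the udev rule firing after the first section has been added.

```python
import threading

SECTIONS_PER_BLOCK = 4  # illustrative value, not taken from the commit


class Block:
    def __init__(self):
        self.lock = threading.Lock()  # stands in for the mem_hotplug mutex
        self.reserved = 0             # sections reserved so far
        self.warnings = 0             # times onlining saw a partially reserved block

    def try_online(self):
        """Model of the udev rule: online the whole block if the lock is free."""
        if not self.lock.acquire(blocking=False):
            return False              # hotplug still in progress, nothing happens
        try:
            if self.reserved != SECTIONS_PER_BLOCK:
                self.warnings += 1    # models the pages_correctly_reserved() warning
            return True
        finally:
            self.lock.release()


def add_per_section(block, on_event):
    """Old behavior: one locked add per section, so the lock is dropped in between."""
    for i in range(SECTIONS_PER_BLOCK):
        with block.lock:
            block.reserved += 1
        if i == 0:
            on_event()                # udev fires between two sections


def add_whole_block(block, on_event):
    """New behavior: a single locked add that covers every section of the block."""
    with block.lock:
        for i in range(SECTIONS_PER_BLOCK):
            block.reserved += 1
            if i == 0:
                on_event()            # udev fires, but cannot take the lock yet
    on_event()                        # retried after the lock has been dropped
```

With the per-section variant the callback observes a block with only one section reserved; with the single-call variant it can only succeed once the whole block is reserved.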
      Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
      Acked-by: Dave Hansen <dave.hansen@intel.com>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/base/memory.c: rename remove_memory_block() to remove_memory_section() · cc292b0b
      Seth Jennings authored
      The function removes a section, not a block.  Rename to reflect actual
      functionality.
      Signed-off-by: Seth Jennings <sjennings@variantweb.net>
      Cc: Andrew Banman <abanman@sgi.com>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Russ Anderson <rja@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/base/memory.c: clean up section counting · 56c6b5d3
      Seth Jennings authored
      Right now, section_count is calculated in add_memory_block().  However,
      init_memory_block() increments section_count as well, which, at first,
      seems like it would lead to an off-by-one error.  There is no harm done
      because add_memory_block() immediately overwrites the
      mem->section_count, but it is messy.
      This commit moves the increment out of the common init_memory_block()
      (called by both add_memory_block() and register_new_memory()) and adds
      it to register_new_memory().
      Signed-off-by: Seth Jennings <sjennings@variantweb.net>
      Cc: Andrew Banman <abanman@sgi.com>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Russ Anderson <rja@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 12 Dec, 2015 1 commit
  7. 14 Apr, 2015 2 commits
    • mm, hotplug: fix concurrent memory hot-add deadlock · 30467e0b
      David Rientjes authored
      There's a deadlock when concurrently hot-adding memory through the probe
      interface and switching a memory block from offline to online.
      When hot-adding memory via the probe interface, add_memory() first takes
      mem_hotplug_begin() and then device_lock() is later taken when registering
      the newly initialized memory block.  This creates a lock dependency of (1)
      mem_hotplug.lock (2) dev->mutex.
      When switching a memory block from offline to online, dev->mutex is first
      grabbed in device_online() when the write(2) transitions an existing
      memory block from offline to online, and then online_pages() will take
      mem_hotplug_begin().
      This creates a lock inversion between mem_hotplug.lock and dev->mutex.
      Vitaly reports that this deadlock can happen when kworker handling a probe
      event races with systemd-udevd switching a memory block's state.
      This patch requires the state transition to take mem_hotplug_begin()
      before dev->mutex.  Hot-adding memory via the probe interface creates a
      memory block while holding mem_hotplug_begin(), so there is no way to
      take dev->mutex first in this case.
      online_pages() and offline_pages() are only called when transitioning
      memory block state.  We now require that mem_hotplug_begin() is taken
      before calling them -- this requires exporting the mem_hotplug_begin() and
      mem_hotplug_done() to generic code.  In all hot-add and hot-remove cases,
      mem_hotplug_begin() is done prior to device_online().  This is all that is
      needed to avoid the deadlock.
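The lock inversion can be made concrete with a toy lock-order checker. This Python sketch is a simplified model under stated assumptions, not how the kernel detects such problems: `LockOrderChecker` is a hypothetical name, and each code path is represented simply as the ordered list of locks it takes.

```python
class LockOrderChecker:
    """Records 'A held while taking B' pairs and flags the reverse order."""

    def __init__(self):
        self.edges = set()  # (held, taken) pairs seen on earlier paths

    def run(self, path):
        """Replay one code path, given as the ordered list of locks it takes.

        Returns the (held, taken) pairs that invert an order recorded earlier.
        """
        inversions = []
        held = []
        for lock in path:
            for h in held:
                if (lock, h) in self.edges:
                    inversions.append((h, lock))
                self.edges.add((h, lock))
            held.append(lock)
        return inversions


# Before the patch: the probe path and the online path disagree on the order.
before = LockOrderChecker()
probe_path = ["mem_hotplug.lock", "dev->mutex"]
old_online_path = ["dev->mutex", "mem_hotplug.lock"]
before.run(probe_path)
inversions_before = before.run(old_online_path)

# After the patch: the state transition also takes mem_hotplug.lock first.
after = LockOrderChecker()
after.run(probe_path)
inversions_after = after.run(["mem_hotplug.lock", "dev->mutex"])
```

In this model `inversions_before` contains the dev->mutex / mem_hotplug.lock pair, while `inversions_after` is empty because both paths agree on the order.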
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory hotplug: use macro to switch between section and pfn · 19c07d5e
      Sheng Yong authored
      Use macro section_nr_to_pfn() to switch between section and pfn, instead
      of open-coding it.  No semantic changes.
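For readers unfamiliar with the conversion, the arithmetic behind section_nr_to_pfn() is a single shift. The Python sketch below mirrors it, assuming 4 KiB pages (PAGE_SHIFT = 12) and 128 MiB sections (SECTION_SIZE_BITS = 27, as on x86_64); both values are architecture dependent and are assumptions here, not part of the commit.

```python
PAGE_SHIFT = 12          # assumption: 4 KiB pages
SECTION_SIZE_BITS = 27   # assumption: 128 MiB sections, as on x86_64
PFN_SECTION_SHIFT = SECTION_SIZE_BITS - PAGE_SHIFT


def section_nr_to_pfn(sec):
    """First page frame number of memory section `sec`."""
    return sec << PFN_SECTION_SHIFT


def pfn_to_section_nr(pfn):
    """Memory section that contains page frame number `pfn`."""
    return pfn >> PFN_SECTION_SHIFT
```

Under these assumptions each section spans 2**15 page frames, so section 1 starts at pfn 32768.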
      Signed-off-by: Sheng Yong <shengyong1@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 25 Mar, 2015 2 commits
  9. 13 Dec, 2014 1 commit
  10. 10 Oct, 2014 1 commit
    • memory-hotplug: add sysfs valid_zones attribute · ed2f2400
      Zhang Zhen authored
      Currently memory-hotplug has two limits:
      1. If the memory block is in ZONE_NORMAL, you can change it to
         ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE.
      2. If the memory block is in ZONE_MOVABLE, you can change it to
         ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL.
      With this patch, it is easy to tell which zones a memory block can be
      onlined to, without needing to know the two limits above.
      Updated the related Documentation.
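A userspace check of the new attribute might look like the Python sketch below. It assumes that valid_zones contains whitespace-separated zone names such as "Normal Movable"; that format, the memory block directory layout, and the helper names are assumptions for illustration rather than something stated in the commit message.

```python
from pathlib import Path


def valid_zones(block_dir):
    """Return the zone names listed in <block_dir>/valid_zones.

    Assumes the attribute holds whitespace-separated zone names.
    """
    return (Path(block_dir) / "valid_zones").read_text().split()


def can_online_to(block_dir, zone):
    """True if `zone` (for example "Movable") is listed for this memory block."""
    return zone in valid_zones(block_dir)
```

On a real system `block_dir` would be a memory block directory under /sys/devices/system/memory.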
      [akpm@linux-foundation.org: use conventional comment layout]
      [akpm@linux-foundation.org: fix build with CONFIG_MEMORY_HOTREMOVE=n]
      [akpm@linux-foundation.org: remove unused local zone_prev]
      Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 07 Aug, 2014 3 commits
  12. 04 Jun, 2014 1 commit
  13. 17 Oct, 2013 1 commit
  14. 29 Aug, 2013 2 commits
    • driver core / ACPI: Avoid device hot remove locking issues · 5e33bc41
      Rafael J. Wysocki authored
      device_hotplug_lock is held around the acpi_bus_trim() call in
      acpi_scan_hot_remove() which generally removes devices (it removes
      ACPI device objects at least, but it may also remove "physical"
      device objects through .detach() callbacks of ACPI scan handlers).
      Thus, potentially, device sysfs attributes are removed under that
      lock and to remove those attributes it is necessary to hold the
      s_active references of their directory entries for writing.
      On the other hand, the execution of a .show() or .store() callback
      from a sysfs attribute is carried out with that attribute's s_active
      reference held for reading.  Consequently, if any device sysfs
      attribute that may be removed from within acpi_scan_hot_remove()
      through acpi_bus_trim() has a .store() or .show() callback which
      acquires device_hotplug_lock, the execution of that callback may
      deadlock with the removal of the attribute.  [Unfortunately, the
      "online" device attribute of CPUs and memory blocks is one of them.]
      To avoid such deadlocks, make all of the sysfs attribute callbacks
      that need to lock device hotplug, for example store_online(), use
      a special function, lock_device_hotplug_sysfs(), to lock device
      hotplug and return the result of that function immediately if it is
      not zero.  This will cause the s_active reference of the directory
      entry in question to be released and the syscall to be restarted
      if device_hotplug_lock cannot be acquired.
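The try-lock-or-restart pattern described above can be sketched in userspace as follows. This is a simplified Python model, not the kernel implementation: a `threading.Lock` stands in for device_hotplug_lock, and the `RESTART` marker is a hypothetical stand-in for the syscall being restarted.

```python
import threading

device_hotplug_lock = threading.Lock()  # stands in for the kernel lock
RESTART = "restart"                     # hypothetical marker for "restart the syscall"


def lock_device_hotplug_sysfs():
    """Try to take the lock without blocking; return 0 on success, RESTART otherwise."""
    if device_hotplug_lock.acquire(blocking=False):
        return 0
    return RESTART


def store_online(do_online):
    """Model of a sysfs .store() callback that needs the hotplug lock."""
    ret = lock_device_hotplug_sysfs()
    if ret:
        return ret  # give up immediately so the caller can release s_active and retry
    try:
        do_online()
        return 0
    finally:
        device_hotplug_lock.release()
```

The key property is that the callback never sleeps on the lock while holding its own reference, so a concurrent removal is not blocked by it.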
      [show_online() actually doesn't need to lock device hotplug, but
      it is useful to serialize it with respect to device_offline() and
      device_online() for the same device (in case user space attempts to
      run them concurrently), which can be done with the help of
      device_hotplug_lock.]
      Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Reported-and-tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Suggested-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Toshi Kani <toshi.kani@hp.com>
    • drivers/base/memory.c: fix show_mem_removable() to handle missing sections · 21ea9f5a
      Russ Anderson authored
      "cat /sys/devices/system/memory/memory*/removable" crashed the system.
      The problem is that show_mem_removable() is passing a
      bad pfn to is_mem_section_removable(), which causes
          if (!node_online(page_to_nid(page)))
      to blow up.  Why is it passing in a bad pfn?
      The reason is that show_mem_removable() will loop sections_per_block
      times.  sections_per_block is 16, but mem->section_count is 8,
      indicating holes in this memory block.  Checking that the memory section
      is present before checking to see if the memory section is removable
      fixes the problem.
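The shape of the fix can be modeled in a few lines. In this Python sketch, which is an illustration rather than the kernel code, `present` is a per-section list of booleans and `is_section_removable` is a hypothetical callback standing in for is_mem_section_removable(); the loop skips holes instead of probing them.

```python
def show_mem_removable(present, is_section_removable):
    """Return 1 if every present section of the block is removable, else 0.

    Sections that are not present (holes in the block) are skipped, so the
    callback is never asked about a section with no valid pfn behind it.
    """
    ret = True
    for i, is_present in enumerate(present):
        if not is_present:
            continue
        ret = ret and is_section_removable(i)
    return int(ret)
```

For the case in the report, `present` would have 16 entries of which only 8 are true.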
         harp5-sys:~ # cat /sys/devices/system/memory/memory*/removable
         BUG: unable to handle kernel paging request at ffffea00c3200000
         IP: [<ffffffff81117ed1>] is_pageblock_removable_nolock+0x1/0x90
         PGD 83ffd4067 PUD 37bdfce067 PMD 0
         Oops: 0000 [#1] SMP
         Modules linked in: autofs4 binfmt_misc rdma_ucm rdma_cm iw_cm ib_addr ib_srp scsi_transport_srp scsi_tgt ib_ipoib ib_cm ib_uverbs ib_umad iw_cxgb3 cxgb3 mdio mlx4_en mlx4_ib ib_sa mlx4_core ib_mthca ib_mad ib_core fuse nls_iso8859_1 nls_cp437 vfat fat joydev loop hid_generic usbhid hid hwperf(O) numatools(O) dm_mod iTCO_wdt ipv6 iTCO_vendor_support igb i2c_i801 ioatdma i2c_algo_bit ehci_pci pcspkr lpc_ich i2c_core ehci_hcd ptp sg mfd_core dca rtc_cmos pps_core mperf button xhci_hcd sd_mod crc_t10dif usbcore usb_common scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh_rdac scsi_dh gru(O) xvma(O) xfs crc32c libcrc32c thermal sata_nv processor piix mptsas mptscsih scsi_transport_sas mptbase megaraid_sas fan thermal_sys hwmon ext3 jbd ata_piix ahci libahci libata scsi_mod
         CPU: 4 PID: 5991 Comm: cat Tainted: G           O 3.11.0-rc5-rja-uv+ #10
         Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series BIOS 01/15/2013
         task: ffff88081f034580 ti: ffff880820022000 task.ti: ffff880820022000
         RIP: 0010:[<ffffffff81117ed1>]  [<ffffffff81117ed1>] is_pageblock_removable_nolock+0x1/0x90
         RSP: 0018:ffff880820023df8  EFLAGS: 00010287
         RAX: 0000000000040000 RBX: ffffea00c3200000 RCX: 0000000000000004
         RDX: ffffea00c30b0000 RSI: 00000000001c0000 RDI: ffffea00c3200000
         RBP: ffff880820023e38 R08: 0000000000000000 R09: 0000000000000001
         R10: 0000000000000000 R11: 0000000000000001 R12: ffffea00c33c0000
         R13: 0000160000000000 R14: 6db6db6db6db6db7 R15: 0000000000000001
         FS:  00007ffff7fb2700(0000) GS:ffff88083fc80000(0000) knlGS:0000000000000000
         CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         CR2: ffffea00c3200000 CR3: 000000081b954000 CR4: 00000000000407e0
         Call Trace:
      Signed-off-by: Russ Anderson <rja@sgi.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 28 Aug, 2013 1 commit
  16. 21 Aug, 2013 8 commits
  17. 27 Jul, 2013 1 commit
  18. 06 Jun, 2013 1 commit
    • drivers/base: Use attribute groups to create sysfs memory files · 96b2c0fc
      Nathan Fontenot authored
      Update the sysfs memory code to create/delete files at the time of device
      and subsystem registration.
      The current code creates files in the root memory directory explicitly through
      the use of init_* routines. The files for each memory block are created and
      deleted explicitly using the mem_[create|delete]_simple_file macros.
      This patch creates attribute groups for the memory root files and files in
      each memory block directory so that they are created and deleted implicitly
      at subsys and device register and unregister time.
      This did necessitate moving register_memory() and updating it to set the
      dev.groups field.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  19. 01 Jun, 2013 2 commits
  20. 12 May, 2013 1 commit
  21. 29 Apr, 2013 2 commits
  22. 24 Feb, 2013 1 commit
    • memory-hotplug: check whether all memory blocks are offlined or not when removing memory · 6677e3ea
      Yasuaki Ishimatsu authored
      We remove the memory like this:
       1. lock memory hotplug
       2. offline a memory block
       3. unlock memory hotplug
       4. repeat 1-3 to offline all memory blocks
       5. lock memory hotplug
       6. remove memory(TODO)
       7. unlock memory hotplug
      All memory blocks must be offlined before removing memory, but we don't
      hold the lock for the whole operation.  So we should check whether all
      memory blocks are offlined before step 6.  Otherwise, the kernel may
      panic.
      Offlining a memory block and removing a memory device can be two
      different operations.  Users can just offline some memory blocks without
      removing the memory device.  For this purpose, the kernel already takes
      lock_memory_hotplug() in __offline_pages().  To reuse that code for
      memory hot-remove, we repeat steps 1-3 to offline all the memory blocks,
      locking and unlocking memory hotplug each time, rather than holding
      the memory hotplug lock for the whole operation.
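The sequence above, including the extra check before step 6, can be modeled as follows. This Python sketch is a userspace illustration under stated assumptions: `hotplug_lock` stands in for the memory hotplug lock, block names such as "memory8" are made up, and the function names are hypothetical.

```python
import threading

hotplug_lock = threading.Lock()  # stands in for the memory hotplug lock


def offline_block(blocks, name):
    """Steps 1-3: lock, offline one block, unlock."""
    with hotplug_lock:
        blocks[name] = "offline"


def remove_memory(blocks):
    """Steps 5-7, with the check this commit adds before the removal.

    Returns the sorted names of blocks that are still online. If that list is
    empty, the blocks were removed; otherwise nothing was touched.
    """
    with hotplug_lock:
        busy = sorted(n for n, state in blocks.items() if state != "offline")
        if busy:
            return busy  # someone onlined a block again between the two phases
        blocks.clear()
        return []
```

Because the lock is dropped between offlining and removal, the check has to be repeated under the lock rather than assumed from the earlier steps.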
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 18 Feb, 2013 1 commit
  24. 12 Dec, 2012 1 commit
    • mm, memory-hotplug: dynamic configure movable memory and portion memory · 511c2aba
      Lai Jiangshan authored
      Add online_movable and online_kernel for logical memory hotplug.  This is
      the dynamic version of "movablecore" & "kernelcore".
      It is introduced for the same reasons as "movablecore" & "kernelcore",
      but it is dynamic and takes effect at run time:
      o We can configure memory as kernelcore or movablecore after boot.
        When userspace workload increases and we need more hugepages, we can
        use "online_movable" to add memory and allow the system to use more
        THP (transparent huge pages), and vice versa when kernel workload
        increases.
        It also helps virtualization to dynamically configure host/guest
        memory, to save memory (reduce waste).
        Memory capacity on demand.
      o When a new node is physically onlined after boot, we need to use
        "online_movable" or "online_kernel" to configure/portion it as
        expected when we logically online it.
        This configuration also helps physical memory migration.
      o All the benefits of the existing "movablecore" & "kernelcore".
      o Preparing for movable-node, which is very important for power saving,
        hardware partitioning and high-availability systems (hardware fault
        management).
      (Note, we don't introduce movable-node here.)
      Action behavior:
      When a memoryblock/memorysection is onlined by "online_movable", the kernel
      will not hold direct references to the pages of the memoryblock,
      thus we can remove that memory any time when needed.
      When it is onlined by "online_kernel", the kernel can use it.
      When it is onlined by "online", the zone type doesn't change.
      Current constraints:
      Only a memoryblock which is adjacent to ZONE_MOVABLE
      can be onlined from ZONE_NORMAL to ZONE_MOVABLE.
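From userspace, the three actions are selected by the string written to a memory block's state attribute. The Python sketch below shows one way this could be wrapped; the helper name `online_block` is hypothetical, the directory layout (a `state` file inside each memory block directory) is assumed from the udev rules quoted elsewhere in this log, and writing to the real attribute requires root.

```python
from pathlib import Path

# The three strings described in the commit message above.
ONLINE_MODES = ("online", "online_kernel", "online_movable")


def online_block(block_dir, mode="online"):
    """Request onlining of one memory block by writing `mode` to its state file."""
    if mode not in ONLINE_MODES:
        raise ValueError(f"mode must be one of {ONLINE_MODES}, got {mode!r}")
    (Path(block_dir) / "state").write_text(mode)
```

Whether a given block accepts "online_movable" depends on the adjacency constraint stated above, so a real caller should be prepared for the write to fail.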
      [akpm@linux-foundation.org: use min_t, cleanups]
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>