1. 23 Jan, 2018 1 commit
  2. 16 Nov, 2017 1 commit
  3. 17 Aug, 2017 1 commit
    • Tony Luck's avatar
      x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages · ce0fa3e5
      Tony Luck authored
      Speculative processor accesses may reference any memory that has a
      valid page table entry.  While a speculative access won't generate
      a machine check, it will log the error in a machine check bank. That
      could cause escalation of a subsequent error since the overflow bit
      will be then set in the machine check bank status register.
      
      Code has to be double-plus-tricky to avoid mentioning the 1:1 virtual
      address of the page we want to map out otherwise we may trigger the
      very problem we are trying to avoid.  We use a non-canonical address
      that passes through the usual Linux table walking code to get to the
      same "pte".
      
      Thanks to Dave Hansen for reviewing several iterations of this.
      
      Also see:
      
        http://marc.info/?l=linux-mm&m=149860136413338&w=2
      
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Elliott, Robert (Persistent Memory) <elliott@hpe.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170816171803.28342-1-tony.luck@intel.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ce0fa3e5
  4. 10 Jul, 2017 9 commits
  5. 06 Jul, 2017 2 commits
  6. 16 Jun, 2017 1 commit
  7. 02 Jun, 2017 1 commit
    • Punit Agrawal's avatar
      mm/migrate: fix refcount handling when !hugepage_migration_supported() · 30809f55
      Punit Agrawal authored
      On failing to migrate a page, soft_offline_huge_page() performs the
      necessary update to the hugepage ref-count.
      
      But when !hugepage_migration_supported() , unmap_and_move_hugepage()
      also decrements the page ref-count for the hugepage.  The combined
      behaviour leaves the ref-count in an inconsistent state.
      
      This leads to soft lockups when running the overcommitted hugepage test
      from mce-tests suite.
      
        Soft offlining pfn 0x83ed600 at process virtual address 0x400000000000
        soft offline: 0x83ed600: migration failed 1, type 1fffc00000008008 (uptodate|head)
        INFO: rcu_preempt detected stalls on CPUs/tasks:
         Tasks blocked on level-0 rcu_node (CPUs 0-7): P2715
          (detected by 7, t=5254 jiffies, g=963, c=962, q=321)
          thugetlb_overco R  running task        0  2715   2685 0x00000008
          Call trace:
            dump_backtrace+0x0/0x268
            show_stack+0x24/0x30
            sched_show_task+0x134/0x180
            rcu_print_detail_task_stall_rnp+0x54/0x7c
            rcu_check_callbacks+0xa74/0xb08
            update_process_times+0x34/0x60
            tick_sched_handle.isra.7+0x38/0x70
            tick_sched_timer+0x4c/0x98
            __hrtimer_run_queues+0xc0/0x300
            hrtimer_interrupt+0xac/0x228
            arch_timer_handler_phys+0x3c/0x50
            handle_percpu_devid_irq+0x8c/0x290
            generic_handle_irq+0x34/0x50
            __handle_domain_irq+0x68/0xc0
            gic_handle_irq+0x5c/0xb0
      
      Address this by changing the putback_active_hugepage() in
      soft_offline_huge_page() to putback_movable_pages().
      
      This only triggers on systems that enable memory failure handling
      (ARCH_SUPPORTS_MEMORY_FAILURE) but not hugepage migration
      (!ARCH_ENABLE_HUGEPAGE_MIGRATION).
      
      I imagine this wasn't triggered as there aren't many systems running
      this configuration.
      
      [akpm@linux-foundation.org: remove dead comment, per Naoya]
      Link: http://lkml.kernel.org/r/20170525135146.32011-1-punit.agrawal@arm.com
      
      Reported-by: default avatarManoj Iyer <manoj.iyer@canonical.com>
      Tested-by: default avatarManoj Iyer <manoj.iyer@canonical.com>
      Suggested-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarPunit Agrawal <punit.agrawal@arm.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>	[3.14+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      30809f55
  8. 12 May, 2017 1 commit
  9. 03 May, 2017 5 commits
  10. 02 Mar, 2017 2 commits
  11. 25 Feb, 2017 1 commit
  12. 25 Dec, 2016 1 commit
  13. 11 Nov, 2016 1 commit
  14. 28 Jul, 2016 2 commits
    • Naoya Horiguchi's avatar
      mm: hwpoison: remove incorrect comments · 7c7fd825
      Naoya Horiguchi authored
      dequeue_hwpoisoned_huge_page() can be called without page lock hold, so
      let's remove incorrect comment.
      
      The reason why the page lock is not really needed is that
      dequeue_hwpoisoned_huge_page() checks page_huge_active() inside
      hugetlb_lock, which allows us to avoid trying to dequeue a hugepage that
      are just allocated but not linked to active list yet, even without
      taking page lock.
      
      Link: http://lkml.kernel.org/r/20160720092901.GA15995@www9186uo.sakura.ne.jp
      
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reported-by: default avatarZhan Chen <zhanc1@andrew.cmu.edu>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7c7fd825
    • Mel Gorman's avatar
      mm, vmscan: move LRU lists to node · 599d0c95
      Mel Gorman authored
      This moves the LRU lists from the zone to the node and related data such
      as counters, tracing, congestion tracking and writeback tracking.
      
      Unfortunately, due to reclaim and compaction retry logic, it is
      necessary to account for the number of LRU pages on both zone and node
      logic.  Most reclaim logic is based on the node counters but the retry
      logic uses the zone counters which do not distinguish inactive and
      active sizes.  It would be possible to leave the LRU counters on a
      per-zone basis but it's a heavier calculation across multiple cache
      lines that is much more frequent than the retry checks.
      
      Other than the LRU counters, this is mostly a mechanical patch but note
      that it introduces a number of anomalies.  For example, the scans are
      per-zone but using per-node counters.  We also mark a node as congested
      when a zone is congested.  This causes weird problems that are fixed
      later but is easier to review.
      
      In the event that there is excessive overhead on 32-bit systems due to
      the nodes being on LRU then there are two potential solutions
      
      1. Long-term isolation of highmem pages when reclaim is lowmem
      
         When pages are skipped, they are immediately added back onto the LRU
         list. If lowmem reclaim persisted for long periods of time, the same
         highmem pages get continually scanned. The idea would be that lowmem
         keeps those pages on a separate list until a reclaim for highmem pages
         arrives that splices the highmem pages back onto the LRU. It potentially
         could be implemented similar to the UNEVICTABLE list.
      
         That would reduce the skip rate with the potential corner case is that
         highmem pages have to be scanned and reclaimed to free lowmem slab pages.
      
      2. Linear scan lowmem pages if the initial LRU shrink fails
      
         This will break LRU ordering but may be preferable and faster during
         memory pressure than skipping LRU pages.
      
      Link: http://lkml.kernel.org/r/1467970510-21195-4-git-send-email-mgorman@techsingularity.net
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@surriel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      599d0c95
  15. 21 May, 2016 1 commit
  16. 29 Apr, 2016 1 commit
  17. 04 Apr, 2016 1 commit
    • Kirill A. Shutemov's avatar
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      
      
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  18. 17 Mar, 2016 1 commit
  19. 15 Mar, 2016 1 commit
  20. 16 Jan, 2016 6 commits
    • Naoya Horiguchi's avatar
      mm: soft-offline: exit with failure for non anonymous thp · 98fd1ef4
      Naoya Horiguchi authored
      
      
      Currently memory_failure() doesn't handle non anonymous thp case,
      because we can hardly expect the error handling to be successful, and it
      can just hit some corner case which results in BUG_ON or something
      severe like that.  This is also the case for soft offline code, so let's
      make it in the same way.
      
      Orignal code has a MF_COUNT_INCREASED check before put_hwpoison_page(),
      but it's unnecessary because get_any_page() is already called when
      running on this code, which takes a refcount of the target page
      regardress of the flag.  So this patch also removes it.
      
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      98fd1ef4
    • Naoya Horiguchi's avatar
      mm: soft-offline: clean up soft_offline_page() · acc14dc4
      Naoya Horiguchi authored
      
      
      soft_offline_page() has some deeply indented code, that's the sign of
      demand for cleanup.  So let's do this.  No functionality change.
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      acc14dc4
    • Naoya Horiguchi's avatar
      mm: hwpoison: adjust for new thp refcounting · 4e41a30c
      Naoya Horiguchi authored
      
      
      Some mm-related BUG_ON()s could trigger from hwpoison code due to recent
      changes in thp refcounting rule.  This patch fixes them up.
      
      In the new refcounting, we no longer use tail->_mapcount to keep tail's
      refcount, and thereby we can simplify get/put_hwpoison_page().
      
      And another change is that tail's refcount is not transferred to the raw
      page during thp split (more precisely, in new rule we don't take
      refcount on tail page any more.) So when we need thp split, we have to
      transfer the refcount properly to the 4kB soft-offlined page before
      migration.
      
      thp split code goes into core code only when precheck
      (total_mapcount(head) == page_count(head) - 1) passes to avoid useless
      split, where we assume that one refcount is held by the caller of thp
      split and the others are taken via mapping.  To meet this assumption,
      this patch moves thp split part in soft_offline_page() after
      get_any_page().
      
      [akpm@linux-foundation.org: remove unneeded #define, per Kirill]
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: default avatarKirill A.  Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4e41a30c
    • Naoya Horiguchi's avatar
      mm: soft-offline: check return value in second __get_any_page() call · d96b339f
      Naoya Horiguchi authored
      I saw the following BUG_ON triggered in a testcase where a process calls
      madvise(MADV_SOFT_OFFLINE) on thps, along with a background process that
      calls migratepages command repeatedly (doing ping-pong among different
      NUMA nodes) for the first process:
      
         Soft offlining page 0x60000 at 0x700000600000
         __get_any_page: 0x60000 free buddy page
         page:ffffea0001800000 count:0 mapcount:-127 mapping:          (null) index:0x1
         flags: 0x1fffc0000000000()
         page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0)
         ------------[ cut here ]------------
         kernel BUG at /src/linux-dev/include/linux/mm.h:342!
         invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
         Modules linked in: cfg80211 rfkill crc32c_intel serio_raw virtio_balloon i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi
         CPU: 3 PID: 3035 Comm: test_alloc_gene Tainted: G           O    4.4.0-rc8-v4.4-rc8-160107-1501-00000-rc8+ #74
         Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
         task: ffff88007c63d5c0 ti: ffff88007c210000 task.ti: ffff88007c210000
         RIP: 0010:[<ffffffff8118998c>]  [<ffffffff8118998c>] put_page+0x5c/0x60
         RSP: 0018:ffff88007c213e00  EFLAGS: 00010246
         Call Trace:
           put_hwpoison_page+0x4e/0x80
           soft_offline_page+0x501/0x520
           SyS_madvise+0x6bc/0x6f0
           entry_SYSCALL_64_fastpath+0x12/0x6a
         Code: 8b fc ff ff 5b 5d c3 48 89 df e8 b0 fa ff ff 48 89 df 31 f6 e8 c6 7d ff ff 5b 5d c3 48 c7 c6 08 54 a2 81 48 89 df e8 a4 c5 01 00 <0f> 0b 66 90 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 8b 47
         RIP  [<ffffffff8118998c>] put_page+0x5c/0x60
          RSP <ffff88007c213e00>
      
      The root cause resides in get_any_page() which retries to get a refcount
      of the page to be soft-offlined.  This function calls
      put_hwpoison_page(), expecting that the target page is putback to LRU
      list.  But it can be also freed to buddy.  So the second check need to
      care about such case.
      
      Fixes: af8fae7c
      
       ("mm/memory-failure.c: clean up soft_offline_page()")
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>	[3.9+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d96b339f
    • Kirill A. Shutemov's avatar
      thp, mm: split_huge_page(): caller need to lock page · 4d2fa965
      Kirill A. Shutemov authored
      
      
      We're going to use migration entries instead of compound_lock() to
      stabilize page refcounts.  Setup and remove migration entries require
      page to be locked.
      
      Some of split_huge_page() callers already have the page locked.  Let's
      require everybody to lock the page before calling split_huge_page().
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Tested-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarJerome Marchand <jmarchan@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4d2fa965
    • Kirill A. Shutemov's avatar
      page-flags: define PG_locked behavior on compound pages · 48c935ad
      Kirill A. Shutemov authored
      
      
      lock_page() must operate on the whole compound page.  It doesn't make
      much sense to lock part of compound page.  Change code to use head
      page's PG_locked, if tail page is passed.
      
      This patch also gets rid of custom helper functions --
      __set_page_locked() and __clear_page_locked().  They are replaced with
      helpers generated by __SETPAGEFLAG/__CLEARPAGEFLAG.  Tail pages to these
      helper would trigger VM_BUG_ON().
      
      SLUB uses PG_locked as a bit spin locked.  IIUC, tail pages should never
      appear there.  VM_BUG_ON() is added to make sure that this assumption is
      correct.
      
      [akpm@linux-foundation.org: fix fs/cifs/file.c]
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      48c935ad