1. 06 Dec, 2017 3 commits
  2. 03 Dec, 2017 1 commit
  3. 02 Dec, 2017 4 commits
  4. 30 Nov, 2017 2 commits
    • hwmon: (jc42) optionally try to disable the SMBUS timeout · 68615eb0
      Peter Rosin authored
      With a nxp,se97 chip on an atmel sama5d31 board, the I2C adapter driver
      is not always capable of avoiding the 25-35 ms timeout as specified by
      the SMBUS protocol. This may cause silent corruption of the last bit of
      any transfer, e.g. a one is read instead of a zero if the sensor chip
      times out. This also affects the eeprom half of the nxp-se97 chip, where
      this silent corruption was originally noticed. Other I2C adapters probably
      suffer similar issues, e.g. bit-banging comes to mind as risky...
      The SMBUS register in the nxp chip is not a standard Jedec register, but
      it is not special to the nxp chips either; at least the atmel chips
      have the same mechanism. Therefore, do not special-case this on the
      manufacturer; it is opt-in via the device property anyway.
      Cc: stable@vger.kernel.org # 4.9+
      Signed-off-by: Peter Rosin <peda@axentia.se>
      Acked-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
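The opt-in behaviour described in this commit can be pictured as a read-modify-write of a configuration register. The sketch below is a userspace illustration only: the function name and the bit position are hypothetical placeholders, not the driver's real register layout, and no I2C traffic is involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position for a "disable SMBus timeout" flag. */
#define EXAMPLE_SMBUS_TIMEOUT_DISABLE (1u << 7)

/*
 * Return the value to write back to a (simulated) SMBUS register.
 * The register content is changed only when the caller opted in,
 * mirroring the "opt-in via the device property" idea.
 */
uint16_t example_smbus_config(uint16_t current, bool opt_in)
{
    if (!opt_in)
        return current;
    return (uint16_t)(current | EXAMPLE_SMBUS_TIMEOUT_DISABLE);
}
```

Without the opt-in the register value is returned unchanged, so existing setups keep their current behaviour.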
    • Revert "mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical" · 90daf306
      Michal Hocko authored
      This reverts commit 0f6d24f8 ("mm/page-writeback.c: print a warning
      if the vm dirtiness settings are illogical") because it causes false
      positive warnings during OOM situations as noticed by Tetsuo Handa:
        Node 0 active_anon:3525940kB inactive_anon:8372kB active_file:216kB inactive_file:1872kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2504kB dirty:52kB writeback:0kB shmem:8660kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 636928kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
        Node 0 DMA free:14848kB min:284kB low:352kB high:420kB active_anon:992kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
        lowmem_reserve[]: 0 2687 3645 3645
        Node 0 DMA32 free:53004kB min:49608kB low:62008kB high:74408kB active_anon:2712648kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2773132kB mlocked:0kB kernel_stack:96kB pagetables:5096kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
        lowmem_reserve[]: 0 0 958 958
        Node 0 Normal free:17140kB min:17684kB low:22104kB high:26524kB active_anon:812300kB inactive_anon:8372kB active_file:1228kB inactive_file:1868kB unevictable:0kB writepending:52kB present:1048576kB managed:981224kB mlocked:0kB kernel_stack:3520kB pagetables:8552kB bounce:0kB free_pcp:120kB local_pcp:120kB free_cma:0kB
        lowmem_reserve[]: 0 0 0 0
        Out of memory: Kill process 8459 (a.out) score 999 or sacrifice child
        Killed process 8459 (a.out) total-vm:4180kB, anon-rss:88kB, file-rss:0kB, shmem-rss:0kB
        oom_reaper: reaped process 8459 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
        vm direct limit must be set greater than background limit.
      The problem is that both thresh and bg_thresh will be 0 if
      available_memory is less than 4 pages when evaluating
      While this might be worked around, the whole point of the warning is
      dubious at best.  We do rely on admins to do sensible things when
      changing tunable knobs.  Dirty memory writeback knobs are not
      special in that regard, so revert the warning rather than adding more
      hacks to work around this.
      Debugged by Yafang Shao.
      Link: http://lkml.kernel.org/r/20171127091939.tahb77nznytcxw55@dhcp22.suse.cz
      Fixes: 0f6d24f8 ("mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical")
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Yafang Shao <laoar.shao@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
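The false positive in the revert above comes from integer rounding: with very little available memory, both limits round down to zero and then compare as "not greater". A minimal sketch, assuming a simplified percentage formula and hypothetical helper names (the kernel's real calculation is more involved):

```c
/* Simplified limit: a percentage of the available pages, rounded down. */
unsigned long example_limit(unsigned long available_pages, unsigned int percent)
{
    return available_pages * percent / 100;
}

/* The condition the reverted warning reacted to: bg limit not below the limit. */
int example_would_warn(unsigned long thresh, unsigned long bg_thresh)
{
    return bg_thresh >= thresh;
}
```

With 3 available pages and illustrative ratios of 20 and 10 percent, both limits are 0 and the condition triggers even though the settings themselves are sensible. With 1000 pages the limits are 200 and 100 and nothing triggers.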
  5. 29 Nov, 2017 3 commits
    • vsprintf: add printk specifier %px · 7b1924a1
      Tobin C. Harding authored
      printk specifier %p now hashes all addresses before printing. Sometimes
      we need to see the actual unmodified address. This can be achieved using
      %lx but then we face the risk that if in future we want to change the
      way the Kernel handles printing of pointers we will have to grep through
      the already existent 50 000 %lx call sites. Let's add specifier %px as a
      clear, opt-in, way to print a pointer and maintain some level of
      isolation from all the other hex integer output within the Kernel.
      Add printk specifier %px to print the actual unmodified address.
      Signed-off-by: Tobin C. Harding <me@tobin.cc>
    • printk: hash addresses printed with %p · ad67b74d
      Tobin C. Harding authored
      Currently there exist approximately 14 000 places in the kernel where
      addresses are being printed using an unadorned %p. This potentially
      leaks sensitive information regarding the Kernel layout in memory. Many
      of these calls are stale; instead of fixing every call, let's hash the
      address by default before printing. This will of course break some
      users, forcing code printing needed addresses to be updated.
      Code that _really_ needs the address will soon be able to use the new
      printk specifier %px to print the address.
      For what it's worth, usage of unadorned %p can be broken down as
      follows (thanks to Joe Perches).
      $ git grep -E '%p[^A-Za-z0-9]' | cut -f1 -d"/" | sort | uniq -c
         1084 arch
           20 block
           10 crypto
           32 Documentation
         8121 drivers
         1221 fs
          143 include
          101 kernel
           69 lib
          100 mm
         1510 net
           40 samples
            7 scripts
           11 security
          166 sound
          152 tools
            2 virt
      Add function ptr_to_id() to map an address to a ...
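The idea of mapping an address to an identifier can be sketched in userspace as a keyed hash of the pointer value. Everything below is an assumption for illustration: the function names are hypothetical, the mixer is the splitmix64 finalizer used as a stand-in (not the hash the kernel uses), and the 32-bit output width is chosen only to keep the example small.

```c
#include <stdint.h>

/* Stand-in 64-bit mixer (splitmix64 finalizer), NOT the kernel's hash. */
static uint64_t example_mix(uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/*
 * Map an address to an identifier using a secret key. The same address
 * always yields the same identifier for a given key, so log lines can
 * still be correlated, but the raw address is not printed.
 */
uint32_t example_ptr_to_id(uint64_t addr, uint64_t key)
{
    return (uint32_t)example_mix(addr ^ key);
}
```

A caller would generate the key once at startup from a random source and keep it private; printing the result with a hex format then stands in for a hashed %p, while printing addr directly corresponds to the opt-in raw form.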
    • docs: correct documentation for %pK · 553d8e8b
      Tobin C. Harding authored
      Current documentation indicates that %pK prints a leading '0x'. This is
      not the case.
      Correct documentation for printk specifier %pK.
      Signed-off-by: Tobin C. Harding <me@tobin.cc>
  6. 21 Nov, 2017 2 commits
  7. 20 Nov, 2017 7 commits
  8. 19 Nov, 2017 1 commit
  9. 18 Nov, 2017 6 commits
  10. 16 Nov, 2017 11 commits
    • PM / runtime: Drop children check from __pm_runtime_set_status() · f8817f61
      Rafael J. Wysocki authored
      The check for "active" children in __pm_runtime_set_status(), when
      trying to set the parent device status to "suspended", doesn't
      really make sense, because in fact it is not invalid to set the
      status of a device with runtime PM disabled to "suspended" in any
      case.  It is invalid to enable runtime PM for a device with its
      status set to "suspended" while its child_count reference counter
      is nonzero, but the check in __pm_runtime_set_status() doesn't
      really cover that situation.
      For this reason, drop the children check from __pm_runtime_set_status()
      and add a check against child_count reference counters of "suspended"
      devices to pm_runtime_enable().
      Fixes: a8636c89 (PM / Runtime: Don't allow to suspend a device with an active child)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
      Reviewed-by: Johan Hovold <johan@kernel.org>
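The invalid combination named in this commit (enabling runtime PM for a "suspended" device whose child_count is nonzero) can be expressed as a small predicate. This is a userspace model with hypothetical type and function names, not the kernel's struct device or its runtime PM code:

```c
#include <stdbool.h>

enum example_rpm_status { EXAMPLE_RPM_ACTIVE, EXAMPLE_RPM_SUSPENDED };

struct example_dev {
    enum example_rpm_status status; /* status set while runtime PM is disabled */
    int child_count;                /* reference counter of active children */
};

/*
 * True when enabling runtime PM would hit the invalid combination:
 * the device is marked "suspended" while children are still counted.
 */
bool example_enable_is_invalid(const struct example_dev *dev)
{
    return dev->status == EXAMPLE_RPM_SUSPENDED && dev->child_count != 0;
}
```

Setting the status to "suspended" is never rejected in this model; only the later enable step looks at the counter, which matches the reasoning in the commit message.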
    • dt-bindings: usb: document hub and host-controller properties · f877918c
      Johan Hovold authored
      Hub nodes and host-controller nodes with child nodes must specify values
      for #address-cells (1) and #size-cells (0).
      Also make the definition of the related reg property a bit more
      stringent, and add comments to the example source.
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Rob Herring <robh@kernel.org>
    • dt-bindings: usb: clean up compatible property · bfebcf54
      Johan Hovold authored
      Add quotation marks around the compatible string to avoid ambiguity due
      to following punctuation, and define the VID and PID components.
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Rob Herring <robh@kernel.org>
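As an illustration of a compatible string built from VID and PID components, the sketch below formats a "usbVID,PID" string. The exact textual form (lower-case hexadecimal with leading zeroes suppressed) is an assumption here, since the commit message does not spell it out, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a compatible string of the assumed form "usbVID,PID", with both
 * IDs in lower-case hex and no leading zeroes. Returns what snprintf()
 * returns: the length that would have been written.
 */
int example_usb_compatible(char *buf, size_t len, uint16_t vid, uint16_t pid)
{
    return snprintf(buf, len, "usb%x,%x", (unsigned int)vid, (unsigned int)pid);
}
```

For example, vendor ID 0x0451 and product ID 0x8140 would give usb451,8140 under these assumptions.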
    • dt-bindings: usb: fix reg-property port-number range · f42ae7b0
      Johan Hovold authored
      The USB hub port-number range for USB 2.0 is 1-255, not 1-31; the
      latter reflects an arbitrary limit set by the current Linux implementation.
      Note that for USB 3.1 hubs the valid range is 1-15.
      Increase the documented valid range in the binding to 255, which is the
      maximum allowed by the specifications.
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Rob Herring <robh@kernel.org>
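The ranges quoted in this commit (1-255 for USB 2.0 hubs, 1-15 for USB 3.1 hubs) translate into a simple range check. A minimal sketch with a hypothetical function name:

```c
#include <stdbool.h>

/*
 * Check a hub port number against the ranges quoted in the commit
 * message: 1-255 for USB 2.0 hubs, 1-15 for USB 3.1 hubs.
 */
bool example_port_valid(unsigned int port, bool usb31_hub)
{
    unsigned int max = usb31_hub ? 15u : 255u;

    return port >= 1 && port <= max;
}
```

Port 32, which the old documented limit of 31 would have excluded, is accepted for a USB 2.0 hub here.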
    • dt-bindings: usb: fix example hub node name · 1759f270
      Johan Hovold authored
      According to the OF Recommended Practice for USB, hub nodes shall be
      named "hub", but the example had mixed up the label and node names. Fix
      the node name and drop the redundant label.
      While at it, remove a newline and add a missing semicolon to the example.
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Rob Herring <robh@kernel.org>
    • sched/deadline: Fix the description of runtime accounting in the documentation · 5c0342ca
      Claudio Scordino authored
      Signed-off-by: Claudio Scordino <claudio@evidence.eu.com>
      Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it>
      Acked-by: Daniel Bristot de Oliveira <bristot@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/1510658366-28995-1-git-send-email-claudio@evidence.eu.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • mm, sysctl: make NUMA stats configurable · 4518085e
      Kemi Wang authored
      This is the second step, which introduces a tunable interface that makes
      NUMA stats configurable in order to optimize zone_statistics(), as
      suggested by Dave Hansen and Ying Huang.
      When page allocation performance becomes a bottleneck and you can
      tolerate some possible tool breakage and decreased numa counter
      precision, you can do:
      	echo 0 > /proc/sys/vm/numa_stat
      In this case, numa counter update is ignored.  We can see about
      *4.8%*(185->176) drop of cpu cycles per single page allocation and
      reclaim on Jesper's page_bench01 (single thread) and *8.1%*(343->315)
      drop of cpu cycles per single page allocation and reclaim on Jesper's
      page_bench03 (88 threads) running on a 2-Socket Broadwell-based server
      (88 threads, 126G memory).
      Benchmark link provided by Jesper D Brouer (increase loop times to
      When page allocation performance is not a bottleneck and you want all
      tooling to work, you can do:
      	echo 1 > /proc/sys/vm/numa_stat
      This is the system default setting.
      Many thanks to Michal Hocko, Dave Hansen, Ying Huang and Vlastimil Babka
      for comments to help improve the original patch.
      [keescook@chromium.org: make sure mutex is a global static]
        Link: http://lkml.kernel.org/r/20171107213809.GA4314@beast
      Link: http://lkml.kernel.org/r/1508290927-8518-1-git-send-email-kemi.wang@intel.com
      Signed-off-by: Kemi Wang <kemi.wang@intel.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Suggested-by: Dave Hansen <dave.hansen@intel.com>
      Suggested-by: Ying Huang <ying.huang@intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: "Luis R . Rodriguez" <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Tim Chen <tim.c.chen@intel.com>
      Cc: Andi Kleen <andi.kleen@intel.com>
      Cc: Aaron Lu <aaron.lu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
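The percentages quoted in this commit follow directly from the cycle counts given in parentheses (185 to 176 and 343 to 315). A small sketch of that arithmetic, with a hypothetical helper name:

```c
/* Relative drop in percent when a cost goes from `before` to `after`. */
double example_percent_drop(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

Plugging in 185 and 176 gives a drop between 4.8 and 4.9 percent, and 343 and 315 gives a drop between 8.1 and 8.2 percent, consistent with the figures in the message.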
    • kmemcheck: rip it out · 4675ff05
      Levin, Alexander (Sasha Levin) authored
      Fix up makefiles, remove references, and git rm kmemcheck.
      Link: http://lkml.kernel.org/r/20171007030159.22241-4-alexander.levin@verizon.com
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Vegard Nossum <vegardno@ifi.uio.no>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Tim Hansen <devtimhansen@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: consolidate page table accounting · af5b0f6a
      Kirill A. Shutemov authored
      Currently, we account page tables separately for each page table level,
      but that's redundant -- we only make use of total memory allocated to
      page tables for oom_badness calculation.  We also provide the
      information to userspace, but it has dubious value there too.
      This patch switches page table accounting to a single counter.
      mm->pgtables_bytes is now used to account all page table levels.  We use
      bytes, because page table size for different levels of the page table
      tree may be different.
      The change has a user-visible effect: VmPMD and VmPUD are no longer
      reported in /proc/[pid]/status.  Not sure if anybody uses them.  (As an
      alternative, we can always report 0 kB for them.)
      OOM-killer report is also slightly changed: we now report pgtables_bytes
      instead of nr_ptes, nr_pmd, nr_puds.
      Apart from reducing number of counters per-mm, the benefit is that we
      now calculate oom_badness() more correctly for machines which have
      different size of page tables depending on level or where page tables
      are less than a page in size.
      The only downside can be debuggability because we do not know which page
      table level could leak.  But I do not remember many bugs that would be
      caught by separate counters so I wouldn't lose sleep over this.
      [akpm@linux-foundation.org: fix mm/huge_memory.c]
      Link: http://lkml.kernel.org/r/20171006100651.44742-2-kirill.shutemov@linux.intel.com
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      [kirill.shutemov@linux.intel.com: fix build]
        Link: http://lkml.kernel.org/r/20171016150113.ikfxy3e7zzfvsr4w@black.fi.intel.com
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
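The single-counter scheme described in this commit can be modelled in userspace with one atomic byte counter that every page table level adds to. The struct and function names below are hypothetical and only echo the pgtables_bytes field named in the message; the table sizes passed in are illustrative.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Userspace model of a per-mm page table byte counter. */
struct example_mm {
    atomic_long pgtables_bytes; /* bytes used by page tables of all levels */
};

void example_mm_init(struct example_mm *mm)
{
    atomic_init(&mm->pgtables_bytes, 0);
}

/* Account one newly allocated table of any level, sized in bytes. */
void example_account_table(struct example_mm *mm, size_t table_bytes)
{
    atomic_fetch_add(&mm->pgtables_bytes, (long)table_bytes);
}

/* Undo the accounting when a table is freed. */
void example_unaccount_table(struct example_mm *mm, size_t table_bytes)
{
    atomic_fetch_sub(&mm->pgtables_bytes, (long)table_bytes);
}

long example_pgtables_bytes(struct example_mm *mm)
{
    return atomic_load(&mm->pgtables_bytes);
}
```

Counting bytes rather than tables is what lets levels of different sizes, for instance a 4096-byte table and a 2048-byte table, share one counter without losing accuracy.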
    • mm: account pud page tables · b4e98d9a
      Kirill A. Shutemov authored
      On a machine with 5-level paging support a process can allocate a
      significant amount of memory and stay unnoticed by the oom-killer and
      memory cgroup.  The trick is to allocate a lot of PUD page tables.  We
      don't account PUD page tables, only PMD and PTE.
      We already addressed the same issue for PMD page tables, see commit
      dc6c9a35 ("mm: account pmd page tables to the process").
      Introduction of 5-level paging brings the same issue for PUD page
      The patch expands accounting to PUD level.
      [kirill.shutemov@linux.intel.com: s/pmd_t/pud_t/]
        Link: http://lkml.kernel.org/r/20171004074305.x35eh5u7ybbt5kar@black.fi.intel.com
      [heiko.carstens@de.ibm.com: s390/mm: fix pud table accounting]
        Link: http://lkml.kernel.org/r/20171103090551.18231-1-heiko.carstens@de.ibm.com
      Link: http://lkml.kernel.org/r/20171002080427.3320-1-kirill.shutemov@linux.intel.com
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mmu_notifier: avoid double notification when it is useless · 0f10851e
      Jérôme Glisse authored
      This patch only affects users of mmu_notifier->invalidate_range callback
      which are device drivers related to ATS/PASID, CAPI, IOMMUv2, SVM ...
      and it is an optimization for those users.  Everyone else is unaffected
      by it.
      When clearing a pte/pmd we are given a choice to notify the event under
      the page table lock (notify version of *_clear_flush helpers do call the
      mmu_notifier_invalidate_range).  But that notification is not necessary
      in all cases.
      This patch removes almost all cases where it is useless to have a call
      to mmu_notifier_invalidate_range before
      mmu_notifier_invalidate_range_end.  It also adds documentation in all
      those cases explaining why.
      Below is a more in-depth analysis of why this is fine to do:
      For a secondary TLB (non-CPU TLB) such as an IOMMU TLB or a device TLB (when
      a device uses something like ATS/PASID to get the IOMMU to walk the CPU page
      table to access a process virtual address space), there are only two cases
      in which you need to notify those secondary TLBs while holding the page
      table lock when clearing a pte/pmd:
        A) the page backing the address is freed before mmu_notifier_invalidate_range_end
        B) a page table entry is updated to point to a new page (COW, write fault
           on zero page, __replace_page(), ...)
      Case A is obvious: you do not want to take the risk of the device writing
      to a page that might now be used by something completely different.
      Case B is more subtle. For correctness it requires the following sequence
      to happen:
        - take page table lock
        - clear page table entry and notify (pmd/pte_huge_clear_flush_notify())
        - set page table entry to point to new page
      If clearing the page table entry is not followed by a notify before setting
      the new pte/pmd value, then you can break a memory model such as C11 or
      C++11 for the device.
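The Case B sequence above can be sketched as a single-threaded toy model in which a device caches one translation. All names are hypothetical, the "lock" is a plain flag rather than a real page table lock, and the cached page id stands in for a secondary TLB entry; the point is only to show what skipping the notify step leaves behind.

```c
#include <stdbool.h>

struct example_pt {
    bool locked;   /* stands in for the page table lock */
    int pte;       /* id of the page the entry points to, 0 = cleared */
    int dev_tlb;   /* page id cached by the device, 0 = nothing cached */
};

/* Device access: use the cached translation, or walk the table and cache it. */
int example_dev_translate(struct example_pt *pt)
{
    if (pt->dev_tlb == 0)
        pt->dev_tlb = pt->pte;
    return pt->dev_tlb;
}

/* Replace the page behind the entry, optionally notifying the device. */
void example_replace_page(struct example_pt *pt, int new_page, bool notify)
{
    pt->locked = true;        /* take page table lock */
    pt->pte = 0;              /* clear page table entry ... */
    if (notify)
        pt->dev_tlb = 0;      /* ... and invalidate the secondary TLB */
    pt->pte = new_page;       /* set entry to point to the new page */
    pt->locked = false;       /* release page table lock */
}
```

When notify is false the device keeps translating to the old page after the entry has been switched, which is the stale view that the ordering scenario in this commit message builds on.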
      Consider the following scenario (device use a feature similar to ATS/
      Two addresses, addrA and addrB, such that |addrA - addrB| >= PAGE_SIZE; we
      assume they are write protected for COW (other cases of B apply too).
      [Time N] -----------------------------------------------------------------
      CPU-thread-0  {try to write to addrA}
      CPU-thread-1  {try to write to addrB}
      CPU-thread-2  {}
      CPU-thread-3  {}
      DEV-thread-0  {read addrA and populate device TLB}
      DEV-thread-2  {read addrB and populate device TLB}
      [Time N+1] ---------------------------------------------------------------
      CPU-thread-0  {COW_step0: {mmu_notifier_invalidate_range_start(addrA)}}
      CPU-thread-1  {COW_step0: {mmu_notifier_invalidate_range_start(addrB)}}
      CPU-thread-2  {}
      CPU-thread-3  {}
      DEV-thread-0  {}
      DEV-thread-2  {}
      [Time N+2] ---------------------------------------------------------------
      CPU-thread-0  {COW_step1: {update page table point to new page for addrA}}
      CPU-thread-1  {COW_step1: {update page table point to new page for addrB}}
      CPU-thread-2  {}
      CPU-thread-3  {}
      DEV-thread-0  {}
      DEV-thread-2  {}
      [Time N+3] ---------------------------------------------------------------
      CPU-thread-0  {preempted}
      CPU-thread-1  {preempted}
      CPU-thread-2  {write to addrA which is a write to new page}
      CPU-thread-3  {}
      DEV-thread-0  {}
      DEV-thread-2  {}
      [Time N+3] ---------------------------------------------------------------
      CPU-thread-0  {preempted}
      CPU-thread-1  {preempted}
      CPU-thread-2  {}
      CPU-thread-3  {write to addrB which is a write to new page}
      DEV-thread-0  {}
      DEV-thread-2  {}
      [Time N+4] ---------------------------------------------------------------
      CPU-thread-0  {preempted}
      CPU-thread-1  {COW_step3: {mmu_notifier_invalidate_range_end(addrB)}}
      CPU-thread-2  {}
      CPU-thread-3  {}
      DEV-thread-0  {}
      DEV-thread-2  {}
      [Time N+5] ---------------------------------------------------------------
      CPU-thread-0  {preempted}
      CPU-thread-1  {}
      CPU-thread-2  {}
      CPU-thread-3  {}
      DEV-thread-0  {read addrA from old page}
      DEV-thread-2  {read addrB from new page}
      So here, because at time N+2 the clearing of the page table entry was not
      paired with a notification to invalidate the secondary TLB, the device sees
      the new value for addrB before seeing the new value for addrA.  This breaks
      total memory ordering for the device.
      When changing a pte to write protect or to point to a new write protected
      page with the same content (KSM), it is ok to delay the invalidate_range
      callback to mmu_notifier_invalidate_range_end() outside the page table
      lock.  This is true even if the thread doing the page table update is
      preempted right after releasing the page table lock before calling
      Thanks to Andrea for thinking of a problematic scenario for COW.
      [jglisse@redhat.com: v2]
        Link: http://lkml.kernel.org/r/20171017031003.7481-2-jglisse@redhat.com
      Link: http://lkml.kernel.org/r/20170901173011.10745-1-jglisse@redhat.com
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Alistair Popple <alistair@popple.id.au>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>