  1. 12 Jan, 2018 1 commit
  2. 11 Dec, 2017 1 commit
  3. 04 Dec, 2017 2 commits
  4. 27 Nov, 2017 1 commit
  5. 22 Nov, 2017 1 commit
    • treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts · 841b86f3
      Kees Cook authored

      With all callbacks converted, and the timer callback prototype
      switched over, the TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts are no
      longer needed, so remove them. Conversion was done with the following
      scripts:
      
          perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \
              $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)
      
          perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \
              $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)
      
      The now unused macros are also dropped from include/linux/timer.h.
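
A minimal userspace sketch of why the casts can go, assuming a simplified `struct timer_list` that models only the callback pointer (this is not the kernel's definition). The names `my_callback()`, `fire()` and `demo_timer` are hypothetical. Once a callback has the `void (*)(struct timer_list *)` prototype, it can be stored without a `(TIMER_FUNC_TYPE)` cast, and the compiler type-checks the assignment again.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct timer_list (assumption):
 * only the callback pointer is modelled. */
struct timer_list {
	void (*function)(struct timer_list *t);
};

static int fired_count;

/* The callback already has the new prototype. During the transition the
 * ->function field kept the old prototype, so new-style callbacks were
 * stored through a (TIMER_FUNC_TYPE) cast; now the types simply match. */
static void my_callback(struct timer_list *t)
{
	(void)t;
	fired_count++;
}

/* Hypothetical helper playing the role of timer expiry. */
static void fire(struct timer_list *t)
{
	assert(t->function != NULL);
	t->function(t);
}

/* Cast-free initialisation: a mismatched callback would now fail to
 * compile instead of being silently cast. */
static struct timer_list demo_timer = { .function = my_callback };
```
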
      Signed-off-by: Kees Cook <keescook@chromium.org>
  6. 08 Nov, 2017 1 commit
  7. 06 Nov, 2017 1 commit
  8. 03 Nov, 2017 1 commit
  9. 25 Oct, 2017 2 commits
    • workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes · fd1a5b04
      Byungchul Park authored

      The workqueue code added manual lock acquisition annotations to catch
      deadlocks.
      
      After lockdep crossrelease was introduced, some of those became
      redundant, since wait_for_completion() already does the acquisition
      and tracking.
      
      Remove the duplicate annotations.
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: amir73il@gmail.com
      Cc: axboe@kernel.dk
      Cc: darrick.wong@oracle.com
      Cc: david@fromorbit.com
      Cc: hch@infradead.org
      Cc: idryomov@gmail.com
      Cc: johan@kernel.org
      Cc: johannes.berg@intel.com
      Cc: kernel-team@lge.com
      Cc: linux-block@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-xfs@vger.kernel.org
      Cc: oleg@redhat.com
      Cc: tj@kernel.org
      Link: http://lkml.kernel.org/r/1508921765-15396-9-git-send-email-byungchul.park@lge.com
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/atomics, workqueue: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE() · c95491ed
      Mark Rutland authored

      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't currently harmful.
      
      However, for some features it is necessary to instrument reads and
      writes separately, which is not possible with ACCESS_ONCE(). This
      distinction is critical to correct operation.
      
      It's possible to transform the bulk of kernel code using the Coccinelle
      script below. However, this doesn't handle comments, leaving references
      to ACCESS_ONCE() instances which have been removed. As a preparatory
      step, this patch converts the workqueue code and comments to use
      {READ,WRITE}_ONCE() consistently.
      
      ----
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
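
To make the read/write distinction concrete, here is a heavily simplified userspace sketch. The macro bodies are assumptions modelled on the classic volatile-cast idiom; the kernel's real definitions are more involved. It relies on the GCC/Clang `__typeof__` extension, and `shared_flag`, `set_flag()`, `get_flag()` and `get_flag_old()` are hypothetical.

```c
#include <assert.h>

/* Old style: one volatile lvalue serves both loads and stores, so a tool
 * hooking this macro cannot tell a read from a write. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* New style (simplified): one macro per direction, so reads and writes
 * could each be instrumented independently. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) \
	do { *(volatile __typeof__(x) *)&(x) = (val); } while (0)

static int shared_flag;

/* What the Coccinelle rules rewrite:
 *   ACCESS_ONCE(shared_flag) = v;  ->  WRITE_ONCE(shared_flag, v);
 *   r = ACCESS_ONCE(shared_flag);  ->  r = READ_ONCE(shared_flag);  */
static void set_flag(int v)
{
	WRITE_ONCE(shared_flag, v);
}

static int get_flag(void)
{
	return READ_ONCE(shared_flag);
}

/* Pre-conversion read, kept only for comparison. */
static int get_flag_old(void)
{
	return ACCESS_ONCE(shared_flag);
}
```
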
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-12-git-send-email-paulmck@linux.vnet.ibm.com
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 21 Oct, 2017 1 commit
  11. 18 Oct, 2017 1 commit
  12. 10 Oct, 2017 1 commit
    • workqueue: replace pool->manager_arb mutex with a flag · 692b4825
      Tejun Heo authored

      Josef reported a HARDIRQ-safe -> HARDIRQ-unsafe lock order detected by
      lockdep:
      
       [ 1270.472259] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
       [ 1270.472783] 4.14.0-rc1-xfstests-12888-g76833e8 #110 Not tainted
       [ 1270.473240] -----------------------------------------------------
       [ 1270.473710] kworker/u5:2/5157 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
       [ 1270.474239]  (&(&lock->wait_lock)->rlock){+.+.}, at: [<ffffffff8da253d2>] __mutex_unlock_slowpath+0xa2/0x280
       [ 1270.474994]
       [ 1270.474994] and this task is already holding:
       [ 1270.475440]  (&pool->lock/1){-.-.}, at: [<ffffffff8d2992f6>] worker_thread+0x366/0x3c0
       [ 1270.476046] which would create a new lock dependency:
       [ 1270.476436]  (&pool->lock/1){-.-.} -> (&(&lock->wait_lock)->rlock){+.+.}
       [ 1270.476949]
       [ 1270.476949] but this new dependency connects a HARDIRQ-irq-safe lock:
       [ 1270.477553]  (&pool->lock/1){-.-.}
       ...
       [ 1270.488900] to a HARDIRQ-irq-unsafe lock:
       [ 1270.489327]  (&(&lock->wait_lock)->rlock){+.+.}
       ...
       [ 1270.494735]  Possible interrupt unsafe locking scenario:
       [ 1270.494735]
       [ 1270.495250]        CPU0                    CPU1
       [ 1270.495600]        ----                    ----
       [ 1270.495947]   lock(&(&lock->wait_lock)->rlock);
       [ 1270.496295]                                local_irq_disable();
       [ 1270.496753]                                lock(&pool->lock/1);
       [ 1270.497205]                                lock(&(&lock->wait_lock)->rlock);
       [ 1270.497744]   <Interrupt>
       [ 1270.497948]     lock(&pool->lock/1);
      
      Such a lock order can cause an IRQ inversion deadlock if the above
      locking scenario happens.
      
      The root cause of this safe -> unsafe lock order is the
      mutex_unlock(pool->manager_arb) in manage_workers() with pool->lock
      held.
      
      Unlocking a mutex while holding an IRQ spinlock was never safe, and
      this problem has been around forever. It went unnoticed because the
      mutex is usually only trylocked while the irqlock is held, making
      actual failures very unlikely, and because the lockdep annotation
      missed the condition until the recent b9c16a0e ("locking/mutex: Fix
      lockdep_assert_held() fail").
      
      Using a mutex for pool->manager_arb has always been a bit of a
      stretch. It is primarily a mechanism to arbitrate managership between
      workers, which can easily be done with a pool flag.  The only reason
      it became a mutex is that the pool destruction path wants to exclude
      parallel managing operations.
      
      This patch replaces the mutex with a new pool flag POOL_MANAGER_ACTIVE
      and makes the destruction path wait for the current manager on a wait
      queue.
      
      v2: Drop unnecessary flag clearing before pool destruction as
          suggested by Boqun.
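
A single-threaded userspace model of the new arbitration, assuming (as in the real code) that the caller holds pool->lock. `struct worker_pool` is reduced to two fields, the flag value is illustrative, and `manager_done()`/`pool_can_destroy()` are hypothetical helpers. In the kernel the destruction path would sleep on a wait queue until the predicate holds rather than evaluating it once.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative value; the kernel's POOL_MANAGER_ACTIVE may differ. */
#define POOL_MANAGER_ACTIVE (1u << 0)

/* Minimal stand-in for the kernel's worker_pool (assumption). */
struct worker_pool {
	unsigned int flags;
	int managing_runs;	/* how many times a manager actually ran */
};

/* Models manage_workers(), called with pool->lock held. Instead of
 * mutex_trylock(&pool->manager_arb), test and set a flag protected by
 * pool->lock, so no mutex is unlocked under the IRQ-safe spinlock. */
static bool manage_workers(struct worker_pool *pool)
{
	if (pool->flags & POOL_MANAGER_ACTIVE)
		return false;		/* another worker is already managing */
	pool->flags |= POOL_MANAGER_ACTIVE;
	pool->managing_runs++;
	return true;
}

/* End of management: clear the flag (still under pool->lock). The real
 * code would also wake up waiters on the manager wait queue here. */
static void manager_done(struct worker_pool *pool)
{
	pool->flags &= ~POOL_MANAGER_ACTIVE;
}

/* Condition the destruction path waits for before tearing down. */
static bool pool_can_destroy(const struct worker_pool *pool)
{
	return !(pool->flags & POOL_MANAGER_ACTIVE);
}
```
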
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: stable@vger.kernel.org
  13. 05 Oct, 2017 2 commits
    • workqueue: Convert callback to use from_timer() · 8c20feb6
      Kees Cook authored

      In preparation for unconditionally passing the struct timer_list pointer
      to all timer callbacks, switch workqueue to use from_timer() and pass the
      timer pointer explicitly.
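
from_timer() recovers the structure embedding a timer from the timer pointer, in the style of container_of(). The userspace sketch below assumes simplified `struct timer_list` and `struct delayed_work` stand-ins and a plain offsetof()-based container_of(); the from_timer() macro here only mirrors the from_timer(var, timer, field) calling convention for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins (assumptions): only the fields needed here. */
struct timer_list {
	void (*function)(struct timer_list *t);
};

struct delayed_work {
	int queued;			/* stand-in for the embedded work item */
	struct timer_list timer;
};

/* Userspace container_of(): step back from a member to its container. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* from_timer(var, timer_ptr, field): infer the container type from var. */
#define from_timer(var, callback_timer, timer_fieldname) \
	container_of(callback_timer, __typeof__(*var), timer_fieldname)

/* The callback gets the timer pointer and derives its delayed_work from
 * it, so no separate unsigned long data argument is required. */
static void delayed_work_timer_fn(struct timer_list *t)
{
	struct delayed_work *dwork = from_timer(dwork, t, timer);

	dwork->queued = 1;	/* the real code would queue dwork->work */
}
```
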
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mips@linux-mips.org
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Sebastian Reichel <sre@kernel.org>
      Cc: Kalle Valo <kvalo@qca.qualcomm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: linux1394-devel@lists.sourceforge.net
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: linux-s390@vger.kernel.org
      Cc: linux-wireless@vger.kernel.org
      Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
      Cc: Wim Van Sebroeck <wim@iguana.be>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Harish Patil <harish.patil@cavium.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Manish Chopra <manish.chopra@cavium.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: linux-pm@vger.kernel.org
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mark Gross <mark.gross@intel.com>
      Cc: linux-watchdog@vger.kernel.org
      Cc: linux-scsi@vger.kernel.org
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Cc: Michael Reed <mdr@sgi.com>
      Cc: netdev@vger.kernel.org
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Link: https://lkml.kernel.org/r/1507159627-127660-14-git-send-email-keescook@chromium.org
    • timer: Remove users of TIMER_DEFERRED_INITIALIZER · 5cd79d6a
      Kees Cook authored

      This removes uses of TIMER_DEFERRED_INITIALIZER and instead calls
      timer_setup() at a suitable point before add_timer() or mod_timer()
      is called. Callbacks are adjusted to use from_timer() as needed.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mips@linux-mips.org
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Sebastian Reichel <sre@kernel.org>
      Cc: Kalle Valo <kvalo@qca.qualcomm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: linux1394-devel@lists.sourceforge.net
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: linux-s390@vger.kernel.org
      Cc: linux-wireless@vger.kernel.org
      Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
      Cc: Wim Van Sebroeck <wim@iguana.be>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Harish Patil <harish.patil@cavium.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Manish Chopra <manish.chopra@cavium.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: linux-pm@vger.kernel.org
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mark Gross <mark.gross@intel.com>
      Cc: linux-watchdog@vger.kernel.org
      Cc: linux-scsi@vger.kernel.org
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Cc: Michael Reed <mdr@sgi.com>
      Cc: netdev@vger.kernel.org
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Link: https://lkml.kernel.org/r/1507159627-127660-7-git-send-email-keescook@chromium.org
  14. 29 Aug, 2017 1 commit
  15. 25 Aug, 2017 2 commits
    • locking/lockdep: Fix workqueue crossrelease annotation · e6f3faa7
      Peter Zijlstra authored

      The new completion/crossrelease annotations interact unfavourably with
      the extant flush_work()/flush_workqueue() annotations.
      
      The problem is that when a single work class does:
      
        wait_for_completion(&C)
      
      and
      
        complete(&C)
      
      in different executions, we'll build dependencies like:
      
        lock_map_acquire(W)
        complete_acquire(C)
      
      and
      
        lock_map_acquire(W)
        complete_release(C)
      
      which results in the dependency chain W->C->W. Lockdep reads this as
      a deadlock, even though there is no deadlock potential, since the
      works run concurrently.
      
      One possibility would be to change the work 'lock' to recursive-read,
      but that would mean hitting a lockdep limitation on recursive locks.
      Also, unconditionally switching to recursive-read here would fail to
      detect the actual deadlock on single-threaded workqueues, which do
      have a problem with this.
      
      For now, forcefully disregard these locks for crossrelease.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: boqun.feng@gmail.com
      Cc: byungchul.park@lge.com
      Cc: david@fromorbit.com
      Cc: johannes@sipsolutions.net
      Cc: oleg@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • workqueue/lockdep: 'Fix' flush_work() annotation · a1d14934
      Peter Zijlstra authored

      The flush_work() annotation as introduced by commit:

        e159489b ("workqueue: relax lockdep annotation on flush_work()")

      hits the lockdep problem with recursive read locks.
      
      The situation as described is:
      
      Work W1:                Work W2:        Task:

      ARR(Q)                  ARR(Q)          flush_workqueue(Q)
      A(W1)                   A(W2)             A(Q)
        flush_work(W2)                          R(Q)
          A(W2)
          R(W2)
          if (special)
            A(Q)
          else
            ARR(Q)
          R(Q)
      
      where: A - acquire, ARR - acquire-read-recursive, R - release.
      
      Under 'special' conditions we want to trigger a lock recursion
      deadlock, but otherwise allow the flush_work(). Allowing it is done
      by using recursive read locks (ARR), but lockdep is broken for
      recursive read locks.
      
      However, there appears to be no need to acquire the lock if we're not
      'special', so if we remove the 'else' clause things become much
      simpler and no longer need the recursion thing at all.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: boqun.feng@gmail.com
      Cc: byungchul.park@lge.com
      Cc: david@fromorbit.com
      Cc: johannes@sipsolutions.net
      Cc: oleg@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  16. 23 Aug, 2017 1 commit
  17. 17 Aug, 2017 1 commit
    • locking/lockdep: Explicitly initialize wq_barrier::done::map · 52fa5bc5
      Boqun Feng authored

      With the new lockdep crossrelease feature, which checks completion
      usage, a false positive is reported in the workqueue code:
      
      > Worker A : acquired of wfc.work -> wait for cpu_hotplug_lock to be released
      > Task   B : acquired of cpu_hotplug_lock -> wait for lock#3 to be released
      > Task   C : acquired of lock#3 -> wait for completion of barr->done
      > (Task C is in lru_add_drain_all_cpuslocked())
      > Worker D : wait for wfc.work to be released -> will complete barr->done
      
      Such a deadlock cannot happen because Task C's barr->done and Worker
      D's barr->done cannot be the same instance.
      
      The reason for this false positive is that all wq_barrier::done
      instances are initialized at insert_wq_barrier() via init_completion(),
      which makes them belong to the same lock class; therefore, impossible
      cycles are reported.

      To fix this, explicitly initialize the lockdep map for wq_barrier::done
      in insert_wq_barrier(), so that the lock class key of wq_barrier::done
      is a subkey of the corresponding work_struct. As a result, we won't
      build a dependency between a wq_barrier and an unrelated work, and we
      can differentiate wq barriers based on their related works, so the
      false positive above is avoided.
      
      Also define an empty lockdep_init_map_crosslock() for !CROSSRELEASE
      to keep the code simple and free of unnecessary #ifdefs.
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170817094622.12915-1-boqun.feng@gmail.com
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  18. 10 Aug, 2017 1 commit
    • locking/lockdep: Implement the 'crossrelease' feature · b09be676
      Byungchul Park authored

      Lockdep is a runtime locking correctness validator that detects and
      reports a deadlock or its possibility by checking dependencies between
      locks. It's useful since it does not report just an actual deadlock but
      also the possibility of a deadlock that has not actually happened yet.
      That enables problems to be fixed before they affect real systems.
      
      However, this facility is only applicable to typical locks, such as
      spinlocks and mutexes, which are normally released within the context
      in which they were acquired. Synchronization primitives like page
      locks or completions, which are allowed to be released in any context,
      also create dependencies and can cause a deadlock.

      Lockdep should track these primitives as well to do a better job. The
      'crossrelease' implementation makes them tracked too.
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: boqun.feng@gmail.com
      Cc: kernel-team@lge.com
      Cc: kirill@shutemov.name
      Cc: npiggin@gmail.com
      Cc: walken@google.com
      Cc: willy@infradead.org
      Link: http://lkml.kernel.org/r/1502089981-21272-6-git-send-email-byungchul.park@lge.com
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  19. 07 Aug, 2017 1 commit
  20. 28 Jul, 2017 1 commit
    • workqueue: Work around edge cases for calc of pool's cpumask · 1ad0f0a7
      Michael Bringmann authored

      There is an underlying assumption/trade-off in many layers of the Linux
      system that CPU <-> node mapping is static.  This is despite the presence
      of features like NUMA and 'hotplug' that support the dynamic addition/
      removal of fundamental system resources like CPUs and memory.  PowerPC
      systems, however, do provide extensive features for the dynamic change
      of resources available to a system.
      
      Currently, there is little or no synchronization protection around the
      updating of the CPU <-> node mapping, and the export/update of this
      information for other layers / modules.  In systems which can change
      this mapping during 'hotplug', like PowerPC, the information is changing
      underneath all layers that might reference it.
      
      This patch attempts to ensure that a valid, usable cpumask attribute
      is used by the workqueue infrastructure when setting up new resource
      pools.  It prevents a crash that has been observed when an 'empty'
      cpumask is passed along to the worker/task scheduling code.  It is
      intended as a temporary workaround until a more fundamental review and
      correction of the issue can be done.
      
      [With additions to the patch provided by Tejun Heo <tj@kernel.org>]
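
One possible reading of the workaround, sketched in userspace with plain bitmasks standing in for `struct cpumask`: if intersecting the requested CPUs with a node's CPUs comes out empty (for example because the CPU <-> node mapping changed under hotplug), fall back to a non-empty mask rather than handing an empty one to the scheduling code. `pool_cpumask()` and its fallback order are hypothetical, not the patch's actual code.

```c
#include <assert.h>

/* Bitmask stand-in for struct cpumask (assumption): bit n = CPU n. */
typedef unsigned long cpumask_t;

/* Hypothetical helper computing the cpumask for a new pool on a node.
 *   attrs_mask: CPUs the workqueue attributes allow
 *   node_mask:  CPUs currently believed to belong to the node
 *   online:     CPUs currently online
 *   fallback:   a global default, e.g. the unbound-workqueue cpumask */
static cpumask_t pool_cpumask(cpumask_t attrs_mask, cpumask_t node_mask,
			      cpumask_t online, cpumask_t fallback)
{
	cpumask_t mask = attrs_mask & node_mask & online;

	/* The node mapping may have shifted underneath us: widen to all
	 * allowed online CPUs, then to the global default, so the result
	 * is never an empty mask. */
	if (mask == 0)
		mask = attrs_mask & online;
	if (mask == 0)
		mask = fallback;
	return mask;
}
```
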
      Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  21. 25 Jul, 2017 1 commit
    • workqueue: implicit ordered attribute should be overridable · 0a94efb5
      Tejun Heo authored

      Commit 5c0338c6 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
      ordered") automatically enabled the ordered attribute for unbound
      workqueues w/ max_active == 1.  Because ordered workqueues reject
      max_active and some attribute changes, this implicit ordered mode
      broke cases where the user creates an unbound workqueue w/ max_active
      == 1 and later explicitly changes the related attributes.
      
      This patch distinguishes between explicit and implicit ordered
      settings and lets attribute changes override the ordered setting if
      it is implicit.
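
A userspace model of the explicit/implicit split, with illustrative flag values. The `__WQ_ORDERED_EXPLICIT` flag, `struct wq_model` and `set_max_active()` are assumptions for this sketch: an explicitly ordered workqueue keeps rejecting max_active changes, while an implicitly ordered one drops the inferred ordering and accepts the change.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values (assumptions, not the kernel's numbers). */
#define WQ_UNBOUND            (1u << 1)
#define __WQ_ORDERED          (1u << 17)
#define __WQ_ORDERED_EXPLICIT (1u << 19)

/* Minimal model of a workqueue's relevant state (assumption). */
struct wq_model {
	unsigned int flags;
	int max_active;
};

/* Hypothetical max_active update; returns false if it is rejected. */
static bool set_max_active(struct wq_model *wq, int max_active)
{
	if (wq->flags & __WQ_ORDERED) {
		/* Explicitly ordered: keep refusing the change. */
		if (wq->flags & __WQ_ORDERED_EXPLICIT)
			return false;
		/* Implicitly ordered: the user's explicit change wins, so
		 * drop the inferred ordering instead of refusing. */
		wq->flags &= ~__WQ_ORDERED;
	}
	wq->max_active = max_active;
	return true;
}
```
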
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 5c0338c6 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")
  22. 19 Jul, 2017 1 commit
    • workqueue: restore WQ_UNBOUND/max_active==1 to be ordered · 5c0338c6
      Tejun Heo authored

      The combination of WQ_UNBOUND and max_active == 1 used to imply
      ordered execution.  After NUMA affinity was introduced by 4c16bd32
      ("workqueue: implement NUMA affinity for unbound workqueues"), this
      is no longer true due to per-node worker pools.
      
      While the right way to create an ordered workqueue is
      alloc_ordered_workqueue(), the documentation has been misleading for a
      long time and people do use WQ_UNBOUND and max_active == 1 for ordered
      workqueues which can lead to subtle bugs which are very difficult to
      trigger.
      
      It's unlikely that we'd see noticeable performance impact by enforcing
      ordering on WQ_UNBOUND / max_active == 1 workqueues.  Let's
      automatically set __WQ_ORDERED for those workqueues.
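
The recommended call remains alloc_ordered_workqueue(); this change makes the older WQ_UNBOUND plus max_active == 1 idiom behave the same. A userspace model of the promotion, using illustrative flag values and a hypothetical `effective_flags()` helper:

```c
#include <assert.h>

/* Illustrative flag values (assumptions, not the kernel's numbers). */
#define WQ_UNBOUND   (1u << 1)
#define __WQ_ORDERED (1u << 17)

/* Hypothetical model of the allocation-time check: an unbound workqueue
 * limited to one active item is treated as ordered, because with
 * per-node pools max_active == 1 alone would still let one item per
 * node run concurrently. */
static unsigned int effective_flags(unsigned int flags, int max_active)
{
	if ((flags & WQ_UNBOUND) && max_active == 1)
		flags |= __WQ_ORDERED;
	return flags;
}
```
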
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Christoph Hellwig <hch@infradead.org>
      Reported-by: Alexei Potashnik <alexei@purestorage.com>
      Fixes: 4c16bd32 ("workqueue: implement NUMA affinity for unbound workqueues")
      Cc: stable@vger.kernel.org # v3.10+
  23. 20 Jun, 2017 1 commit
    • sched/wait: Rename wait_queue_t => wait_queue_entry_t · ac6424b9
      Ingo Molnar authored

      Rename:
      
      	wait_queue_t		=>	wait_queue_entry_t
      
      'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
      but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
      which had to carry the name.
      
      Start sorting this out by renaming it to 'wait_queue_entry_t'.
      
      This also allows the real structure name 'struct __wait_queue' to
      lose its double underscore and become 'struct wait_queue_entry',
      which is the more canonical nomenclature for such data types.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  24. 15 Apr, 2017 1 commit
    • workqueue: Provide work_on_cpu_safe() · 0e8d6a93
      Thomas Gleixner authored

      work_on_cpu() is not protected against CPU hotplug. For code which
      must either execute on an online CPU or fail if the CPU is not
      available, the call site would have to protect against CPU hotplug
      itself.
      
      Provide a function which does get/put_online_cpus() around the call to
      work_on_cpu() and fails the call with -ENODEV if the target CPU is not
      online.
      
      Preparatory patch to convert several racy task affinity manipulations.
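
A userspace sketch of the control flow described above. `get_online_cpus()`, `put_online_cpus()`, `cpu_online()` and `work_on_cpu()` are reduced to stubs here, and `online_mask`, `hotplug_readers` and `return_42()` are hypothetical; only the shape of `work_on_cpu_safe()` follows the description.

```c
#include <assert.h>
#include <errno.h>

/* Userspace stubs (assumptions) for the kernel primitives involved. */
static unsigned long online_mask = 0x3;	/* CPUs 0 and 1 online */
static int hotplug_readers;		/* models the hotplug read lock */

static void get_online_cpus(void) { hotplug_readers++; }
static void put_online_cpus(void) { hotplug_readers--; }
static int cpu_online(int cpu) { return (int)((online_mask >> cpu) & 1UL); }

/* Stand-in for work_on_cpu(): simply runs fn(arg) in the caller. */
static long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
{
	(void)cpu;
	return fn(arg);
}

/* Hold off CPU hotplug around the call and fail with -ENODEV if the
 * target CPU is not online, so the CPU cannot vanish mid-call. */
static long work_on_cpu_safe(int cpu, long (*fn)(void *), void *arg)
{
	long ret = -ENODEV;

	get_online_cpus();
	if (cpu_online(cpu))
		ret = work_on_cpu(cpu, fn, arg);
	put_online_cpus();
	return ret;
}

/* Example work function used in the checks. */
static long return_42(void *arg)
{
	(void)arg;
	return 42;
}
```
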
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Len Brown <lenb@kernel.org>
      Link: http://lkml.kernel.org/r/20170412201042.262610721@linutronix.de
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  25. 06 Mar, 2017 2 commits
  26. 10 Feb, 2017 1 commit
    • time: Remove CONFIG_TIMER_STATS · dfb4357d
      Kees Cook authored

      Currently CONFIG_TIMER_STATS exposes process information across namespaces:
      
      kernel/time/timer_list.c print_timer():
      
              SEQ_printf(m, ", %s/%d", tmp, timer->start_pid);
      
      /proc/timer_list:
      
       #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570
      
      Given that the tracer can give the same information, this patch entirely
      removes CONFIG_TIMER_STATS.
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: John Stultz <john.stultz@linaro.org>
      Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
      Cc: linux-doc@vger.kernel.org
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Xing Gao <xgao01@email.wm.edu>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jessica Frazelle <me@jessfraz.com>
      Cc: kernel-hardening@lists.openwall.com
      Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linux-api@vger.kernel.org
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  27. 19 Oct, 2016 1 commit
    • workqueue: move wq_numa_init() to workqueue_init() · 2186d9f9
      Tejun Heo authored

      While splitting up workqueue initialization into two parts,
      ac8f73400782 ("workqueue: make workqueue available early during boot")
      put wq_numa_init() into workqueue_init_early().  Unfortunately, on
      some archs, including power and arm64, the cpu to node mapping isn't
      yet established by the time the early init is called, leading to
      incorrect NUMA initialization and subsequently the following oops due
      to an empty cpumask on node-specific unbound pools.
      
        Unable to handle kernel paging request for data at address 0x00000038
        Faulting instruction address: 0xc0000000000fc0cc
        Oops: Kernel access of bad area, sig: 11 [#1]
        SMP NR_CPUS=2048 NUMA PowerNV
        Modules linked in:
        CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-compiler_gcc-6.2.0-next-20161005 #94
        task: c0000007f5400000 task.stack: c000001ffc084000
        NIP: c0000000000fc0cc LR: c0000000000ed928 CTR: c0000000000fbfd0
        REGS: c000001ffc087780 TRAP: 0300   Not tainted  (4.8.0-compiler_gcc-6.2.0-next-20161005)
        MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 48000424  XER: 00000000
        CFAR: c0000000000089dc DAR: 0000000000000038 DSISR: 40000000 SOFTE: 0
        GPR00: c0000000000ed928 c000001ffc087a00 c000000000e63200 c000000010d6d600
        GPR04: c0000007f5409200 0000000000000021 000000000748e08c 000000000000001f
        GPR08: 0000000000000000 0000000000000021 000000000748f1f8 0000000000000000
        GPR12: 0000000028000422 c00000000fb80000 c00000000000e0c8 0000000000000000
        GPR16: 0000000000000000 0000000000000000 0000000000000021 0000000000000001
        GPR20: ffffffffafb50401 0000000000000000 c000000010d6d600 000000000000ba7e
        GPR24: 000000000000ba7e c000000000d8bc58 afb504000afb5041 0000000000000001
        GPR28: 0000000000000000 0000000000000004 c0000007f5409280 0000000000000000
        NIP [c0000000000fc0cc] enqueue_task_fair+0xfc/0x18b0
        LR [c0000000000ed928] activate_task+0x78/0xe0
        Call Trace:
        [c000001ffc087a00] [c0000007f5409200] 0xc0000007f5409200 (unreliable)
        [c000001ffc087b10] [c0000000000ed928] activate_task+0x78/0xe0
        [c000001ffc087b50] [c0000000000ede58] ttwu_do_activate+0x68/0xc0
        [c000001ffc087b90] [c0000000000ef1b8] try_to_wake_up+0x208/0x4f0
        [c000001ffc087c10] [c0000000000d3484] create_worker+0x144/0x250
        [c000001ffc087cb0] [c000000000cd72d0] workqueue_init+0x124/0x150
        [c000001ffc087d00] [c000000000cc0e74] kernel_init_freeable+0x158/0x360
        [c000001ffc087dc0] [c00000000000e0e4] kernel_init+0x24/0x160
        [c000001ffc087e30] [c00000000000bfa0] ret_from_kernel_thread+0x5c/0xbc
        Instruction dump:
        62940401 3b800000 3aa00000 7f17c378 3a600001 3b600001 60000000 60000000
        60420000 72490021 ebfe0150 2f890001 <ebbf0038> 419e0de0 7fbee840 419e0e58
        ---[ end trace 0000000000000000 ]---
      
      Fix it by moving wq_numa_init() to workqueue_init().  Because this
      means that early initialization may lack full NUMA information for
      per-cpu pools and ignores NUMA affinity for unbound pools, fix them
      up from workqueue_init() after wq_numa_init().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: http://lkml.kernel.org/r/87twck5wqo.fsf@concordia.ellerman.id.au
      
      
      Fixes: ac8f73400782 ("workqueue: make workqueue available early during boot")
      Signed-off-by: Tejun Heo <tj@kernel.org>
      2186d9f9
  28. 11 Oct, 2016 1 commit
  29. 17 Sep, 2016 2 commits
    • workqueue: remove keventd_up() · 863b710b
      Tejun Heo authored
      
      
      keventd_up() no longer has in-kernel users.  Remove it and make
      wq_online static.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: make workqueue available early during boot · 3347fa09
      Tejun Heo authored
      
      
      Workqueue is currently initialized in an early init call; however,
      there are cases where early boot code has to be split and reordered
      to come after workqueue initialization, or where the same code path
      that uses workqueues runs both before and after workqueue
      initialization.  The latter cases have to gate workqueue usages with
      keventd_up() tests, which is nasty and easy to get wrong.
      
      Workqueue usages have become widespread and it'd be a lot more
      convenient if workqueues could be used very early during boot.  This
      patch splits workqueue initialization into two steps:
      workqueue_init_early(), which sets up the basic data structures so
      that workqueues can be created and work items queued, and
      workqueue_init(), which actually brings workqueues online and starts
      executing queued work items.  The former step can be done very early
      during boot once memory allocation, cpumasks and idr are
      initialized.  The latter runs right after kthreads become available.
      
      This allows work items to be queued and canceled from very early
      boot, which is what most of these use cases want.
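      A rough way to picture the split is a queue that accepts work once
      the early step has run but only executes it after the second step.
      The user-space C sketch below is purely illustrative: model_wq,
      model_init_early(), model_queue(), model_init() and demo_add() are
      hypothetical names, not kernel APIs, and calling the work function
      directly once online stands in for a worker kthread picking it up.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the two-step init; none of these
 * names exist in the kernel. */
#define MODEL_MAX_PENDING 16

typedef void (*model_work_fn)(int arg);

struct model_wq {
	bool initialized;	/* set by model_init_early() */
	bool online;		/* set by model_init() */
	model_work_fn fn[MODEL_MAX_PENDING];
	int arg[MODEL_MAX_PENDING];
	int npending;
};

/* Step 1: set up data structures only; nothing can execute yet. */
static void model_init_early(struct model_wq *wq)
{
	wq->initialized = true;
	wq->online = false;
	wq->npending = 0;
}

/* Queueing is allowed after step 1.  Before step 2 items are only
 * recorded; afterwards the item runs directly, standing in for a worker. */
static bool model_queue(struct model_wq *wq, model_work_fn fn, int arg)
{
	if (!wq->initialized)
		return false;
	if (wq->online) {
		fn(arg);
		return true;
	}
	if (wq->npending == MODEL_MAX_PENDING)
		return false;
	wq->fn[wq->npending] = fn;
	wq->arg[wq->npending] = arg;
	wq->npending++;
	return true;
}

/* Step 2: go online (where the kernel would create worker kthreads)
 * and execute everything queued so far, in queueing order. */
static void model_init(struct model_wq *wq)
{
	wq->online = true;
	for (int i = 0; i < wq->npending; i++)
		wq->fn[i](wq->arg[i]);
	wq->npending = 0;
}

/* Demo work function: accumulates its argument. */
static int demo_sum;
static void demo_add(int arg)
{
	demo_sum += arg;
}
```

      For example, two items queued between model_init_early() and
      model_init() both run, in order, when model_init() is called, while
      items queued afterwards run immediately.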
      
      * Because system_wq being initialized no longer indicates that
        workqueue is fully online, update keventd_up() to test wq_online
        instead.  The follow-up patches will get rid of all its usages and
        the function itself.
      
      * Flushing doesn't make sense before workqueue is fully initialized.
        The flush functions trigger a WARN and return immediately if
        called before workqueue is fully online.
      
      * Work items are never in flight before workqueue is fully online,
        so canceling can always succeed by skipping the flush step.
      
      * Some code paths can no longer assume they are called with IRQs
        enabled, as IRQs are disabled during early boot.  Use
        irqsave/restore operations instead.
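      The flush and cancel rules above can be modeled in a few lines of
      user-space C.  This is a hypothetical sketch, not the kernel
      implementation: model_wq_online, model_work, model_flush() and
      model_cancel() are illustrative names, and the WARN is approximated
      by a message on stderr.  As with the kernel's cancel_work_sync(), the
      cancel reports whether the item was pending.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the wq_online flag mentioned above. */
static bool model_wq_online;

struct model_work {
	bool pending;	/* queued but not yet started */
};

/* Flush: warn and bail out if the workqueue is not fully online yet. */
static bool model_flush(void)
{
	if (!model_wq_online) {
		fprintf(stderr, "WARN: flush called before workqueue is online\n");
		return false;
	}
	/* A real implementation would wait for in-flight items here;
	 * this model tracks none, so there is nothing to wait for. */
	return true;
}

/* Cancel: drop the pending item.  Before the workqueue is online nothing
 * can be in flight, so the flush step is skipped and cancel still works. */
static bool model_cancel(struct model_work *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	if (model_wq_online)
		model_flush();
	return was_pending;
}
```

      Skipping the flush when offline is what keeps cancel usable from
      early boot without tripping the flush-time warning.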
      
      v2: Watchdog init, which requires timers to be running, moved from
          workqueue_init_early() to workqueue_init().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/CA+55aFx0vPuMuxn00rBSM192n-Du5uxy+4AvKa0SBSOVJeuCGg@mail.gmail.com
  30. 16 Sep, 2016 1 commit
  31. 29 Aug, 2016 1 commit
  32. 14 Jul, 2016 1 commit
  33. 01 Jul, 2016 1 commit
  34. 16 Jun, 2016 1 commit
    • workqueue: Fix setting affinity of unbound worker threads · d945b5e9
      Peter Zijlstra authored
      With commit e9d867a6 ("sched: Allow per-cpu kernel threads to
      run on online && !active"), __set_cpus_allowed_ptr() expects that only
      strict per-cpu kernel threads can have affinity to an online CPU which
      is not yet active.
      
      This assumption is currently broken in the workqueue CPU_ONLINE
      notification handler, where restore_unbound_workers_cpumask() calls
      set_cpus_allowed_ptr() when the first CPU in the unbound worker's
      pool->attr->cpumask comes online. Since set_cpus_allowed_ptr() is
      called with pool->attr->cpumask, in which the only online CPU is not
      yet active, we get the following WARN_ON during a CPU online
      operation.
      
      ------------[ cut here ]------------
      WARNING: CPU: 40 PID: 248 at kernel/sched/core.c:1166
      __set_cpus_allowed_ptr+0x228/0x2e0
      Modules linked in:
      CPU: 40 PID: 248 Comm: cpuhp/40 Not tainted 4.6.0-autotest+ #4
      <..snip..>
      Call Trace:
      [c000000f273ff920] [c00000000010493c] __set_cpus_allowed_ptr+0x2cc/0x2e0 (unreliable)
      [c000000f273ffac0] [c0000000000ed4b0] workqueue_cpu_up_callback+0x2c0/0x470
      [c000000f273ffb70] [c0000000000f5c58] notifier_call_chain+0x98/0x100
      [c000000f273ffbc0] [c0000000000c5ed0] __cpu_notify+0x70/0xe0
      [c000000f273ffc00] [c0000000000c6028] notify_online+0x38/0x50
      [c000000f273ffc30] [c0000000000c5214] cpuhp_invoke_callback+0x84/0x250
      [c000000f273ffc90] [c0000000000c562c] cpuhp_up_callbacks+0x5c/0x120
      [c000000f273ffce0] [c0000000000c64d4] cpuhp_thread_fun+0x184/0x1c0
      [c000000f273ffd20] [c0000000000fa050] smpboot_thread_fn+0x290/0x2a0
      [c000000f273ffd80] [c0000000000f45b0] kthread+0x110/0x130
      [c000000f273ffe30] [c000000000009570] ret_from_kernel_thread+0x5c/0x6c
      ---[ end trace 00f1456578b2a3b2 ]---
      
      This patch fixes this by limiting the mask to the intersection of
      the pool affinity and online CPUs.
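      The fix amounts to a mask intersection.  The sketch below models a
      cpumask as a 64-bit integer purely for illustration; the kernel works
      on struct cpumask with helpers such as cpumask_and() and
      cpu_online_mask, and model_unbound_affinity() is a hypothetical name,
      not the function changed by this patch.

```c
#include <stdint.h>

/* Hypothetical model: one bit per CPU, CPUs 0..63.  The kernel itself
 * uses struct cpumask rather than a plain integer. */
typedef uint64_t model_cpumask;

/* Affinity to apply to an unbound worker: only the pool CPUs that are
 * currently online, instead of the full pool mask. */
static model_cpumask model_unbound_affinity(model_cpumask pool_mask,
					    model_cpumask online_mask)
{
	return pool_mask & online_mask;
}
```

      For instance, a pool spanning CPUs 4-7 (0xF0) with CPUs 0, 4 and 5
      online (0x31) yields 0x30, i.e. only CPUs 4 and 5.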
      
      Changelog-cribbed-from: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>