1. 28 Feb, 2013 1 commit
    • Frederic Weisbecker's avatar
      irq: Don't re-enable interrupts at the end of irq_exit · 4cd5d111
      Frederic Weisbecker authored
      Commit 74eed016
      "irq: Ensure irq_exit() code runs with interrupts disabled"
      restore interrupts flags in the end of irq_exit() for archs
      that don't define __ARCH_IRQ_EXIT_IRQS_DISABLED.
      However always returning from irq_exit() with interrupts
      disabled should not be a problem for these archs. Prior to
      this commit this was already happening anytime we processed
      pending softirqs anyway.
      Suggested-by: 's avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: 's avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  2. 21 Feb, 2013 3 commits
    • Frederic Weisbecker's avatar
      irq: Remove IRQ_EXIT_OFFSET workaround · 4d4c4e24
      Frederic Weisbecker authored
      The IRQ_EXIT_OFFSET trick was used to make sure the irq
      doesn't get preempted after we substract the HARDIRQ_OFFSET
      until we are entirely done with any code in irq_exit().
      This workaround was necessary because some archs may call
      irq_exit() with irqs enabled and there is still some code
      in the end of this function that is not covered by the
      HARDIRQ_OFFSET but want to stay non-preemptible.
      Now that irq are always disabled in irq_exit(), the whole code
      is guaranteed not to be preempted. We can thus remove this hack.
      Signed-off-by: 's avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • Thomas Gleixner's avatar
      irq: Sanitize invoke_softirq · facd8b80
      Thomas Gleixner authored
      With the irq protection in irq_exit, we can remove the #ifdeffery and
      the bh_disable/enable dance in invoke_softirq()
      Signed-off-by: 's avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linuxfoundation.org>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302202155320.22263@ionos
    • Thomas Gleixner's avatar
      irq: Ensure irq_exit() code runs with interrupts disabled · 74eed016
      Thomas Gleixner authored
      We had already a few problems with code called from irq_exit() when
      interrupted from a nesting interrupt. This can happen on architectures
      which do not define __ARCH_IRQ_EXIT_IRQS_DISABLED.
      __ARCH_IRQ_EXIT_IRQS_DISABLED should go away and we want to make it
      mandatory to call irq_exit() with interrupts disabled.
      As a temporary protection disable interrupts for those architectures
      which do not define __ARCH_IRQ_EXIT_IRQS_DISABLED and add a WARN_ONCE
      when an architecture which defines __ARCH_IRQ_EXIT_IRQS_DISABLED calls
      irq_exit() with interrupts enabled.
      Signed-off-by: 's avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linuxfoundation.org>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302202155320.22263@ionos
  3. 27 Jan, 2013 1 commit
    • Frederic Weisbecker's avatar
      cputime: Safely read cputime of full dynticks CPUs · 6a61671b
      Frederic Weisbecker authored
      While remotely reading the cputime of a task running in a
      full dynticks CPU, the values stored in utime/stime fields
      of struct task_struct may be stale. Its values may be those
      of the last kernel <-> user transition time snapshot and
      we need to add the tickless time spent since this snapshot.
      To fix this, flush the cputime of the dynticks CPUs on
      kernel <-> user transition and record the time / context
      where we did this. Then on top of this snapshot and the current
      time, perform the fixup on the reader side from task_times()
      Signed-off-by: 's avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      [fixed kvm module related build errors]
      Signed-off-by: 's avatarSedat Dilek <sedat.dilek@gmail.com>
  4. 10 Jan, 2013 1 commit
    • Eric Dumazet's avatar
      softirq: reduce latencies · c10d7367
      Eric Dumazet authored
      In various network workloads, __do_softirq() latencies can be up
      to 20 ms if HZ=1000, and 200 ms if HZ=100.
      This is because we iterate 10 times in the softirq dispatcher,
      and some actions can consume a lot of cycles.
      This patch changes the fallback to ksoftirqd condition to :
      - A time limit of 2 ms.
      - need_resched() being set on current task
      When one of this condition is met, we wakeup ksoftirqd for further
      softirq processing if we still have pending softirqs.
      Using need_resched() as the only condition can trigger RCU stalls,
      as we can keep BH disabled for too long.
      I ran several benchmarks and got no significant difference in
      throughput, but a very significant reduction of latencies (one order
      of magnitude) :
      In following bench, 200 antagonist "netperf -t TCP_RR" are started in
      background, using all available cpus.
      Then we start one "netperf -t TCP_RR", bound to the cpu handling the NIC
      IRQ (hard+soft)
      Before patch :
      # netperf -H -t TCP_RR -T2,2 -- -k
      to () port 0 AF_INET : first burst 0 : cpu bind
      After patch :
      # netperf -H -t TCP_RR -T2,2 -- -k
      to () port 0 AF_INET : first burst 0 : cpu bind
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
  5. 29 Oct, 2012 1 commit
    • Frederic Weisbecker's avatar
      cputime: Specialize irq vtime hooks · fa5058f3
      Frederic Weisbecker authored
      With CONFIG_VIRT_CPU_ACCOUNTING, when vtime_account()
      is called in irq entry/exit, we perform a check on the
      context: if we are interrupting the idle task we
      account the pending cputime to idle, otherwise account
      to system time or its sub-areas: tsk->stime, hardirq time,
      softirq time, ...
      However this check for idle only concerns the hardirq entry
      and softirq entry:
      * Hardirq may directly interrupt the idle task, in which case
      we need to flush the pending CPU time to idle.
      * The idle task may be directly interrupted by a softirq if
      it calls local_bh_enable(). There is probably no such call
      in any idle task but we need to cover every case. Ksoftirqd
      is not concerned because the idle time is flushed on context
      switch and softirq in the end of hardirq have the idle time
      already flushed from the hardirq entry.
      In the other cases we always account to system/irq time:
      * On hardirq exit we account the time to hardirq time.
      * On softirq exit we account the time to softirq time.
      To optimize this and avoid the indirect call to vtime_account()
      and the checks it performs, specialize the vtime irq APIs and
      only perform the check on irq entry. Irq exit can directly call
      CONFIG_IRQ_TIME_ACCOUNTING behaviour doesn't change and directly
      maps to its own vtime_account() implementation. One may want
      to take benefits from the new APIs to optimize irq time accounting
      as well in the future.
      Signed-off-by: 's avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
  6. 25 Sep, 2012 1 commit
    • Frederic Weisbecker's avatar
      cputime: Use a proper subsystem naming for vtime related APIs · bf9fae9f
      Frederic Weisbecker authored
      Use a naming based on vtime as a prefix for virtual based
      cputime accounting APIs:
      - account_system_vtime() -> vtime_account()
      - account_switch_vtime() -> vtime_task_switch()
      It makes it easier to allow for further declension such
      as vtime_account_system(), vtime_account_idle(), ... if we
      want to find out the context we account to from generic code.
      This also make it better to know on which subsystem these APIs
      refer to.
      Signed-off-by: 's avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
  7. 13 Aug, 2012 1 commit
  8. 01 Aug, 2012 1 commit
    • Mel Gorman's avatar
      mm: allow PF_MEMALLOC from softirq context · 907aed48
      Mel Gorman authored
      This is needed to allow network softirq packet processing to make use of
      Currently softirq context cannot use PF_MEMALLOC due to it not being
      associated with a task, and therefore not having task flags to fiddle with
      - thus the gfp to alloc flag mapping ignores the task flags when in
      interrupts (hard or soft) context.
      Allowing softirqs to make use of PF_MEMALLOC therefore requires some
      trickery.  This patch borrows the task flags from whatever process happens
      to be preempted by the softirq.  It then modifies the gfp to alloc flags
      mapping to not exclude task flags in softirq context, and modify the
      softirq code to save, clear and restore the PF_MEMALLOC flag.
      The save and clear, ensures the preempted task's PF_MEMALLOC flag doesn't
      leak into the softirq.  The restore ensures a softirq's PF_MEMALLOC flag
      cannot leak back into the preempted process.  This should be safe due to
      the following reasons
      Softirqs can run on multiple CPUs sure but the same task should not be
      	executing the same softirq code. Neither should the softirq
      	handler be preempted by any other softirq handler so the flags
      	should not leak to an unrelated softirq.
      Softirqs re-enable hardware interrupts in __do_softirq() so can be
      	preempted by hardware interrupts so PF_MEMALLOC is inherited
      	by the hard IRQ. However, this is similar to a process in
      	reclaim being preempted by a hardirq. While PF_MEMALLOC is
      	set, gfp_to_alloc_flags() distinguishes between hard and
      	soft irqs and avoids giving a hardirq the ALLOC_NO_WATERMARKS
      If the softirq is deferred to ksoftirq then its flags may be used
              instead of a normal tasks but as the softirq cannot be preempted,
              the PF_MEMALLOC flag does not leak to other code by accident.
      [davem@davemloft.net: Document why PF_MEMALLOC is safe]
      Signed-off-by: 's avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: 's avatarMel Gorman <mgorman@suse.de>
      Cc: David Miller <davem@davemloft.net>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: 's avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: 's avatarLinus Torvalds <torvalds@linux-foundation.org>
  9. 06 Mar, 2012 1 commit
  10. 01 Mar, 2012 2 commits
  11. 15 Feb, 2012 1 commit
    • Frederic Weisbecker's avatar
      timer: Fix bad idle check on irq entry · 0a8a2e78
      Frederic Weisbecker authored
      idle_cpu() is called on irq entry to guess if we need to call
      tick_check_idle(). This way we can catch up with jiffies if the tick
      was stopped, stop accounting idle time during the interrupt and
      maintain the sched clock if it is unstable.
      But if we are going to exit the idle loop to schedule a new task (ie:
      if we have a task in the runqueue or a remotely enqueued ttwu to
      perform), the idle_cpu() check will return 0 such that we miss the
      call to tick_check_idle() for all interrupts happening before we
      schedule the new task.
      As a result these interrupts and the softirqs coming along may deal
      with stale jiffies values, bad sched clock values, and won't substract
      their time from the idle time accounting.
      Fix this with using is_idle_task() instead that strictly checks that
      we are running the idle task, without caring about the fact we are
      going to schedule a task soon.
      Signed-off-by: 's avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Link: http://lkml.kernel.org/r/1327427984-23282-3-git-send-email-fweisbec@gmail.comSigned-off-by: 's avatarThomas Gleixner <tglx@linutronix.de>
  12. 03 Feb, 2012 1 commit
  13. 11 Dec, 2011 2 commits
    • Frederic Weisbecker's avatar
      rcu: Fix early call to rcu_idle_enter() · 416eb33c
      Frederic Weisbecker authored
      On the irq exit path, tick_nohz_irq_exit()
      may raise a softirq, which action leads to the wake up
      path and select_task_rq_fair() that makes use of rcu
      to iterate the domains.
      This is an illegal use of RCU because we may be in RCU
      extended quiescent state if we interrupted an RCU-idle
      window in the idle loop:
      [  132.978883] ===============================
      [  132.978883] [ INFO: suspicious RCU usage. ]
      [  132.978883] -------------------------------
      [  132.978883] kernel/sched_fair.c:1707 suspicious rcu_dereference_check() usage!
      [  132.978883]
      [  132.978883] other info that might help us debug this:
      [  132.978883]
      [  132.978883]
      [  132.978883] rcu_scheduler_active = 1, debug_locks = 0
      [  132.978883] RCU used illegally from extended quiescent state!
      [  132.978883] 2 locks held by swapper/0:
      [  132.978883]  #0:  (&p->pi_lock){-.-.-.}, at: [<ffffffff8105a729>] try_to_wake_up+0x39/0x2f0
      [  132.978883]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8105556a>] select_task_rq_fair+0x6a/0xec0
      [  132.978883]
      [  132.978883] stack backtrace:
      [  132.978883] Pid: 0, comm: swapper Tainted: G        W   3.0.0+ #178
      [  132.978883] Call Trace:
      [  132.978883]  <IRQ>  [<ffffffff810a01f6>] lockdep_rcu_suspicious+0xe6/0x100
      [  132.978883]  [<ffffffff81055c49>] select_task_rq_fair+0x749/0xec0
      [  132.978883]  [<ffffffff8105556a>] ? select_task_rq_fair+0x6a/0xec0
      [  132.978883]  [<ffffffff812fe494>] ? do_raw_spin_lock+0x54/0x150
      [  132.978883]  [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
      [  132.978883]  [<ffffffff8105a7c3>] try_to_wake_up+0xd3/0x2f0
      [  132.978883]  [<ffffffff81094f98>] ? ktime_get+0x68/0xf0
      [  132.978883]  [<ffffffff8105aa35>] wake_up_process+0x15/0x20
      [  132.978883]  [<ffffffff81069dd5>] raise_softirq_irqoff+0x65/0x110
      [  132.978883]  [<ffffffff8108eb65>] __hrtimer_start_range_ns+0x415/0x5a0
      [  132.978883]  [<ffffffff812fe3ee>] ? do_raw_spin_unlock+0x5e/0xb0
      [  132.978883]  [<ffffffff8108ed08>] hrtimer_start+0x18/0x20
      [  132.978883]  [<ffffffff8109c9c3>] tick_nohz_stop_sched_tick+0x393/0x450
      [  132.978883]  [<ffffffff810694f2>] irq_exit+0xd2/0x100
      [  132.978883]  [<ffffffff81829e96>] do_IRQ+0x66/0xe0
      [  132.978883]  [<ffffffff81820d53>] common_interrupt+0x13/0x13
      [  132.978883]  <EOI>  [<ffffffff8103434b>] ? native_safe_halt+0xb/0x10
      [  132.978883]  [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
      [  132.978883]  [<ffffffff810144ea>] default_idle+0xba/0x370
      [  132.978883]  [<ffffffff810147fe>] amd_e400_idle+0x5e/0x130
      [  132.978883]  [<ffffffff8100a9f6>] cpu_idle+0xb6/0x120
      [  132.978883]  [<ffffffff817f217f>] rest_init+0xef/0x150
      [  132.978883]  [<ffffffff817f20e2>] ? rest_init+0x52/0x150
      [  132.978883]  [<ffffffff81ed9cf3>] start_kernel+0x3da/0x3e5
      [  132.978883]  [<ffffffff81ed9346>] x86_64_start_reservations+0x131/0x135
      [  132.978883]  [<ffffffff81ed944d>] x86_64_start_kernel+0x103/0x112
      Fix this by calling rcu_idle_enter() after tick_nohz_irq_exit().
      Signed-off-by: 's avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: 's avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: 's avatarJosh Triplett <josh@joshtriplett.org>
    • Frederic Weisbecker's avatar
      nohz: Separate out irq exit and idle loop dyntick logic · 280f0677
      Frederic Weisbecker authored
      The tick_nohz_stop_sched_tick() function, which tries to delay
      the next timer tick as long as possible, can be called from two
      - From the idle loop to start the dytick idle mode
      - From interrupt exit if we have interrupted the dyntick
      idle mode, so that we reprogram the next tick event in
      case the irq changed some internal state that requires this
      There are only few minor differences between both that
      are handled by that function, driven by the ts->inidle
      cpu variable and the inidle parameter. The whole guarantees
      that we only update the dyntick mode on irq exit if we actually
      interrupted the dyntick idle mode, and that we enter in RCU extended
      quiescent state from idle loop entry only.
      Split this function into:
      - tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
      dynticks idle mode unconditionally if it can, and enters into RCU
      extended quiescent state.
      - tick_nohz_irq_exit() which only updates the dynticks idle mode
      when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called).
      To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
      into tick_nohz_idle_exit().
      This simplifies the code and micro-optimize the irq exit path (no need
      for local_irq_save there). This also prepares for the split between
      dynticks and rcu extended quiescent state logics. We'll need this split to
      further fix illegal uses of RCU in extended quiescent states in the idle
      Signed-off-by: 's avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: 's avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: 's avatarJosh Triplett <josh@joshtriplett.org>
  14. 31 Oct, 2011 1 commit
    • Paul Gortmaker's avatar
      kernel: Map most files to use export.h instead of module.h · 9984de1a
      Paul Gortmaker authored
      The changed files were only including linux/module.h for the
      EXPORT_SYMBOL infrastructure, and nothing else.  Revector them
      onto the isolated export header for faster compile times.
      Nothing to see here but a whole lot of instances of:
        -#include <linux/module.h>
        +#include <linux/export.h>
      This commit is only changing the kernel dir; next targets
      will probably be mm, fs, the arch dirs, etc.
      Signed-off-by: 's avatarPaul Gortmaker <paul.gortmaker@windriver.com>
  15. 20 Jul, 2011 1 commit
    • Peter Zijlstra's avatar
      softirq,rcu: Inform RCU of irq_exit() activity · ec433f0c
      Peter Zijlstra authored
      The rcu_read_unlock_special() function relies on in_irq() to exclude
      scheduler activity from interrupt level.  This fails because exit_irq()
      can invoke the scheduler after clearing the preempt_count() bits that
      in_irq() uses to determine that it is at interrupt level.  This situation
      can result in failures as follows:
       $task			IRQ		SoftIRQ
       /* do stuff */
       <preempt> |= UNLOCK_BLOCKED
      			/* do stuff, don't use RCU */
      					  /* do stuff */
      					          spin_lock_irq(&pi->lock) /* deadlock */
      Ed can simply trigger this 'easy' because invoke_softirq() immediately
      does a ttwu() of ksoftirqd/# instead of doing the in-place softirq stuff
      first, but even without that the above happens.
      Cure this by also excluding softirqs from the
      rcu_read_unlock_special() handler and ensuring the force_irqthreads
      ksoftirqd/# wakeup is done from full softirq context.
      [ Alternatively, delaying the ->rcu_read_lock_nesting decrement
        until after the special handling would make the thing more robust
        in the face of interrupts as well.  And there is a separate patch
        for that. ]
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Reported-and-tested-by: 's avatarEd Tomlinson <edt@aei.ca>
      Signed-off-by: 's avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: 's avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
  16. 14 Jun, 2011 1 commit
    • Shaohua Li's avatar
      rcu: Use softirq to address performance regression · 09223371
      Shaohua Li authored
      Commit a26ac245(rcu: move TREE_RCU from softirq to kthread)
      introduced performance regression. In an AIM7 test, this commit degraded
      performance by about 40%.
      The commit runs rcu callbacks in a kthread instead of softirq. We observed
      high rate of context switch which is caused by this. Out test system has
      64 CPUs and HZ is 1000, so we saw more than 64k context switch per second
      which is caused by RCU's per-CPU kthread.  A trace showed that most of
      the time the RCU per-CPU kthread doesn't actually handle any callbacks,
      but instead just does a very small amount of work handling grace periods.
      This means that RCU's per-CPU kthreads are making the scheduler do quite
      a bit of work in order to allow a very small amount of RCU-related
      processing to be done.
      Alex Shi's analysis determined that this slowdown is due to lock
      contention within the scheduler.  Unfortunately, as Peter Zijlstra points
      out, the scheduler's real-time semantics require global action, which
      means that this contention is inherent in real-time scheduling.  (Yes,
      perhaps someone will come up with a workaround -- otherwise, -rt is not
      going to do well on large SMP systems -- but this patch will work around
      this issue in the meantime.  And "the meantime" might well be forever.)
      This patch therefore re-introduces softirq processing to RCU, but only
      for core RCU work.  RCU callbacks are still executed in kthread context,
      so that only a small amount of RCU work runs in softirq context in the
      common case.  This should minimize ksoftirqd execution, allowing us to
      skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.
      Signed-off-by: 's avatarShaohua Li <shaohua.li@intel.com>
      Tested-by: 's avatar"Alex,Shi" <alex.shi@intel.com>
      Signed-off-by: 's avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
  17. 06 May, 2011 1 commit
    • Paul E. McKenney's avatar
      rcu: move TREE_RCU from softirq to kthread · a26ac245
      Paul E. McKenney authored
      If RCU priority boosting is to be meaningful, callback invocation must
      be boosted in addition to preempted RCU readers.  Otherwise, in presence
      of CPU real-time threads, the grace period ends, but the callbacks don't
      get invoked.  If the callbacks don't get invoked, the associated memory
      doesn't get freed, so the system is still subject to OOM.
      But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit
      moves the callback invocations to a kthread, which can be boosted easily.
      Also add comments and properly synchronized all accesses to
      rcu_cpu_kthread_task, as suggested by Lai Jiangshan.
      Signed-off-by: 's avatarPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: 's avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: 's avatarJosh Triplett <josh@joshtriplett.org>
  18. 31 Mar, 2011 1 commit
  19. 23 Mar, 2011 1 commit
  20. 26 Feb, 2011 1 commit
    • Thomas Gleixner's avatar
      genirq: Provide forced interrupt threading · 8d32a307
      Thomas Gleixner authored
      Add a commandline parameter "threadirqs" which forces all interrupts except
      those marked IRQF_NO_THREAD to run threaded. That's mostly a debug option to
      allow retrieving better debug data from crashing interrupt handlers. If
      "threadirqs" is not enabled on the kernel command line, then there is no
      impact in the interrupt hotpath.
      Architecture code needs to select CONFIG_IRQ_FORCED_THREADING after
      marking the interrupts which cant be threaded IRQF_NO_THREAD. All
      interrupts which have IRQF_TIMER set are implict marked
      IRQF_NO_THREAD. Also all PER_CPU interrupts are excluded.
      Forced threading hard interrupts also forces all soft interrupt
      handling into thread context.
      When enabled it might slow down things a bit, but for debugging problems in
      interrupt code it's a reasonable penalty as it does not immediately
      crash and burn the machine when an interrupt handler is buggy.
      Some test results on a Core2Duo machine:
      Cache cold run of:
       # time git grep irq_desc
            non-threaded       threaded
       real 1m18.741s          1m19.061s
       user 0m1.874s           0m1.757s
       sys  0m5.843s           0m5.427s
       # iperf -c server
      [  3]  0.0-10.0 sec  1.09 GBytes   933 Mbits/sec
      [  3]  0.0-10.0 sec  1.09 GBytes   934 Mbits/sec
      [  3]  0.0-10.0 sec  1.09 GBytes   933 Mbits/sec
      [  3]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec
      [  3]  0.0-10.0 sec  1.09 GBytes   934 Mbits/sec
      [  3]  0.0-10.0 sec  1.09 GBytes   937 Mbits/sec
      Signed-off-by: 's avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <20110223234956.772668648@linutronix.de>
  21. 08 Feb, 2011 1 commit
  22. 26 Jan, 2011 1 commit
  23. 13 Jan, 2011 1 commit
    • Amerigo Wang's avatar
      kernel: clean up USE_GENERIC_SMP_HELPERS · 351f8f8e
      Amerigo Wang authored
      For arch which needs USE_GENERIC_SMP_HELPERS, it has to select
      USE_GENERIC_SMP_HELPERS, rather than leaving a choice to user, since they
      don't provide their own implementions.
      Also, move on_each_cpu() to kernel/smp.c, it is strange to put it in
      For arch which doesn't use USE_GENERIC_SMP_HELPERS, e.g.  blackfin, only
      on_each_cpu() is compiled.
      Signed-off-by: 's avatarAmerigo Wang <amwang@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: 's avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: 's avatarLinus Torvalds <torvalds@linux-foundation.org>
  24. 07 Jan, 2011 1 commit
  25. 17 Dec, 2010 1 commit
  26. 23 Oct, 2010 1 commit
  27. 21 Oct, 2010 1 commit
    • Thomas Gleixner's avatar
      tracing: Cleanup the convoluted softirq tracepoints · f4bc6bb2
      Thomas Gleixner authored
      With the addition of trace_softirq_raise() the softirq tracepoint got
      even more convoluted. Why the tracepoints take two pointers to assign
      an integer is beyond my comprehension.
      But adding an extra case which treats the first pointer as an unsigned
      long when the second pointer is NULL including the back and forth
      type casting is just horrible.
      Convert the softirq tracepoints to take a single unsigned int argument
      for the softirq vector number and fix the call sites.
      Signed-off-by: 's avatarThomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <alpine.LFD.2.00.1010191428560.6815@localhost6.localdomain6>
      Acked-by: 's avatarPeter Zijlstra <peterz@infradead.org>
      Acked-by: mathieu.desnoyers@efficios.com
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
  28. 18 Oct, 2010 3 commits
    • Venkatesh Pallipadi's avatar
      sched: Call tick_check_idle before __irq_enter · d267f87f
      Venkatesh Pallipadi authored
      When CPU is idle and on first interrupt, irq_enter calls tick_check_idle()
      to notify interruption from idle. But, there is a problem if this call
      is done after __irq_enter, as all routines in __irq_enter may find
      stale time due to yet to be done tick_check_idle.
      Specifically, trace calls in __irq_enter when they use global clock and also
      account_system_vtime change in this patch as it wants to use sched_clock_cpu()
      to do proper irq timing.
      But, tick_check_idle was moved after __irq_enter intentionally to
      prevent problem of unneeded ksoftirqd wakeups by the commit ee5f80a9:
          irq: call __irq_enter() before calling the tick_idle_check
          Impact: avoid spurious ksoftirqd wakeups
      Moving tick_check_idle() before __irq_enter and wrapping it with
      local_bh_enable/disable would solve both the problems.
      Fixed-by: 's avatarYong Zhang <yong.zhang0@gmail.com>
      Signed-off-by: 's avatarVenkatesh Pallipadi <venki@google.com>
      Signed-off-by: 's avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1286237003-12406-9-git-send-email-venki@google.com>
      Signed-off-by: 's avatarIngo Molnar <mingo@elte.hu>
    • Venkatesh Pallipadi's avatar
      sched: Add a PF flag for ksoftirqd identification · 6cdd5199
      Venkatesh Pallipadi authored
      To account softirq time cleanly in scheduler, we need to identify whether
      softirq is invoked in ksoftirqd context or softirq at hardirq tail context.
      Add PF_KSOFTIRQD for that purpose.
      As all PF flag bits are currently taken, create space by moving one of the
      infrequently used bits (PF_THREAD_BOUND) down in task_struct to be along
      with some other state fields.
      Signed-off-by: 's avatarVenkatesh Pallipadi <venki@google.com>
      Signed-off-by: 's avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1286237003-12406-4-git-send-email-venki@google.com>
      Signed-off-by: 's avatarIngo Molnar <mingo@elte.hu>
    • Venkatesh Pallipadi's avatar
      sched: Fix softirq time accounting · 75e1056f
      Venkatesh Pallipadi authored
      Peter Zijlstra found a bug in the way softirq time is accounted in
      VIRT_CPU_ACCOUNTING on this thread:
      The problem is, softirq processing uses local_bh_disable internally. There
      is no way, later in the flow, to differentiate between whether softirq is
      being processed or is it just that bh has been disabled. So, a hardirq when bh
      is disabled results in time being wrongly accounted as softirq.
      Looking at the code a bit more, the problem exists in !VIRT_CPU_ACCOUNTING
      as well. As account_system_time() in normal tick based accouting also uses
      softirq_count, which will be set even when not in softirq with bh disabled.
      Peter also suggested solution of using 2*SOFTIRQ_OFFSET as irq count
      for local_bh_{disable,enable} and using just SOFTIRQ_OFFSET while softirq
      processing. The patch below does that and adds API in_serving_softirq() which
      returns whether we are currently processing softirq or not.
      Also changes one of the usages of softirq_count in net/sched/cls_cgroup.c
      to in_serving_softirq.
      Looks like many usages of in_softirq really want in_serving_softirq. Those
      changes can be made individually on a case by case basis.
      Signed-off-by: 's avatarVenkatesh Pallipadi <venki@google.com>
      Signed-off-by: 's avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1286237003-12406-2-git-send-email-venki@google.com>
      Signed-off-by: 's avatarIngo Molnar <mingo@elte.hu>
  29. 12 Oct, 2010 2 commits
  30. 22 Sep, 2010 1 commit
  31. 04 Jun, 2010 1 commit
  32. 27 May, 2010 1 commit
  33. 10 May, 2010 1 commit
    • Paul E. McKenney's avatar
      rcu: refactor RCU's context-switch handling · 25502a6c
      Paul E. McKenney authored
      The addition of preemptible RCU to treercu resulted in a bit of
      confusion and inefficiency surrounding the handling of context switches
      for RCU-sched and for RCU-preempt.  For RCU-sched, a context switch
      is a quiescent state, pure and simple, just like it always has been.
      For RCU-preempt, a context switch is in no way a quiescent state, but
      special handling is required when a task blocks in an RCU read-side
      critical section.
      However, the callout from the scheduler and the outer loop in ksoftirqd
      still calls something named rcu_sched_qs(), whose name is no longer
      accurate.  Furthermore, when rcu_check_callbacks() notes an RCU-sched
      quiescent state, it ends up unnecessarily (though harmlessly, aside
      from the performance hit) enqueuing the current task if it happens to
      be running in an RCU-preempt read-side critical section.  This not only
      increases the maximum latency of scheduler_tick(), it also needlessly
      increases the overhead of the next outermost rcu_read_unlock() invocation.
      This patch addresses this situation by separating the notion of RCU's
      context-switch handling from that of RCU-sched's quiescent states.
      The context-switch handling is covered by rcu_note_context_switch() in
      general and by rcu_preempt_note_context_switch() for preemptible RCU.
      This permits rcu_sched_qs() to handle quiescent states and only quiescent
      states.  It also reduces the maximum latency of scheduler_tick(), though
      probably by much less than a microsecond.  Finally, it means that tasks
      within preemptible-RCU read-side critical sections avoid incurring the
      overhead of queuing unless there really is a context switch.
      Suggested-by: 's avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: 's avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: 's avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>