1. 02 Mar, 2017 15 commits
  2. 01 Feb, 2017 1 commit
  3. 20 Oct, 2016 1 commit
  4. 17 Mar, 2016 1 commit
  5. 09 Feb, 2016 1 commit
    • Mel Gorman's avatar
      sched/debug: Make schedstats a runtime tunable that is disabled by default · cb251765
      Mel Gorman authored
      schedstats is very useful during debugging and performance tuning but it
      incurs overhead to calculate the stats. As such, even though it can be
      disabled at build time, it is often enabled as the information is useful.
      
      This patch adds a kernel command-line and sysctl tunable to enable or
      disable schedstats on demand (when it's built in). It is disabled
      by default as someone who knows they need it can also learn to enable
      it when necessary.
      
      The benefits are dependent on how scheduler-intensive the workload is.
      If it is then the patch reduces the number of cycles spent calculating
      the stats with a small benefit from reducing the cache footprint of the
      scheduler.
      
      These measurements were taken from a 48-core 2-socket
      machine with Xeon(R) E5-2670 v3 cpus although they were also tested on a
      single socket machine 8-core machine with Intel i7-3770 processors.
      
      netperf-tcp
                                 4.5.0-rc1             4.5.0-rc1
                                   vanilla          nostats-v3r1
      Hmean    64         560.45 (  0.00%)      575.98 (  2.77%)
      Hmean    128        766.66 (  0.00%)      795.79 (  3.80%)
      Hmean    256        950.51 (  0.00%)      981.50 (  3.26%)
      Hmean    1024      1433.25 (  0.00%)     1466.51 (  2.32%)
      Hmean    2048      2810.54 (  0.00%)     2879.75 (  2.46%)
      Hmean    3312      4618.18 (  0.00%)     4682.09 (  1.38%)
      Hmean    4096      5306.42 (  0.00%)     5346.39 (  0.75%)
      Hmean    8192     10581.44 (  0.00%)    10698.15 (  1.10%)
      Hmean    16384    18857.70 (  0.00%)    18937.61 (  0.42%)
      
      Small gains here, UDP_STREAM showed nothing intresting and neither did
      the TCP_RR tests. The gains on the 8-core machine were very similar.
      
      tbench4
                                       4.5.0-rc1             4.5.0-rc1
                                         vanilla          nostats-v3r1
      Hmean    mb/sec-1         500.85 (  0.00%)      522.43 (  4.31%)
      Hmean    mb/sec-2         984.66 (  0.00%)     1018.19 (  3.41%)
      Hmean    mb/sec-4        1827.91 (  0.00%)     1847.78 (  1.09%)
      Hmean    mb/sec-8        3561.36 (  0.00%)     3611.28 (  1.40%)
      Hmean    mb/sec-16       5824.52 (  0.00%)     5929.03 (  1.79%)
      Hmean    mb/sec-32      10943.10 (  0.00%)    10802.83 ( -1.28%)
      Hmean    mb/sec-64      15950.81 (  0.00%)    16211.31 (  1.63%)
      Hmean    mb/sec-128     15302.17 (  0.00%)    15445.11 (  0.93%)
      Hmean    mb/sec-256     14866.18 (  0.00%)    15088.73 (  1.50%)
      Hmean    mb/sec-512     15223.31 (  0.00%)    15373.69 (  0.99%)
      Hmean    mb/sec-1024    14574.25 (  0.00%)    14598.02 (  0.16%)
      Hmean    mb/sec-2048    13569.02 (  0.00%)    13733.86 (  1.21%)
      Hmean    mb/sec-3072    12865.98 (  0.00%)    13209.23 (  2.67%)
      
      Small gains of 2-4% at low thread counts and otherwise flat.  The
      gains on the 8-core machine were slightly different
      
      tbench4 on 8-core i7-3770 single socket machine
      Hmean    mb/sec-1        442.59 (  0.00%)      448.73 (  1.39%)
      Hmean    mb/sec-2        796.68 (  0.00%)      794.39 ( -0.29%)
      Hmean    mb/sec-4       1322.52 (  0.00%)     1343.66 (  1.60%)
      Hmean    mb/sec-8       2611.65 (  0.00%)     2694.86 (  3.19%)
      Hmean    mb/sec-16      2537.07 (  0.00%)     2609.34 (  2.85%)
      Hmean    mb/sec-32      2506.02 (  0.00%)     2578.18 (  2.88%)
      Hmean    mb/sec-64      2511.06 (  0.00%)     2569.16 (  2.31%)
      Hmean    mb/sec-128     2313.38 (  0.00%)     2395.50 (  3.55%)
      Hmean    mb/sec-256     2110.04 (  0.00%)     2177.45 (  3.19%)
      Hmean    mb/sec-512     2072.51 (  0.00%)     2053.97 ( -0.89%)
      
      In constract, this shows a relatively steady 2-3% gain at higher thread
      counts. Due to the nature of the patch and the type of workload, it's
      not a surprise that the result will depend on the CPU used.
      
      hackbench-pipes
                               4.5.0-rc1             4.5.0-rc1
                                 vanilla          nostats-v3r1
      Amean    1        0.0637 (  0.00%)      0.0660 ( -3.59%)
      Amean    4        0.1229 (  0.00%)      0.1181 (  3.84%)
      Amean    7        0.1921 (  0.00%)      0.1911 (  0.52%)
      Amean    12       0.3117 (  0.00%)      0.2923 (  6.23%)
      Amean    21       0.4050 (  0.00%)      0.3899 (  3.74%)
      Amean    30       0.4586 (  0.00%)      0.4433 (  3.33%)
      Amean    48       0.5910 (  0.00%)      0.5694 (  3.65%)
      Amean    79       0.8663 (  0.00%)      0.8626 (  0.43%)
      Amean    110      1.1543 (  0.00%)      1.1517 (  0.22%)
      Amean    141      1.4457 (  0.00%)      1.4290 (  1.16%)
      Amean    172      1.7090 (  0.00%)      1.6924 (  0.97%)
      Amean    192      1.9126 (  0.00%)      1.9089 (  0.19%)
      
      Some small gains and losses and while the variance data is not included,
      it's close to the noise. The UMA machine did not show anything particularly
      different
      
      pipetest
                                   4.5.0-rc1             4.5.0-rc1
                                     vanilla          nostats-v2r2
      Min         Time        4.13 (  0.00%)        3.99 (  3.39%)
      1st-qrtle   Time        4.38 (  0.00%)        4.27 (  2.51%)
      2nd-qrtle   Time        4.46 (  0.00%)        4.39 (  1.57%)
      3rd-qrtle   Time        4.56 (  0.00%)        4.51 (  1.10%)
      Max-90%     Time        4.67 (  0.00%)        4.60 (  1.50%)
      Max-93%     Time        4.71 (  0.00%)        4.65 (  1.27%)
      Max-95%     Time        4.74 (  0.00%)        4.71 (  0.63%)
      Max-99%     Time        4.88 (  0.00%)        4.79 (  1.84%)
      Max         Time        4.93 (  0.00%)        4.83 (  2.03%)
      Mean        Time        4.48 (  0.00%)        4.39 (  1.91%)
      Best99%Mean Time        4.47 (  0.00%)        4.39 (  1.91%)
      Best95%Mean Time        4.46 (  0.00%)        4.38 (  1.93%)
      Best90%Mean Time        4.45 (  0.00%)        4.36 (  1.98%)
      Best50%Mean Time        4.36 (  0.00%)        4.25 (  2.49%)
      Best10%Mean Time        4.23 (  0.00%)        4.10 (  3.13%)
      Best5%Mean  Time        4.19 (  0.00%)        4.06 (  3.20%)
      Best1%Mean  Time        4.13 (  0.00%)        4.00 (  3.39%)
      
      Small improvement and similar gains were seen on the UMA machine.
      
      The gain is small but it stands to reason that doing less work in the
      scheduler is a good thing. The downside is that the lack of schedstats and
      tracepoints may be surprising to experts doing performance analysis until
      they find the existence of the schedstats= parameter or schedstats sysctl.
      It will be automatically activated for latencytop and sleep profiling to
      alleviate the problem. For tracepoints, there is a simple warning as it's
      not safe to activate schedstats in the context when it's known the tracepoint
      may be wanted but is unavailable.
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reviewed-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <mgalbraith@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1454663316-22048-1-git-send-email-mgorman@techsingularity.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      cb251765
  6. 23 Sep, 2015 1 commit
  7. 19 Jun, 2015 1 commit
    • Thomas Gleixner's avatar
      timer: Reduce timer migration overhead if disabled · bc7a34b8
      Thomas Gleixner authored
      Eric reported that the timer_migration sysctl is not really nice
      performance wise as it needs to check at every timer insertion whether
      the feature is enabled or not. Further the check does not live in the
      timer code, so we have an extra function call which checks an extra
      cache line to figure out that it is disabled.
      
      We can do better and store that information in the per cpu (hr)timer
      bases. I pondered to use a static key, but that's a nightmare to
      update from the nohz code and the timer base cache line is hot anyway
      when we select a timer base.
      
      The old logic enabled the timer migration unconditionally if
      CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
      line.
      
      With this modification, we start off with migration disabled. The user
      visible sysctl is still set to enabled. If the kernel switches to NOHZ
      migration is enabled, if the user did not disable it via the sysctl
      prior to the switch. If nohz=off is on the kernel command line,
      migration stays disabled no matter what.
      
      Before:
        47.76%  hog       [.] main
        14.84%  [kernel]  [k] _raw_spin_lock_irqsave
         9.55%  [kernel]  [k] _raw_spin_unlock_irqrestore
         6.71%  [kernel]  [k] mod_timer
         6.24%  [kernel]  [k] lock_timer_base.isra.38
         3.76%  [kernel]  [k] detach_if_pending
         3.71%  [kernel]  [k] del_timer
         2.50%  [kernel]  [k] internal_add_timer
         1.51%  [kernel]  [k] get_nohz_timer_target
         1.28%  [kernel]  [k] __internal_add_timer
         0.78%  [kernel]  [k] timerfn
         0.48%  [kernel]  [k] wake_up_nohz_cpu
      
      After:
        48.10%  hog       [.] main
        15.25%  [kernel]  [k] _raw_spin_lock_irqsave
         9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
         6.50%  [kernel]  [k] mod_timer
         6.44%  [kernel]  [k] lock_timer_base.isra.38
         3.87%  [kernel]  [k] detach_if_pending
         3.80%  [kernel]  [k] del_timer
         2.67%  [kernel]  [k] internal_add_timer
         1.33%  [kernel]  [k] __internal_add_timer
         0.73%  [kernel]  [k] timerfn
         0.54%  [kernel]  [k] wake_up_nohz_cpu
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      bc7a34b8
  8. 08 May, 2015 1 commit
  9. 04 Jun, 2014 1 commit
  10. 22 May, 2014 1 commit
  11. 22 Feb, 2014 3 commits
  12. 09 Feb, 2014 3 commits
  13. 25 Jan, 2014 1 commit
  14. 24 Jan, 2014 1 commit
  15. 13 Jan, 2014 4 commits
    • Peter Zijlstra's avatar
      sched/deadline: Remove the sysctl_sched_dl knobs · 1724813d
      Peter Zijlstra authored
      Remove the deadline specific sysctls for now. The problem with them is
      that the interaction with the exisiting rt knobs is nearly impossible
      to get right.
      
      The current (as per before this patch) situation is that the rt and dl
      bandwidth is completely separate and we enforce rt+dl < 100%. This is
      undesirable because this means that the rt default of 95% leaves us
      hardly any room, even though dl tasks are saver than rt tasks.
      
      Another proposed solution was (a discarted patch) to have the dl
      bandwidth be a fraction of the rt bandwidth. This is highly
      confusing imo.
      
      Furthermore neither proposal is consistent with the situation we
      actually want; which is rt tasks ran from a dl server. In which case
      the rt bandwidth is a direct subset of dl.
      
      So whichever way we go, the introduction of dl controls at this point
      is painful. Therefore remove them and instead share the rt budget.
      
      This means that for now the rt knobs are used for dl admission control
      and the dl runtime is accounted against the rt runtime. I realise that
      this isn't entirely desirable either; but whatever we do we appear to
      need to change the interface later, so better have a small interface
      for now.
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-zpyqbqds1r0vyxtxza1e7rdc@git.kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      1724813d
    • Dario Faggioli's avatar
      sched/deadline: Add bandwidth management for SCHED_DEADLINE tasks · 332ac17e
      Dario Faggioli authored
      In order of deadline scheduling to be effective and useful, it is
      important that some method of having the allocation of the available
      CPU bandwidth to tasks and task groups under control.
      This is usually called "admission control" and if it is not performed
      at all, no guarantee can be given on the actual scheduling of the
      -deadline tasks.
      
      Since when RT-throttling has been introduced each task group have a
      bandwidth associated to itself, calculated as a certain amount of
      runtime over a period. Moreover, to make it possible to manipulate
      such bandwidth, readable/writable controls have been added to both
      procfs (for system wide settings) and cgroupfs (for per-group
      settings).
      
      Therefore, the same interface is being used for controlling the
      bandwidth distrubution to -deadline tasks and task groups, i.e.,
      new controls but with similar names, equivalent meaning and with
      the same usage paradigm are added.
      
      However, more discussion is needed in order to figure out how
      we want to manage SCHED_DEADLINE bandwidth at the task group level.
      Therefore, this patch adds a less sophisticated, but actually
      very sensible, mechanism to ensure that a certain utilization
      cap is not overcome per each root_domain (the single rq for !SMP
      configurations).
      
      Another main difference between deadline bandwidth management and
      RT-throttling is that -deadline tasks have bandwidth on their own
      (while -rt ones doesn't!), and thus we don't need an higher level
      throttling mechanism to enforce the desired bandwidth.
      
      This patch, therefore:
      
       - adds system wide deadline bandwidth management by means of:
          * /proc/sys/kernel/sched_dl_runtime_us,
          * /proc/sys/kernel/sched_dl_period_us,
         that determine (i.e., runtime / period) the total bandwidth
         available on each CPU of each root_domain for -deadline tasks;
      
       - couples the RT and deadline bandwidth management, i.e., enforces
         that the sum of how much bandwidth is being devoted to -rt
         -deadline tasks to stay below 100%.
      
      This means that, for a root_domain comprising M CPUs, -deadline tasks
      can be created until the sum of their bandwidths stay below:
      
          M * (sched_dl_runtime_us / sched_dl_period_us)
      
      It is also possible to disable this bandwidth management logic, and
      be thus free of oversubscribing the system up to any arbitrary level.
      Signed-off-by: default avatarDario Faggioli <raistlin@linux.it>
      Signed-off-by: default avatarJuri Lelli <juri.lelli@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1383831828-15501-12-git-send-email-juri.lelli@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      332ac17e
    • Dario Faggioli's avatar
      sched/deadline: Add SCHED_DEADLINE inheritance logic · 2d3d891d
      Dario Faggioli authored
      Some method to deal with rt-mutexes and make sched_dl interact with
      the current PI-coded is needed, raising all but trivial issues, that
      needs (according to us) to be solved with some restructuring of
      the pi-code (i.e., going toward a proxy execution-ish implementation).
      
      This is under development, in the meanwhile, as a temporary solution,
      what this commits does is:
      
       - ensure a pi-lock owner with waiters is never throttled down. Instead,
         when it runs out of runtime, it immediately gets replenished and it's
         deadline is postponed;
      
       - the scheduling parameters (relative deadline and default runtime)
         used for that replenishments --during the whole period it holds the
         pi-lock-- are the ones of the waiting task with earliest deadline.
      
      Acting this way, we provide some kind of boosting to the lock-owner,
      still by using the existing (actually, slightly modified by the previous
      commit) pi-architecture.
      
      We would stress the fact that this is only a surely needed, all but
      clean solution to the problem. In the end it's only a way to re-start
      discussion within the community. So, as always, comments, ideas, rants,
      etc.. are welcome! :-)
      Signed-off-by: default avatarDario Faggioli <raistlin@linux.it>
      Signed-off-by: default avatarJuri Lelli <juri.lelli@gmail.com>
      [ Added !RT_MUTEXES build fix. ]
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1383831828-15501-11-git-send-email-juri.lelli@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2d3d891d
    • Dario Faggioli's avatar
      sched/deadline: Add SCHED_DEADLINE structures & implementation · aab03e05
      Dario Faggioli authored
      Introduces the data structures, constants and symbols needed for
      SCHED_DEADLINE implementation.
      
      Core data structure of SCHED_DEADLINE are defined, along with their
      initializers. Hooks for checking if a task belong to the new policy
      are also added where they are needed.
      
      Adds a scheduling class, in sched/dl.c and a new policy called
      SCHED_DEADLINE. It is an implementation of the Earliest Deadline
      First (EDF) scheduling algorithm, augmented with a mechanism (called
      Constant Bandwidth Server, CBS) that makes it possible to isolate
      the behaviour of tasks between each other.
      
      The typical -deadline task will be made up of a computation phase
      (instance) which is activated on a periodic or sporadic fashion. The
      expected (maximum) duration of such computation is called the task's
      runtime; the time interval by which each instance need to be completed
      is called the task's relative deadline. The task's absolute deadline
      is dynamically calculated as the time instant a task (better, an
      instance) activates plus the relative deadline.
      
      The EDF algorithms selects the task with the smallest absolute
      deadline as the one to be executed first, while the CBS ensures each
      task to run for at most its runtime every (relative) deadline
      length time interval, avoiding any interference between different
      tasks (bandwidth isolation).
      Thanks to this feature, also tasks that do not strictly comply with
      the computational model sketched above can effectively use the new
      policy.
      
      To summarize, this patch:
       - introduces the data structures, constants and symbols needed;
       - implements the core logic of the scheduling algorithm in the new
         scheduling class file;
       - provides all the glue code between the new scheduling class and
         the core scheduler and refines the interactions between sched/dl
         and the other existing scheduling classes.
      Signed-off-by: default avatarDario Faggioli <raistlin@linux.it>
      Signed-off-by: default avatarMichael Trimarchi <michael@amarulasolutions.com>
      Signed-off-by: default avatarFabio Checconi <fchecconi@gmail.com>
      Signed-off-by: default avatarJuri Lelli <juri.lelli@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1383831828-15501-4-git-send-email-juri.lelli@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      aab03e05
  16. 17 Dec, 2013 1 commit
  17. 09 Oct, 2013 1 commit
  18. 23 Sep, 2013 1 commit
  19. 22 Feb, 2013 1 commit