Commit 583ffd99 authored by Brendan Jackman's avatar Brendan Jackman Committed by Ingo Molnar

sched/fair: Force balancing on NOHZ balance if local group has capacity

The "goto force_balance" here is intended to mitigate the fact that
avg_load calculations can result in bad placement decisions when
priority is asymmetrical.

The original commit that adds it:

  fab47622 ("sched: Force balancing on newidle balance if local group has capacity")


    Under certain situations, such as a niced down task (i.e. nice =
    -15) in the presence of nr_cpus NICE0 tasks, the niced task lands
    on a sched group and kicks away other tasks because of its large
    weight. This leads to sub-optimal utilization of the
    machine. Even though the sched group has capacity, it does not
    pull tasks because sds.this_load >> sds.max_load, and f_b_g()
    returns NULL.

A similar but inverted issue also affects ARM big.LITTLE (asymmetrical CPU
capacity) systems - consider 8 always-running, same-priority tasks on a
system with 4 "big" and 4 "little" CPUs. Suppose that 5 of them end up on
the "big" CPUs (which will be represented by one sched_group in the DIE
sched_domain) and 3 on the "little" (the other sched_group in DIE), leaving
one CPU unused. Because the "big" group has a higher group_capacity its
avg_load may not present an imbalance that would cause migrating a
task to the idle "little".

The force_balance case here solves the problem but currently only for
CPU_NEWLY_IDLE balances, which in theory might never happen on the
unused CPU. Including CPU_IDLE in the force_balance case means
there's an upper bound on the time before we can attempt to solve the
underutilization: after DIE's sd->balance_interval has passed the
next nohz balance kick will help us out.
Signed-off-by: default avatarBrendan Jackman <>
Signed-off-by: default avatarPeter Zijlstra (Intel) <>
Cc: Linus Torvalds <>
Cc: Mike Galbraith <>
Cc: Morten Rasmussen <>
Cc: Paul Turner <>
Cc: Peter Zijlstra <>
Cc: Thomas Gleixner <>
Link: default avatarIngo Molnar <>
parent ea16f0ea
......@@ -8186,8 +8186,11 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
if (busiest->group_type == group_imbalanced)
goto force_balance;
/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
if (env->idle == CPU_NEWLY_IDLE && group_has_capacity(env, local) &&
* When dst_cpu is idle, prevent SMP nice and/or asymmetric group
* capacities from resulting in underutilization due to avg_load.
if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) &&
goto force_balance;
Markdown is supported
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment