    mm, memcg: make scan aggression always exclude protection · 1bc63fb1
    Chris Down authored
    This patch is an incremental improvement on the existing
    memory.{low,min} relative reclaim work to base its scan pressure
    calculations on how much protection is available compared to the current
    usage, rather than how much the current usage is over some protection
    threshold.
    
    This does not change the user experience much in the normal case.  One
    benefit is that it replaces the (somewhat arbitrary) 100% cutoff with an
    indefinite slope, which makes it easier to ballpark a memory.low value.
    
    As well as this, the old methodology doesn't quite apply generically to
    machines with varying amounts of physical memory.  Let's say we have a
    top level cgroup, workload.slice, and another top level cgroup,
    system-management.slice.  We want to roughly give 12GB to
    system-management.slice, so on a 32GB machine we set memory.low to 20GB
    in workload.slice, and on a 64GB machine we set memory.low to 52GB.
    However, because these amounts are relative to the total machine size,
    while the amount of memory we want to generally be willing to yield to
    system-management.slice is absolute (12GB), we end up putting more
    pressure on system-management.slice just because we have a larger
    machine and a larger workload to fill it, which seems fairly
    unintuitive.  With this new behaviour, we don't end up with this
    unintended side effect.
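
The machine-size example above can be sketched numerically. This is a simplified model, not the kernel code: the two formulas are assumptions derived from the description in this message (old: scan fraction equals the overage relative to the protection, capped at 100%; new: protected memory is excluded from the scan target). The function names and the 10GB overage are hypothetical.

```python
def old_scan_target_gb(usage_gb, low_gb):
    """Old model (assumed): scan fraction is the overage relative to the
    protection, capped at 100% of usage."""
    if low_gb <= 0:
        return usage_gb
    fraction = min(max(usage_gb / low_gb - 1.0, 0.0), 1.0)
    return usage_gb * fraction


def new_scan_target_gb(usage_gb, low_gb):
    """New model (assumed): memory under the protection threshold is
    excluded, so only the unprotected part is eligible for scanning."""
    if usage_gb <= 0:
        return 0.0
    protected_gb = min(low_gb, usage_gb)
    return usage_gb * (1.0 - protected_gb / usage_gb)


# (machine size, memory.low for workload.slice), taken from the example above.
machines = [(32, 20), (64, 52)]
overage_gb = 10  # hypothetical: workload.slice sits 10GB above memory.low

for machine_gb, low_gb in machines:
    usage_gb = low_gb + overage_gb
    old = old_scan_target_gb(usage_gb, low_gb)
    new = new_scan_target_gb(usage_gb, low_gb)
    print(f"{machine_gb}GB machine: old={old:.1f}GB new={new:.1f}GB")
```

Under these assumptions, the old model gives workload.slice a scan target of about 15GB on the 32GB machine and about 11.9GB on the 64GB machine for the same 10GB overage, while the new model gives 10GB on both, tracking the absolute overage rather than the machine size.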
    
    Previously, memory.low protection worked as follows: if you were 50%
    over a certain baseline, you got 50% of your normal scan pressure.
    This is certainly better than the earlier cliff-edge behaviour, but it
    can be improved even further by always considering memory under the
    currently enforced protection threshold to be out of bounds.  This means
    that we can set relatively low memory.low thresholds for variable or
    bursty workloads while still getting a reasonable level of protection,
    whereas with the previous version we may still trivially hit the 100%
    clamp.  The previous 100% clamp is also somewhat arbitrary, whereas this
    one is more concretely based on the currently enforced protection
    threshold, which is likely easier to reason about.
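
The difference between the 100% clamp and the new slope can be sketched as two pressure curves. This is an illustrative model under stated assumptions, not the kernel implementation: `old_pressure` and `new_pressure` are hypothetical names, and the formulas are inferred from the description above (50% over the baseline gives 50% pressure, capped at 100%, versus treating protected memory as out of bounds).

```python
def old_pressure(usage, protection):
    """Old model (assumed): fraction of normal scan pressure equals the
    overage relative to the protection, clamped to [0, 1]."""
    if protection <= 0:
        return 1.0
    return min(max(usage / protection - 1.0, 0.0), 1.0)


def new_pressure(usage, protection):
    """New model (assumed): the protected share of usage is excluded, so
    pressure approaches but never reaches 100% while protection > 0."""
    if usage <= 0:
        return 0.0
    protected = min(protection, usage)
    return 1.0 - protected / usage


# Hypothetical bursty workload: memory.low of 1GB, usage swinging up to 8GB.
protection_gb = 1.0
for usage_gb in (1.0, 1.5, 2.0, 3.0, 8.0):
    print(f"usage={usage_gb:.1f}GB "
          f"old={old_pressure(usage_gb, protection_gb):.0%} "
          f"new={new_pressure(usage_gb, protection_gb):.0%}")
```

In this model the old curve hits full pressure as soon as usage reaches twice the protection, so a low memory.low offers nothing during a burst. The new curve keeps the protected 1GB out of the calculation at every usage level.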
    
    There is also a subtle issue with the way that proportional reclaim
    worked previously -- it promotes having no memory.low, since it makes
    pressure higher during low reclaim.  This happens because we base our
    scan pressure modulation on how far memory.current is between memory.min
    and memory.low, but if memory.low is unset, we only use the overage
    method.  In most cromulent configurations, this then means that we end
    up with *more* pressure than with no memory.low at all when we're in low
    reclaim, which is neither useful nor expected.
    
    With this patch, memory.low and memory.min affect reclaim pressure in a
    more understandable and composable way.  For example, from a user
    standpoint, "protected" memory now remains untouchable as far as
    reclaim aggression is concerned, and users can also have more
    confidence that bursty workloads will still receive some amount of
    guaranteed protection.
    
    Link: http://lkml.kernel.org/r/20190322160307.GA3316@chrisdown.name
    
    
    Signed-off-by: Chris Down <chris@chrisdown.name>
    Reviewed-by: Roman Gushchin <guro@fb.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Acked-by: Michal Hocko <mhocko@kernel.org>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Dennis Zhou <dennis@kernel.org>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>