    memcg: rework mem_cgroup_iter to use cgroup iterators · 542f85f9
    Michal Hocko authored
    mem_cgroup_iter currently relies on css->id when walking down a group
    hierarchy tree.  This is really awkward because the tree walk depends on
    the group creation ordering.  The only guarantee is that a parent node is
    visited before its children.
    
    Example:
    
     1) mkdir -p a a/d a/b/c
     2) mkdir -p a/b/c a/d
    
    Will create the same trees but the tree walks will be different:
    
     1) a, d, b, c
     2) a, b, c, d
    
    Commit 574bd9f7 ("cgroup: implement generic child / descendant walk
    macros") has introduced generic cgroup tree walkers which provide either
    pre-order or post-order tree walk.  This patch converts css->id based
    iteration to pre-order tree walk to keep the semantics of the original
    iterator, where a parent is always visited before its subtree.
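    
    The pre-order guarantee above can be illustrated with a small user-space
    sketch.  It is not the kernel walker: struct node, next_pre, add_child
    and walk are hypothetical names made up for this illustration, the root
    is visited as part of the walk for simplicity, and no locking or RCU is
    modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for a group in the hierarchy. */
struct node {
    const char *name;
    struct node *parent;
    struct node *first_child;
    struct node *next_sibling;
};

/*
 * Return the node that follows pos in a pre-order walk of root's subtree,
 * or NULL once the walk is complete.  Pass pos == NULL to start the walk.
 */
static struct node *next_pre(struct node *pos, struct node *root)
{
    if (!pos)
        return root;
    /* Descend first: a parent is always visited before its subtree. */
    if (pos->first_child)
        return pos->first_child;
    /* No children: climb until some ancestor has an unvisited sibling. */
    while (pos != root) {
        if (pos->next_sibling)
            return pos->next_sibling;
        pos = pos->parent;
    }
    return NULL;
}

/* Append child at the tail of parent's child list. */
static void add_child(struct node *parent, struct node *child)
{
    struct node **link = &parent->first_child;

    assert(child->parent == NULL);
    child->parent = parent;
    while (*link)
        link = &(*link)->next_sibling;
    *link = child;
}

/* Record the visit order as space-separated names in buf. */
static void walk(struct node *root, char *buf, size_t len)
{
    struct node *n;

    buf[0] = '\0';
    for (n = next_pre(NULL, root); n; n = next_pre(n, root)) {
        if (buf[0])
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, n->name, len - strlen(buf) - 1);
    }
}
```

    In this sketch a subtree is always visited contiguously and after its
    parent; only the order among siblings depends on how the child list was
    built.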
    
    cgroup_for_each_descendant_pre suggests using post_create and
    pre_destroy for proper synchronization with group addition and
    removal, respectively.  This implementation doesn't use those because a
    new memory cgroup is initialized sufficiently for iteration in
    mem_cgroup_css_alloc already, and css reference counting either ensures
    that the group is alive (for both the last seen cgroup and the found
    one) or signals that the group is dead and should be skipped.
    
    If the reclaim cookie is used we need to store the last visited group
    into the iterator so we have to be careful that it doesn't disappear in
    the meantime.  An elevated reference count on the css keeps it alive even
    though the group has been removed (parked waiting for the last dput so
    that it can be freed).
    
    Per node-zone-prio iter_lock has been introduced to ensure that
    css_tryget and the iter->last_visited update happen atomically.  Otherwise
    two racing walkers could both take a reference and only one release it,
    leading to a css leak (which pins the cgroup dentry).
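    
    The locking rule above can be sketched in user space as follows.  This
    is only an illustration under stated assumptions, not the code from this
    patch: toy_css, toy_iter and the toy_* helpers are hypothetical
    stand-ins, a C11 atomic_flag plays the role of the spinlock, and a
    reference count of zero is assumed to mean "dead".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a css; refcnt == 0 means the group is dead. */
struct toy_css {
    atomic_int refcnt;
};

/* Hypothetical stand-in for the per node-zone-prio reclaim iterator. */
struct toy_iter {
    atomic_flag iter_lock;
    struct toy_css *last_visited;
};

/* Take a reference only if the object is still alive. */
static bool toy_tryget(struct toy_css *css)
{
    int old = atomic_load(&css->refcnt);

    while (old > 0) {
        /* On failure, old is reloaded and the liveness check repeats. */
        if (atomic_compare_exchange_weak(&css->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

static void toy_put(struct toy_css *css)
{
    atomic_fetch_sub(&css->refcnt, 1);
}

static void toy_lock(atomic_flag *lock)
{
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ; /* spin */
}

static void toy_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * Drop the reference held on the previous position and pin the new one
 * as a single critical section.  Without the lock, two walkers could
 * both pin `next` while only one of those references is later released
 * through last_visited.
 */
static bool toy_iter_update(struct toy_iter *iter, struct toy_css *next)
{
    bool pinned = false;

    toy_lock(&iter->iter_lock);
    if (iter->last_visited)
        toy_put(iter->last_visited);
    iter->last_visited = NULL;
    if (next && toy_tryget(next)) {
        iter->last_visited = next;
        pinned = true;
    }
    toy_unlock(&iter->iter_lock);
    return pinned;
}
```

    A caller that gets false back knows the candidate group is dead and
    should move on to the next one in the walk.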
    
    Signed-off-by: Michal Hocko <mhocko@suse.cz>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Li Zefan <lizefan@huawei.com>
    Cc: Ying Han <yinghan@google.com>
    Cc: Tejun Heo <htejun@gmail.com>
    Cc: Glauber Costa <glommer@parallels.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>