    mm, page_alloc: fix check for NULL preferred_zone · ea57485a
    Vlastimil Babka authored
    Patch series "fix premature OOM regression in 4.7+ due to cpuset races".
    
    This is v2 of my attempt to fix the recent report based on LTP cpuset
    stress test [1].  The intention is to go to stable 4.9 LTSS with this,
    as triggering repeated OOMs is not nice.  That's why the patches try
    not to be too intrusive.
    
    Unfortunately, while investigating I found that modifying the testcase
    to use per-VMA policies instead of per-task policies will bring the OOMs
    back, but that seems to be a much older and harder-to-fix problem.  I
    have posted an RFC [2] but I believe that fixing the recent regressions
    has a higher priority.
    
    Longer-term we might try to think how to fix the cpuset mess in a better
    and less error-prone way.  I was for example very surprised to learn
    that cpuset updates change not only task->mems_allowed, but also the
    nodemask of mempolicies.  Until now I expected the parameter to
    alloc_pages_nodemask() to be stable.  I wonder why we then treat
    cpusets specially in get_page_from_freelist() and distinguish HARDWALL
    etc, when there's unconditional intersection between mempolicy and
    cpuset.  I would expect the nodemask adjustment to be there for saving
    overhead in get_page_from_freelist(), but that clearly doesn't happen
    in the current form.  So we have both crazy complexity and overhead,
    AFAICS.
    
    [1] https://lkml.kernel.org/r/CAFpQJXUq-JuEP=QPidy4p_=FN0rkH5Z-kfB4qBvsf6jMS87Edg@mail.gmail.com
    [2] https://lkml.kernel.org/r/7c459f26-13a6-a817-e508-b65b903a8378@suse.cz
    
    This patch (of 4):
    
    Since commit c33d6c06 ("mm, page_alloc: avoid looking up the first
    zone in a zonelist twice") we have a wrong check for NULL preferred_zone,
    which can theoretically happen due to concurrent cpuset modification.  We
    check the zoneref pointer which is never NULL and we should check the zone
    pointer.  Also document this in first_zones_zonelist() comment per Michal
    Hocko.
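
    The difference between the two checks can be sketched in standalone C.
    This is only an illustration under stated assumptions: the structures
    are simplified stand-ins for the kernel's, and first_allowed_zoneref(),
    has_preferred_zone_buggy() and has_preferred_zone_fixed() are
    hypothetical names, not kernel functions.  The lookup returns a cursor
    into the zonelist array, so the cursor itself is never NULL; an empty
    result is signalled by the terminating entry, whose zone pointer is
    NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures, not the real definitions. */
struct zone {
    int node;                 /* NUMA node this zone belongs to */
};

struct zoneref {
    struct zone *zone;        /* NULL marks the end of the zonelist */
    int zone_idx;
};

/*
 * Hypothetical model of the zonelist lookup: return a cursor to the first
 * entry whose node bit is set in allowed_nodes.  If nothing matches, the
 * cursor stops on the terminator entry, so the return value is never NULL.
 */
static struct zoneref *first_allowed_zoneref(struct zoneref *zrefs,
                                             unsigned long allowed_nodes)
{
    struct zoneref *z = zrefs;

    while (z->zone && !(allowed_nodes & (1UL << z->zone->node)))
        z++;
    return z;
}

/* Shape of the wrong check: tests the cursor, which is always non-NULL. */
static bool has_preferred_zone_buggy(const struct zoneref *preferred)
{
    return preferred != NULL;
}

/* Shape of the corrected check: tests the zone the cursor refers to. */
static bool has_preferred_zone_fixed(const struct zoneref *preferred)
{
    return preferred->zone != NULL;
}
```

    With a nodemask that matches no entry, as could happen if a concurrent
    cpuset update empties the intersection, the first predicate still
    reports a preferred zone while the second correctly reports none.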
    
    Fixes: c33d6c06 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
    Link: http://lkml.kernel.org/r/20170120103843.24587-2-vbabka@suse.cz
    
    
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
    Cc: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>