    sched/topology: Improve load balancing on AMD EPYC systems · a55c7454
    Matt Fleming authored
    SD_BALANCE_{FORK,EXEC} and SD_WAKE_AFFINE are stripped in sd_init()
    for any sched domains with a NUMA distance greater than 2 hops
    (RECLAIM_DISTANCE). The idea is that it's expensive to balance
    across domains that far apart.
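
The check described above can be modelled in user space as follows. This is a minimal sketch, not the kernel's code: the flag values are illustrative placeholders, and `strip_far_domain_flags()` is a hypothetical helper that takes the threshold as a parameter.

```c
/* Illustrative flag bits only; the kernel defines its own values. */
#define SD_BALANCE_EXEC 0x1u
#define SD_BALANCE_FORK 0x2u
#define SD_WAKE_AFFINE  0x4u

/*
 * Hypothetical helper modelling the sd_init() behaviour described in the
 * commit message: if the domain's NUMA distance is greater than the reclaim
 * distance, clear the fork/exec balancing and wake-affine flags and leave
 * every other flag bit untouched.
 */
unsigned int strip_far_domain_flags(unsigned int flags,
                                    int numa_distance,
                                    int reclaim_distance)
{
    if (numa_distance > reclaim_distance)
        flags &= ~(SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE);
    return flags;
}
```

The comparison is strict, so a domain whose distance equals the threshold keeps all three flags.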
    
    However, as is rather unfortunately explained in:
    
      commit 32e45ff4 ("mm: increase RECLAIM_DISTANCE to 30")
    
    the value for RECLAIM_DISTANCE is based on node distance tables from
    2011-era hardware.
    
    Current AMD EPYC machines have the following NUMA node distances:
    
     node distances:
     node   0   1   2   3   4   5   6   7
       0:  10  16  16  16  32  32  32  32
       1:  16  10  16  16  32  32  32  32
       2:  16  16  10  16  32  32  32  32
       3:  16  16  16  10  32  32  32  32
       4:  32  32  32  32  10  16  16  16
       5:  32  32  32  32  16  10  16  16
       6:  32  32  32  32  16  16  10  16
       7:  32  32  32  32  16  16  16  10
    
    where 2 hops is 32.
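
To make the effect of the table concrete, the sketch below counts how many node pairs exceed a given threshold. The table is copied from this message; `NR_NODES` and `count_pairs_beyond()` are hypothetical names used only for illustration.

```c
#define NR_NODES 8

/* Distance table copied from the commit message. */
static const int node_distance[NR_NODES][NR_NODES] = {
    {10, 16, 16, 16, 32, 32, 32, 32},
    {16, 10, 16, 16, 32, 32, 32, 32},
    {16, 16, 10, 16, 32, 32, 32, 32},
    {16, 16, 16, 10, 32, 32, 32, 32},
    {32, 32, 32, 32, 10, 16, 16, 16},
    {32, 32, 32, 32, 16, 10, 16, 16},
    {32, 32, 32, 32, 16, 16, 10, 16},
    {32, 32, 32, 32, 16, 16, 16, 10},
};

/* Count ordered (from, to) node pairs whose distance exceeds threshold. */
int count_pairs_beyond(int threshold)
{
    int count = 0;

    for (int from = 0; from < NR_NODES; from++)
        for (int to = 0; to < NR_NODES; to++)
            if (node_distance[from][to] > threshold)
                count++;
    return count;
}
```

With a threshold of 30, the value named in the cited commit title, `count_pairs_beyond(30)` returns 32: every cross-socket pair (4 x 4 in each direction) is treated as too far, while the same-socket pairs at distance 16 are not.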
    
    The result is that the scheduler fails to load balance properly across
    NUMA nodes on different sockets -- 2 hops apart.
    
    For example, pinning 16 busy threads to NUMA nodes 0 (CPUs 0-7) and 4
    (CPUs 32-39) like so,
    
      $ numactl -C 0-7,32-39 ./spinner 16
    
    causes all threads to fork and remain on node 0 until the active
    balancer kicks in after a few seconds and forcibly moves some threads
    to node 4.
    
    Override node_reclaim_distance for AMD Zen.
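
A rough user-space model of such an override is sketched below. It rests on stated assumptions: the default of 30 is taken from the cited commit title, the override value of 32 is assumed because it equals the 2-hop distance in the table (this message does not state the value), and `apply_zen_override()` and `balance_flags_kept()` are hypothetical names.

```c
#include <stdbool.h>

/* Assumed default, matching the RECLAIM_DISTANCE value of 30 cited above. */
int node_reclaim_distance = 30;

/* Hypothetical stand-in for a Zen-specific CPU init hook. */
void apply_zen_override(void)
{
    /* Assumption: 32 is chosen to match the 2-hop distance in the table. */
    node_reclaim_distance = 32;
}

/* True when a domain at this distance would keep its balancing flags. */
bool balance_flags_kept(int numa_distance)
{
    return numa_distance <= node_reclaim_distance;
}
```

Under these assumptions, a cross-socket domain at distance 32 loses its flags before the override and keeps them afterwards, because 32 is no longer greater than the threshold.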
    
    Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Suravee.Suthikulpanit@amd.com
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Thomas.Lendacky@amd.com
    Cc: Tony Luck <tony.luck@intel.com>
    Link: https://lkml.kernel.org/r/20190808195301.13222-3-matt@codeblueprint.co.uk
    
    
    Signed-off-by: Ingo Molnar <mingo@kernel.org>