    [PATCH] zoned vm counters: create vmstat.c/.h from page_alloc.c/.h · f6ac2354
    Christoph Lameter authored
    NOTE: ZVC are *not* the lightweight event counters.  ZVCs are reliable whereas
    event counters do not need to be.
    Zone based VM statistics are necessary to be able to determine what the state
    of memory in one zone is.  In a NUMA system this can be helpful for local
    reclaim and other memory optimizations that may be able to shift VM load in
    order to get more balanced memory use.
    It is also useful to know how the computing load affects the memory
    allocations on various zones.  This patchset allows the retrieval of that data
    from userspace.
    The patchset introduces a framework for counters that is a cross between the
    existing page_stats --which are simply global counters split per cpu-- and the
    approach of deferred incremental updates implemented for nr_pagecache.
    Small per cpu 8 bit counters are added to struct zone.  If the counter exceeds
    certain thresholds then the counters are accumulated in an array of
    atomic_long in the zone and in a global array that sums up all zone values.
    The small 8 bit counters are next to the per cpu page pointers and so they
    will be hot in the cpu cache when pages are allocated and freed.
    Access to VM counter information for a zone and for the whole machine is then
    possible by simply indexing an array (Thanks to Nick Piggin for pointing out
    that approach).  Reading the total number of pages of the various types no
    longer requires summing up all per cpu counters.
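
    The counter scheme described above can be sketched in userspace C as
    follows.  This is only an illustration under stated assumptions, not the
    kernel code: every name ending in _sketch, the CPU count and the threshold
    of 32 are hypothetical, and C11 atomics stand in for the kernel's
    atomic_long_t.  Deltas are assumed to be small (typically +1 or -1).

```c
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS_SKETCH        4   /* hypothetical number of CPUs */
#define STAT_THRESHOLD_SKETCH 32  /* hypothetical fold threshold */

/* Hypothetical counter items; the real set is defined by the patchset. */
enum zone_stat_item_sketch {
    NR_MAPPED_SKETCH,
    NR_PAGECACHE_SKETCH,
    NR_ITEMS_SKETCH
};

struct zone_sketch {
    /* Small per cpu 8 bit differentials, one row per CPU. */
    int8_t vm_stat_diff[NR_CPUS_SKETCH][NR_ITEMS_SKETCH];
    /* Per zone totals, updated only when a differential overflows. */
    atomic_long vm_stat[NR_ITEMS_SKETCH];
};

/* Global array that sums up the values of all zones. */
static atomic_long global_vm_stat_sketch[NR_ITEMS_SKETCH];

/*
 * Update path: adjust the local differential and fold it into the zone
 * and global totals once it leaves [-threshold, +threshold].
 * The caller is assumed to be pinned to `cpu` while this runs.
 */
static void mod_zone_state_sketch(struct zone_sketch *z, int cpu,
                                  int item, int delta)
{
    int x = z->vm_stat_diff[cpu][item] + delta;

    if (x > STAT_THRESHOLD_SKETCH || x < -STAT_THRESHOLD_SKETCH) {
        atomic_fetch_add(&z->vm_stat[item], x);
        atomic_fetch_add(&global_vm_stat_sketch[item], x);
        x = 0;
    }
    z->vm_stat_diff[cpu][item] = (int8_t)x;
}

/* Read path: a plain array index, no loop over all processors. */
static long zone_page_state_sketch(struct zone_sketch *z, int item)
{
    return atomic_load(&z->vm_stat[item]);
}

static long global_page_state_sketch(int item)
{
    return atomic_load(&global_vm_stat_sketch[item]);
}
```

    In this sketch a reader may lag the true value by at most the threshold
    per CPU and item, which is the price paid for keeping the shared atomic
    counters out of the allocation and free fast paths.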
    Benefits of this patchset right now:
    - Ability for UP and SMP configurations to determine how memory
      is balanced between the DMA, NORMAL and HIGHMEM zones.
    - loops over all processors are avoided in writeback and
      reclaim paths. We can avoid caching the writeback information
      because the needed information is directly accessible.
    - Special handling for nr_pagecache removed.
    - zone_reclaim_interval vanishes since VM stats can now determine
      when local reclaim is worthwhile.
    - Fast inline per node page state determination.
    - Accurate counters in /sys/devices/system/node/node*/meminfo. The current
      counters simply count which processor allocated a page somewhere
      and guesstimate based on that, so they were not useful for showing
      the actual distribution of page use in a specific zone.
    - The swap_prefetch patch requires per node statistics in order to
      figure out when processors of a node can prefetch. This patch provides
      some of the needed numbers.
    - Detailed VM counters available in more /proc and /sys status files.
    References to earlier discussions:
    V1 http://marc.theaimsgroup.com/?l=linux-kernel&m=113511649910826&w=2
    V2 http://marc.theaimsgroup.com/?l=linux-kernel&m=114980851924230&w=2
    V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115014697910351&w=2
    V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767318740&w=2
    Performance tests with AIM7 did not show any regressions.  Seems to be a tad
    faster even.  Tested on ia64/NUMA.  Builds fine on i386, SMP / UP.  Includes
    fixes for s390/arm/uml arch code.
    This patch:
    Move counter code from page_alloc.c/page-flags.h to vmstat.c/h.
    Create vmstat.c/vmstat.h by separating the counter code and the proc
    functions.
    Move the vm_stat_text array before zoneinfo_show.
    [akpm@osdl.org: s390 build fix]
    [akpm@osdl.org: HOTPLUG_CPU build fix]
    Signed-off-by: Christoph Lameter <clameter@sgi.com>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
    Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>