1. 09 May, 2007 6 commits
    • Christoph Lameter's avatar
      Move remote node draining out of slab allocators · 4037d452
      Christoph Lameter authored
      
      
      Currently the slab allocators contain callbacks into the page allocator to
      perform the draining of pagesets on remote nodes.  This requires SLUB to have
      a whole subsystem in order to be compatible with SLAB.  Moving node draining
      out of the slab allocators avoids a section of code in SLUB.
      
      Move the node draining so that is is done when the vm statistics are updated.
      At that point we are already touching all the cachelines with the pagesets of
      a processor.
      
      Add a expire counter there.  If we have to update per zone or global vm
      statistics then assume that the pageset will require subsequent draining.
      
      The expire counter will be decremented on each vm stats update pass until it
      reaches zero.  Then we will drain one batch from the pageset.  The draining
      will cause vm counter updates which will then cause another expiration until
      the pcp is empty.  So we will drain a batch every 3 seconds.
      
      Note that remote node draining is a somewhat esoteric feature that is required
      on large NUMA systems because otherwise significant portions of system memory
      can become trapped in pcp queues.  The number of pcp is determined by the
      number of processors and nodes in a system.  A system with 4 processors and 2
      nodes has 8 pcps which is okay.  But a system with 1024 processors and 512
      nodes has 512k pcps with a high potential for large amount of memory being
      caught in them.
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4037d452
    • Christoph Lameter's avatar
      vmstat: use our own timer events · d1187ed2
      Christoph Lameter authored
      
      
      vmstat is currently using the cache reaper to periodically bring the
      statistics up to date.  The cache reaper does only exists in SLUB as a way to
      provide compatibility with SLAB.  This patch removes the vmstat calls from the
      slab allocators and provides its own handling.
      
      The advantage is also that we can use a different frequency for the updates.
      Refreshing vm stats is a pretty fast job so we can run this every second and
      stagger this by only one tick.  This will lead to some overlap in large
      systems.  F.e a system running at 250 HZ with 1024 processors will have 4 vm
      updates occurring at once.
      
      However, the vm stats update only accesses per node information.  It is only
      necessary to stagger the vm statistics updates per processor in each node.  Vm
      counter updates occurring on distant nodes will not cause cacheline
      contention.
      
      We could implement an alternate approach that runs the first processor on each
      node at the second and then each of the other processor on a node on a
      subsequent tick.  That may be useful to keep a large amount of the second free
      of timer activity.  Maybe the timer folks will have some feedback on this one?
      
      [jirislaby@gmail.com: add missing break]
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Signed-off-by: default avatarJiri Slaby <jirislaby@gmail.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d1187ed2
    • Rafael J. Wysocki's avatar
      Add suspend-related notifications for CPU hotplug · 8bb78442
      Rafael J. Wysocki authored
      
      
      Since nonboot CPUs are now disabled after tasks and devices have been
      frozen and the CPU hotplug infrastructure is used for this purpose, we need
      special CPU hotplug notifications that will help the CPU-hotplug-aware
      subsystems distinguish normal CPU hotplug events from CPU hotplug events
      related to a system-wide suspend or resume operation in progress.  This
      patch introduces such notifications and causes them to be used during
      suspend and resume transitions.  It also changes all of the
      CPU-hotplug-aware subsystems to take these notifications into consideration
      (for now they are handled in the same way as the corresponding "normal"
      ones).
      
      [oleg@tv-sign.ru: cleanups]
      Signed-off-by: default avatarRafael J. Wysocki <rjw@sisk.pl>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: default avatarOleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8bb78442
    • Christoph Lameter's avatar
      slab: shut down cache_reaper when cpu goes down · 5830c590
      Christoph Lameter authored
      
      
      Shutdown the cache_reaper if the cpu is brought down and set the
      cache_reap.func to NULL.  Otherwise hotplug shuts down the reaper for good.
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5830c590
    • Heiko Carstens's avatar
      slab: use CPU_LOCK_[ACQUIRE|RELEASE] · 38c3bd96
      Heiko Carstens authored
      
      
      Looks like this was forgotten when CPU_LOCK_[ACQUIRE|RELEASE] was
      introduced.
      
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Gautham Shenoy <ego@in.ibm.com>
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      38c3bd96
    • Pekka J Enberg's avatar
      krealloc: fix kerneldoc comments · 7ae439ce
      Pekka J Enberg authored
      
      
      No "blank" (or "*") line is allowed between the function name and lines for
      it parameter(s).
      
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: default avatarPekka Enberg <penberg@cs.helsinki.fi>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7ae439ce
  2. 08 May, 2007 2 commits
  3. 07 May, 2007 14 commits
  4. 02 May, 2007 1 commit
  5. 04 Apr, 2007 1 commit
  6. 01 Mar, 2007 1 commit
  7. 21 Feb, 2007 1 commit
  8. 11 Feb, 2007 6 commits
  9. 06 Jan, 2007 1 commit
  10. 22 Dec, 2006 2 commits
  11. 13 Dec, 2006 4 commits
    • Eric Dumazet's avatar
      [PATCH] SLAB: use a multiply instead of a divide in obj_to_index() · 6a2d7a95
      Eric Dumazet authored
      
      
      When some objects are allocated by one CPU but freed by another CPU we can
      consume lot of cycles doing divides in obj_to_index().
      
      (Typical load on a dual processor machine where network interrupts are
      handled by one particular CPU (allocating skbufs), and the other CPU is
      running the application (consuming and freeing skbufs))
      
      Here on one production server (dual-core AMD Opteron 285), I noticed this
      divide took 1.20 % of CPU_CLK_UNHALTED events in kernel.  But Opteron are
      quite modern cpus and the divide is much more expensive on oldest
      architectures :
      
      On a 200 MHz sparcv9 machine, the division takes 64 cycles instead of 1
      cycle for a multiply.
      
      Doing some math, we can use a reciprocal multiplication instead of a divide.
      
      If we want to compute V = (A / B)  (A and B being u32 quantities)
      we can instead use :
      
      V = ((u64)A * RECIPROCAL(B)) >> 32 ;
      
      where RECIPROCAL(B) is precalculated to ((1LL << 32) + (B - 1)) / B
      
      Note :
      
      I wrote pure C code for clarity. gcc output for i386 is not optimal but
      acceptable :
      
      mull   0x14(%ebx)
      mov    %edx,%eax // part of the >> 32
      xor     %edx,%edx // useless
      mov    %eax,(%esp) // could be avoided
      mov    %edx,0x4(%esp) // useless
      mov    (%esp),%ebx
      
      [akpm@osdl.org: small cleanups]
      Signed-off-by: default avatarEric Dumazet <dada1@cosmosbay.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      6a2d7a95
    • Paul Jackson's avatar
      [PATCH] cpuset: rework cpuset_zone_allowed api · 02a0e53d
      Paul Jackson authored
      
      
      Elaborate the API for calling cpuset_zone_allowed(), so that users have to
      explicitly choose between the two variants:
      
        cpuset_zone_allowed_hardwall()
        cpuset_zone_allowed_softwall()
      
      Until now, whether or not you got the hardwall flavor depended solely on
      whether or not you or'd in the __GFP_HARDWALL gfp flag to the gfp_mask
      argument.
      
      If you didn't specify __GFP_HARDWALL, you implicitly got the softwall
      version.
      
      Unfortunately, this meant that users would end up with the softwall version
      without thinking about it.  Since only the softwall version might sleep,
      this led to bugs with possible sleeping in interrupt context on more than
      one occassion.
      
      The hardwall version requires that the current tasks mems_allowed allows
      the node of the specified zone (or that you're in interrupt or that
      __GFP_THISNODE is set or that you're on a one cpuset system.)
      
      The softwall version, depending on the gfp_mask, might allow a node if it
      was allowed in the nearest enclusing cpuset marked mem_exclusive (which
      requires taking the cpuset lock 'callback_mutex' to evaluate.)
      
      This patch removes the cpuset_zone_allowed() call, and forces the caller to
      explicitly choose between the hardwall and the softwall case.
      
      If the caller wants the gfp_mask to determine this choice, they should (1)
      be sure they can sleep or that __GFP_HARDWALL is set, and (2) invoke the
      cpuset_zone_allowed_softwall() routine.
      
      This adds another 100 or 200 bytes to the kernel text space, due to the few
      lines of nearly duplicate code at the top of both cpuset_zone_allowed_*
      routines.  It should save a few instructions executed for the calls that
      turned into calls of cpuset_zone_allowed_hardwall, thanks to not having to
      set (before the call) then check (within the call) the __GFP_HARDWALL flag.
      
      For the most critical call, from get_page_from_freelist(), the same
      instructions are executed as before -- the old cpuset_zone_allowed()
      routine it used to call is the same code as the
      cpuset_zone_allowed_softwall() routine that it calls now.
      
      Not a perfect win, but seems worth it, to reduce this chance of hitting a
      sleeping with irq off complaint again.
      Signed-off-by: default avatarPaul Jackson <pj@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      02a0e53d
    • Christoph Lameter's avatar
      [PATCH] More slab.h cleanups · 55935a34
      Christoph Lameter authored
      
      
      More cleanups for slab.h
      
      1. Remove tabs from weird locations as suggested by Pekka
      
      2. Drop the check for NUMA and SLAB_DEBUG from the fallback section
         as suggested by Pekka.
      
      3. Uses static inline for the fallback defs as also suggested by Pekka.
      
      4. Make kmem_ptr_valid take a const * argument.
      
      5. Separate the NUMA fallback definitions from the kmalloc_track fallback
         definitions.
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      55935a34
    • Christoph Lameter's avatar
      [PATCH] slab: fix sleeping in atomic bug · dd47ea75
      Christoph Lameter authored
      
      
      Fallback_alloc() does not do the check for GFP_WAIT as done in
      cache_grow().  Thus interrupts are disabled when we call kmem_getpages()
      which results in the failure.
      
      Duplicate the handling of GFP_WAIT in cache_grow().
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Cc: Jay Cliburn <jacliburn@bellsouth.net>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      dd47ea75
  12. 10 Dec, 2006 1 commit