sysctl.c

      hugetlb: introduce nr_overcommit_hugepages sysctl · d1c3fb1f
      Nishanth Aravamudan authored
      
      hugetlb: introduce nr_overcommit_hugepages sysctl
      
      While examining the code to support /proc/sys/vm/hugetlb_dynamic_pool, I
      became convinced that having a boolean sysctl was insufficient:
      
      1) To support per-node control of hugepages, I have previously submitted
      patches to add a sysfs attribute related to nr_hugepages. However, with
      a boolean global value and per-mount quota enforcement constraining the
      dynamic pool, adding corresponding control of the dynamic pool on a
      per-node basis seems inconsistent to me.
      
      2) Administration of the hugetlb dynamic pool with multiple hugetlbfs
      mount points is, arguably, more arduous than it needs to be. Each quota
      would need to be set separately, and the sum would need to be monitored.
      
      To ease administration, and to help pave the way for per-node
      control of the static & dynamic hugepage pool, I added a separate
      sysctl, nr_overcommit_hugepages. This value serves as a high watermark
      for the overall hugepage pool, while nr_hugepages serves as a low
      watermark. The boolean sysctl can then be removed, as the condition
      
      	nr_overcommit_hugepages > 0
      
      indicates the same administrative setting as
      
      	hugetlb_dynamic_pool == 1
      
      Quotas still serve as local enforcement of the size of the pool on a
      per-mount basis.
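
The relationship between the two sysctls can be sketched as a small user-space model. This is an illustration only, not the kernel implementation: the struct and the function names are hypothetical, and it assumes (as caveat 2 in this message suggests) that nr_overcommit_hugepages caps the number of surplus pages allocated beyond the static pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the two sysctls; not kernel code. */
struct hugepage_pool {
    unsigned long nr_hugepages;            /* static pool size: the low watermark */
    unsigned long nr_overcommit_hugepages; /* assumed cap on surplus pages */
    unsigned long surplus_huge_pages;      /* surplus pages currently allocated */
};

/* Stands in for the old boolean: the dynamic pool is on when overcommit > 0. */
static bool dynamic_pool_enabled(const struct hugepage_pool *p)
{
    assert(p != NULL);
    return p->nr_overcommit_hugepages > 0;
}

/* Assumed rule: one more surplus page is allowed only while below the cap. */
static bool may_alloc_surplus(const struct hugepage_pool *p)
{
    assert(p != NULL);
    return p->surplus_huge_pages < p->nr_overcommit_hugepages;
}

/* Largest total pool size this model permits: the high watermark. */
static unsigned long max_pool_size(const struct hugepage_pool *p)
{
    assert(p != NULL);
    return p->nr_hugepages + p->nr_overcommit_hugepages;
}
```

With nr_hugepages = 10 and nr_overcommit_hugepages = 0, the model reports the dynamic pool as disabled; raising the overcommit value to 4 enables it and lets the pool grow to at most 14 pages.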
      
      A few caveats:
      
      1) There is a race whereby the global surplus huge page counter is
      incremented before a hugepage has been allocated. Another process could
      then try to grow the pool, fail to convert a surplus huge page to a
      normal huge page, and instead allocate a fresh huge page. I believe this is
      benign, as no memory is leaked (the actual pages are still tracked
      correctly) and the counters won't go out of sync.
      
      2) Shrinking the static pool while a surplus is in effect will allow the
      number of surplus huge pages to exceed the overcommit value. As long as
      this condition holds, however, no more surplus huge pages will be
      allowed on the system until one of the two sysctls is increased
      sufficiently, or the surplus huge pages go out of use and are freed.
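
Caveat 2 can be walked through with a toy model. The sketch below is hypothetical user-space code, not the kernel's accounting: it assumes every page dropped from the static pool is still in use and is therefore re-accounted as surplus, and all names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model for caveat 2; every huge page is assumed to be in use. */
struct pool_model {
    unsigned long static_pages; /* nr_hugepages */
    unsigned long overcommit;   /* nr_overcommit_hugepages */
    unsigned long surplus;      /* surplus huge pages currently in use */
};

/* Shrink the static pool; in-use pages beyond the new size become surplus. */
static void shrink_static_pool(struct pool_model *p, unsigned long new_size)
{
    assert(p != NULL && new_size <= p->static_pages);
    p->surplus += p->static_pages - new_size;
    p->static_pages = new_size;
}

/* Surplus pages going out of use are freed back to the system. */
static void free_surplus(struct pool_model *p, unsigned long n)
{
    assert(p != NULL);
    p->surplus -= (n < p->surplus) ? n : p->surplus;
}

/* True while the condition described in caveat 2 holds. */
static bool surplus_over_limit(const struct pool_model *p)
{
    assert(p != NULL);
    return p->surplus > p->overcommit;
}

/* No new surplus page is allowed at or above the overcommit value. */
static bool can_add_surplus(const struct pool_model *p)
{
    assert(p != NULL);
    return p->surplus < p->overcommit;
}
```

Starting from 8 static pages, an overcommit value of 2 and 2 surplus pages, shrinking the static pool to 5 leaves 5 surplus pages, which is above the limit. New surplus allocations stay blocked until the overcommit value is raised above 5 or enough surplus pages are freed.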
      
      Successfully tested on x86_64 with the current libhugetlbfs snapshot,
      modified to use the new sysctl.
      
      Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
      Acked-by: Adam Litke <agl@us.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    sysctl.c 62.57 KiB
    /*
     * sysctl.c: General linux system control interface
     *
     * Begun 24 March 1995, Stephen Tweedie
     * Added /proc support, Dec 1995
     * Added bdflush entry and intvec min/max checking, 2/23/96, Tom Dyas.
     * Added hooks for /proc/sys/net (minor, minor patch), 96/4/1, Mike Shaver.
     * Added kernel/java-{interpreter,appletviewer}, 96/5/10, Mike Shaver.
     * Dynamic registration fixes, Stephen Tweedie.
     * Added kswapd-interval, ctrl-alt-del, printk stuff, 1/8/97, Chris Horn.
     * Made sysctl support optional via CONFIG_SYSCTL, 1/10/97, Chris
     *  Horn.
     * Added proc_doulongvec_ms_jiffies_minmax, 09/08/99, Carlos H. Bauer.
     * Added proc_doulongvec_minmax, 09/08/99, Carlos H. Bauer.
     * Changed linked lists to use list.h instead of lists.h, 02/24/00, Bill
     *  Wendling.
     * The list_for_each() macro wasn't appropriate for the sysctl loop.
     *  Removed it and replaced it with older style, 03/23/00, Bill Wendling
     */
    
    #include <linux/module.h>
    #include <linux/mm.h>
    #include <linux/swap.h>
    #include <linux/slab.h>
    #include <linux/sysctl.h>
    #include <linux/proc_fs.h>
    #include <linux/security.h>
    #include <linux/ctype.h>
    #include <linux/utsname.h>
    #include <linux/smp_lock.h>
    #include <linux/fs.h>
    #include <linux/init.h>
    #include <linux/kernel.h>
    #include <linux/kobject.h>
    #include <linux/net.h>
    #include <linux/sysrq.h>
    #include <linux/highuid.h>
    #include <linux/writeback.h>
    #include <linux/hugetlb.h>
    #include <linux/initrd.h>
    #include <linux/times.h>
    #include <linux/limits.h>
    #include <linux/dcache.h>
    #include <linux/syscalls.h>
    #include <linux/nfs_fs.h>
    #include <linux/acpi.h>
    #include <linux/reboot.h>
    
    #include <asm/uaccess.h>
    #include <asm/processor.h>
    
    #ifdef CONFIG_X86
    #include <asm/nmi.h>
    #include <asm/stacktrace.h>
    #endif
    
    static int deprecated_sysctl_warning(struct __sysctl_args *args);
    
    #if defined(CONFIG_SYSCTL)
    
    /* External variables not in a header file. */
    extern int C_A_D;
    extern int print_fatal_signals;
    extern int sysctl_overcommit_memory;
    extern int sysctl_overcommit_ratio;
    extern int sysctl_panic_on_oom;
    extern int sysctl_oom_kill_allocating_task;
    extern int max_threads;
    extern int core_uses_pid;