    userfaultfd/sysctl: add vm.unprivileged_userfaultfd · cefdca0a
    Peter Xu authored
    Userfaultfd can be misused to make it easier to exploit existing
    use-after-free (and similar) bugs that might otherwise only make a
    short window or race condition available.  By using userfaultfd to
    stall a kernel thread, a malicious program can keep some state that it
    wrote stable for an extended period, which it can then access using an
    existing exploit.  While it doesn't cause the exploit itself, and while
    it's not the only thing that can stall a kernel thread when accessing a
    memory location, it's one of the few that never needs privilege.
    
    We can add a flag, allowing userfaultfd to be restricted, so that in
    general it won't be useable by arbitrary user programs, but in
    environments that require userfaultfd it can be turned back on.
    
    Add a global sysctl knob "vm.unprivileged_userfaultfd" to control
    whether userfaultfd is allowed by unprivileged users.  When this is
    set to zero, only privileged users (root user, or users with the
    CAP_SYS_PTRACE capability) will be able to use the userfaultfd
    syscalls.
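
The rule in the paragraph above can be written as a small decision function. This is only a user-space sketch of the described policy, not the kernel's implementation: the names `uffd_caller` and `uffd_permitted` are hypothetical, and the knob value is passed in as a plain integer rather than read from the sysctl.

```c
#include <stdbool.h>

/* Hypothetical description of the calling process (illustration only). */
struct uffd_caller {
    bool is_root;            /* caller is the root user */
    bool has_cap_sys_ptrace; /* caller holds CAP_SYS_PTRACE */
};

/*
 * knob is the value of vm.unprivileged_userfaultfd.
 * Returns true when the caller may use userfaultfd under the policy
 * stated in the commit message.
 */
bool uffd_permitted(int knob, struct uffd_caller caller)
{
    /* Non-zero: unprivileged users are allowed, so everyone passes. */
    if (knob != 0)
        return true;

    /* Zero: only root or CAP_SYS_PTRACE holders may proceed. */
    return caller.is_root || caller.has_cap_sys_ptrace;
}
```

With the knob at zero, a caller with neither flag set is refused, while a caller holding CAP_SYS_PTRACE is still accepted.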
    
    Andrea said:
    
    : The only difference between the bpf sysctl and the userfaultfd sysctl
    : this way is that the bpf sysctl adds the CAP_SYS_ADMIN capability
    : requirement, while userfaultfd adds the CAP_SYS_PTRACE requirement,
    : because the userfaultfd monitor is more likely to need CAP_SYS_PTRACE
    : already if it's doing other kind of tracking on processes runtime, in
    : addition of userfaultfd.  In other words both syscalls works only for
    : root, when the two sysctl are opt-in set to 1.
    
    [dgilbert@redhat.com: changelog additions]
    [akpm@linux-foundation.org: documentation tweak, per Mike]
    Link: http://lkml.kernel.org/r/20190319030722.12441-2-peterx@redhat.com
    
    
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
    Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
    Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
    Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Luis Chamberlain <mcgrof@kernel.org>
    Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
    Cc: Maya Gokhale <gokhale2@llnl.gov>
    Cc: Jerome Glisse <jglisse@redhat.com>
    Cc: Pavel Emelyanov <xemul@virtuozzo.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Martin Cracauer <cracauer@cons.org>
    Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
    Cc: Marty McFadden <mcfadden8@llnl.gov>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
    Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sysctl.c 78.85 KiB
/*
 * sysctl.c: General linux system control interface
 *
 * Begun 24 March 1995, Stephen Tweedie
 * Added /proc support, Dec 1995
 * Added bdflush entry and intvec min/max checking, 2/23/96, Tom Dyas.
 * Added hooks for /proc/sys/net (minor, minor patch), 96/4/1, Mike Shaver.
 * Added kernel/java-{interpreter,appletviewer}, 96/5/10, Mike Shaver.
 * Dynamic registration fixes, Stephen Tweedie.
 * Added kswapd-interval, ctrl-alt-del, printk stuff, 1/8/97, Chris Horn.
 * Made sysctl support optional via CONFIG_SYSCTL, 1/10/97, Chris
 *  Horn.
 * Added proc_doulongvec_ms_jiffies_minmax, 09/08/99, Carlos H. Bauer.
 * Added proc_doulongvec_minmax, 09/08/99, Carlos H. Bauer.
 * Changed linked lists to use list.h instead of lists.h, 02/24/00, Bill
 *  Wendling.
 * The list_for_each() macro wasn't appropriate for the sysctl loop.
 *  Removed it and replaced it with older style, 03/23/00, Bill Wendling
 */

#include <linux/module.h>
#include <linux/aio.h>
#include <linux/mm.h>
#include <linux/swap.h>
#include <linux/slab.h>
#include <linux/sysctl.h>
#include <linux/bitmap.h>
#include <linux/signal.h>
#include <linux/printk.h>
#include <linux/proc_fs.h>
#include <linux/security.h>
#include <linux/ctype.h>
#include <linux/kmemleak.h>
#include <linux/fs.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/kobject.h>
#include <linux/net.h>
#include <linux/sysrq.h>
#include <linux/highuid.h>
#include <linux/writeback.h>
#include <linux/ratelimit.h>
#include <linux/compaction.h>
#include <linux/hugetlb.h>
#include <linux/initrd.h>
#include <linux/key.h>
#include <linux/times.h>
#include <linux/limits.h>
#include <linux/dcache.h>
#include <linux/dnotify.h>
#include <linux/syscalls.h>
#include <linux/vmstat.h>
#include <linux/nfs_fs.h>
#include <linux/acpi.h>
#include <linux/reboot.h>
#include <linux/ftrace.h>
#include <linux/perf_event.h>
#include <linux/kprobes.h>
#include <linux/pipe_fs_i.h>
#include <linux/oom.h>
#include <linux/kmod.h>
#include <linux/capability.h>
#include <linux/binfmts.h>
#include <linux/sched/sysctl.h>
#include <linux/sched/coredump.h>
#include <linux/kexec.h>
#include <linux/bpf.h>
#include <linux/mount.h>
#include <linux/userfaultfd_k.h>