Skip to content
  • Linus Torvalds's avatar
    Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 6aee4bad
    Linus Torvalds authored
    Pull openat2 support from Al Viro:
     "This is the openat2() series from Aleksa Sarai.
    
      I'm afraid that the rest of namei stuff will have to wait - it got
      zero review the last time I'd posted #work.namei, and there had been a
      leak in the posted series I'd caught only last weekend. I was going to
      repost it on Monday, but the window opened and the odds of getting any
      review during that... Oh, well.
    
      Anyway, openat2 part should be ready; that _did_ get sane amount of
      review and public testing, so here it comes"
    
    From Aleksa's description of the series:
     "For a very long time, extending openat(2) with new features has been
      incredibly frustrating. This stems from the fact that openat(2) is
      possibly the most famous counter-example to the mantra "don't silently
      accept garbage from userspace" -- it doesn't check whether unknown
      flags are present[1].
    
      This means that (generally) the addition of new flags to openat(2) has
      been fraught with backwards-compatibility issues (O_TMPFILE has to be
      defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
      kernels gave errors, since it's insecure to silently ignore the
      flag[2]). All new security-related flags therefore have a tough road
      to being added to openat(2).
    
      Furthermore, the need for some sort of control over VFS's path
      resolution (to avoid malicious paths resulting in inadvertent
      breakouts) has been a very long-standing desire of many userspace
      applications.
    
      This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset
      (which was a variant of David Drysdale's O_BENEATH patchset[4] which
      was a spin-off of the Capsicum project[5]) with a few additions and
      changes made based on the previous discussion within [6] as well as
      others I felt were useful.
    
      In line with the conclusions of the original discussion of
      AT_NO_JUMPS, the flag has been split up into separate flags. However,
      instead of being an openat(2) flag it is provided through a new
      syscall openat2(2) which provides several other improvements to the
      openat(2) interface (see the patch description for more details). The
      following new LOOKUP_* flags are added:
    
      LOOKUP_NO_XDEV:
    
         Blocks all mountpoint crossings (upwards, downwards, or through
         absolute links). Absolute pathnames alone in openat(2) do not
         trigger this. Magic-link traversal which implies a vfsmount jump is
         also blocked (though magic-link jumps on the same vfsmount are
         permitted).
    
      LOOKUP_NO_MAGICLINKS:
    
         Blocks resolution through /proc/$pid/fd-style links. This is done
         by blocking the usage of nd_jump_link() during resolution in a
         filesystem. The term "magic-links" is used to match with the only
         reference to these links in Documentation/, but I'm happy to change
         the name.
    
         It should be noted that this is different to the scope of
         ~LOOKUP_FOLLOW in that it applies to all path components. However,
         you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
         will *not* fail (assuming that no parent component was a
         magic-link), and you will have an fd for the magic-link.
    
         In order to correctly detect magic-links, the introduction of a new
         LOOKUP_MAGICLINK_JUMPED state flag was required.
    
      LOOKUP_BENEATH:
    
         Disallows escapes to outside the starting dirfd's
         tree, using techniques such as ".." or absolute links. Absolute
         paths in openat(2) are also disallowed.
    
         Conceptually this flag is to ensure you "stay below" a certain
         point in the filesystem tree -- but this requires some additional
         to protect against various races that would allow escape using
         "..".
    
         Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it
         can trivially beam you around the filesystem (breaking the
         protection). In future, there might be similar safety checks done
         as in LOOKUP_IN_ROOT, but that requires more discussion.
    
      In addition, two new flags are added that expand on the above ideas:
    
      LOOKUP_NO_SYMLINKS:
    
         Does what it says on the tin. No symlink resolution is allowed at
         all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this
         can still be used with NOFOLLOW to open an fd for the symlink as
         long as no parent path had a symlink component.
    
      LOOKUP_IN_ROOT:
    
         This is an extension of LOOKUP_BENEATH that, rather than blocking
         attempts to move past the root, forces all such movements to be
         scoped to the starting point. This provides chroot(2)-like
         protection but without the cost of a chroot(2) for each filesystem
         operation, as well as being safe against race attacks that
         chroot(2) is not.
    
         If a race is detected (as with LOOKUP_BENEATH) then an error is
         generated, and similar to LOOKUP_BENEATH it is not permitted to
         cross magic-links with LOOKUP_IN_ROOT.
    
         The primary need for this is from container runtimes, which
         currently need to do symlink scoping in userspace[7] when opening
         paths in a potentially malicious container.
    
         There is a long list of CVEs that could have bene mitigated by
         having RESOLVE_THIS_ROOT (such as CVE-2017-1002101,
         CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a
         few).
    
      In order to make all of the above more usable, I'm working on
      libpathrs[8] which is a C-friendly library for safe path resolution.
      It features a userspace-emulated backend if the kernel doesn't support
      openat2(2). Hopefully we can get userspace to switch to using it, and
      thus get openat2(2) support for free once it's ready.
    
      Future work would include implementing things like
      RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow
      programs to be sure they don't hit DoSes though stale NFS handles)"
    
    * 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
      Documentation: path-lookup: include new LOOKUP flags
      selftests: add openat2(2) selftests
      open: introduce openat2(2) syscall
      namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
      namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
      namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
      namei: LOOKUP_NO_XDEV: block mountpoint crossing
      namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
      namei: LOOKUP_NO_SYMLINKS: block symlink resolution
      namei: allow set_root() to produce errors
      namei: allow nd_jump_link() to produce errors
      nsfs: clean-up ns_get_path() signature to return int
      namei: only return -ECHILD from follow_dotdot_rcu()
    6aee4bad