Skip to content
Snippets Groups Projects
  1. Aug 06, 2011
    • Linus Torvalds's avatar
      vfs: show O_CLOEXE bit properly in /proc/<pid>/fdinfo/<fd> files · 1117f72e
      Linus Torvalds authored
      
      The CLOEXE bit is magical, and for performance (and semantic) reasons we
      don't actually maintain it in the file descriptor itself, but in a
      separate bit array.  Which means that when we show f_flags, the CLOEXE
      status is shown incorrectly: we show the status not as it is now, but as
      it was when the file was opened.
      
      Fix that by looking up the bit properly in the 'fdt->close_on_exec' bit
      array.
      
      Uli needs this in order to re-implement the pfiles program:
      
        "For normal file descriptors (not sockets) this was the last piece of
         information which wasn't available.  This is all part of my 'give
         Solaris users no reason to not switch' effort.  I intend to offer the
         code to the util-linux-ng maintainers."
      
      Requested-by: default avatarUlrich Drepper <drepper@akkadia.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1117f72e
    • Linus Torvalds's avatar
      oom_ajd: don't use WARN_ONCE, just use printk_once · c2142704
      Linus Torvalds authored
      
      WARN_ONCE() is very annoying, in that it shows the stack trace that we
      don't care about at all, and also triggers various user-level "kernel
      oopsed" logic that we really don't care about.  And it's not like the
      user can do anything about the applications (sshd) in question, it's a
      distro issue.
      
      Requested-by: Andi Kleen <andi@firstfloor.org> (and many others)
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c2142704
  2. Jul 26, 2011
    • Vasiliy Kulikov's avatar
      proc: fix a race in do_io_accounting() · 293eb1e7
      Vasiliy Kulikov authored
      
      If an inode's mode permits opening /proc/PID/io and the resulting file
      descriptor is kept across execve() of a setuid or similar binary, the
      ptrace_may_access() check tries to prevent using this fd against the
      task with escalated privileges.
      
      Unfortunately, there is a race in the check against execve().  If
      execve() is processed after the ptrace check, but before the actual io
      information gathering, io statistics will be gathered from the
      privileged process.  At least in theory this might lead to gathering
      sensible information (like ssh/ftp password length) that wouldn't be
      available otherwise.
      
      Holding task->signal->cred_guard_mutex while gathering the io
      information should protect against the race.
      
      The order of locking is similar to the one inside of ptrace_attach():
      first goes cred_guard_mutex, then lock_task_sighand().
      
      Signed-off-by: default avatarVasiliy Kulikov <segoon@openwall.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      293eb1e7
    • David Rientjes's avatar
      oom: make deprecated use of oom_adj more verbose · be8f684d
      David Rientjes authored
      
      /proc/pid/oom_adj is deprecated and scheduled for removal in August 2012
      according to Documentation/feature-removal-schedule.txt.
      
      This patch makes the warning more verbose by making it appear as a more
      serious problem (the presence of a stack trace and being multiline should
      attract more attention) so that applications still using the old interface
      can get fixed.
      
      Very popular users of the old interface have been converted since the oom
      killer rewrite has been introduced.  udevd switched to the
      /proc/pid/oom_score_adj interface for v162, kde switched in 4.6.1, and
      opensshd switched in 5.7p1.
      
      At the start of 2012, this should be changed into a WARN() to emit all
      such incidents and then finally remove the tunable in August 2012 as
      scheduled.
      
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      be8f684d
  3. Jul 21, 2011
  4. Jul 20, 2011
  5. Jun 28, 2011
  6. Jun 22, 2011
  7. Jun 20, 2011
  8. May 27, 2011
    • Chris Metcalf's avatar
      arch/tile: more /proc and /sys file support · f133ecca
      Chris Metcalf authored
      
      This change introduces a few of the less controversial /proc and
      /proc/sys interfaces for tile, along with sysfs attributes for
      various things that were originally proposed as /proc/tile files.
      It also adjusts the "hardwall" proc API.
      
      Arnd Bergmann reviewed the initial arch/tile submission, which
      included a complete set of all the /proc/tile and /proc/sys/tile
      knobs that we had added in a somewhat ad hoc way during initial
      development, and provided feedback on where most of them should go.
      
      One knob turned out to be similar enough to the existing
      /proc/sys/debug/exception-trace that it was re-implemented to use
      that model instead.
      
      Another knob was /proc/tile/grid, which reported the "grid" dimensions
      of a tile chip (e.g. 8x8 processors = 64-core chip).  Arnd suggested
      looking at sysfs for that, so this change moves that information
      to a pair of sysfs attributes (chip_width and chip_height) in the
      /sys/devices/system/cpu directory.  We also put the "chip_serial"
      and "chip_revision" information from our old /proc/tile/board file
      as attributes in /sys/devices/system/cpu.
      
      Other information collected via hypervisor APIs is now placed in
      /sys/hypervisor.  We create a /sys/hypervisor/type file (holding the
      constant string "tilera") to be parallel with the Xen use of
      /sys/hypervisor/type holding "xen".  We create three top-level files,
      "version" (the hypervisor's own version), "config_version" (the
      version of the configuration file), and "hvconfig" (the contents of
      the configuration file).  The remaining information from our old
      /proc/tile/board and /proc/tile/switch files becomes an attribute
      group appearing under /sys/hypervisor/board/.
      
      Finally, after some feedback from Arnd Bergmann for the previous
      version of this patch, the /proc/tile/hardwall file is split up into
      two conceptual parts.  First, a directory /proc/tile/hardwall/ which
      contains one file per active hardwall, each file named after the
      hardwall's ID and holding a cpulist that says which cpus are enclosed by
      the hardwall.  Second, a /proc/PID file "hardwall" that is either
      empty (for non-hardwall-using processes) or contains the hardwall ID.
      
      Finally, this change pushes the /proc/sys/tile/unaligned_fixup/
      directory, with knobs controlling the kernel code for handling the
      fixup of unaligned exceptions.
      
      Reviewed-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarChris Metcalf <cmetcalf@tilera.com>
      f133ecca
    • KOSAKI Motohiro's avatar
      proc: put check_mem_permission after __get_free_page in mem_write · 30cd8903
      KOSAKI Motohiro authored
      
      It whould be better if put check_mem_permission after __get_free_page in
      mem_write, to be same as function mem_read.
      
      Hugh Dickins explained the reason.
      
          check_mem_permission gets a reference to the mm.  If we __get_free_page
          after check_mem_permission, imagine what happens if the system is out
          of memory, and the mm we're looking at is selected for killing by the
          OOM killer: while we wait in __get_free_page for more memory, no memory
          is freed from the selected mm because it cannot reach exit_mmap while
          we hold that reference.
      
      Reported-by: default avatarJovi Zhang <bookjovi@gmail.com>
      Signed-off-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarStephen Wilson <wilsons@start.ca>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      30cd8903
    • Alexey Dobriyan's avatar
      fs/proc: convert to kstrtoX() · 0a8cb8e3
      Alexey Dobriyan authored
      
      Convert fs/proc/ from strict_strto*() to kstrto*() functions.
      
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0a8cb8e3
    • Jiri Slaby's avatar
      mm: extract exe_file handling from procfs · 38646013
      Jiri Slaby authored
      
      Setup and cleanup of mm_struct->exe_file is currently done in fs/proc/.
      This was because exe_file was needed only for /proc/<pid>/exe.  Since we
      will need the exe_file functionality also for core dumps (so core name can
      contain full binary path), built this functionality always into the
      kernel.
      
      To achieve that move that out of proc FS to the kernel/ where in fact it
      should belong.  By doing that we can make dup_mm_exe_file static.  Also we
      can drop linux/proc_fs.h inclusion in fs/exec.c and kernel/fork.c.
      
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      38646013
  9. May 10, 2011
    • Eric W. Biederman's avatar
      ns: proc files for namespace naming policy. · 6b4e306a
      Eric W. Biederman authored
      
      Create files under /proc/<pid>/ns/ to allow controlling the
      namespaces of a process.
      
      This addresses three specific problems that can make namespaces hard to
      work with.
      - Namespaces require a dedicated process to pin them in memory.
      - It is not possible to use a namespace unless you are the child
        of the original creator.
      - Namespaces don't have names that userspace can use to talk about
        them.
      
      The namespace files under /proc/<pid>/ns/ can be opened and the
      file descriptor can be used to talk about a specific namespace, and
      to keep the specified namespace alive.
      
      A namespace can be kept alive by either holding the file descriptor
      open or bind mounting the file someplace else.  aka:
      mount --bind /proc/self/ns/net /some/filesystem/path
      mount --bind /proc/self/fd/<N> /some/filesystem/path
      
      This allows namespaces to be named with userspace policy.
      
      It requires additional support to make use of these filedescriptors
      and that will be comming in the following patches.
      
      Acked-by: default avatarDaniel Lezcano <daniel.lezcano@free.fr>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      6b4e306a
  10. Apr 18, 2011
  11. Mar 31, 2011
  12. Mar 24, 2011
  13. Mar 23, 2011
  14. Mar 10, 2011
  15. Jan 14, 2011
    • Mandeep Singh Baines's avatar
      oom: allow a non-CAP_SYS_RESOURCE proces to oom_score_adj down · dabb16f6
      Mandeep Singh Baines authored
      
      We'd like to be able to oom_score_adj a process up/down as it
      enters/leaves the foreground.  Currently, it is not possible to oom_adj
      down without CAP_SYS_RESOURCE.  This patch allows a task to decrease its
      oom_score_adj back to the value that a CAP_SYS_RESOURCE thread set it to
      or its inherited value at fork.  Assuming the thread that has forked it
      has oom_score_adj of 0, each process could decrease it back from 0 upon
      activation unless a CAP_SYS_RESOURCE thread elevated it to something
      higher.
      
      Alternative considered:
      
      * a setuid binary
      * a daemon with CAP_SYS_RESOURCE
      
      Since you don't wan't all processes to be able to reduce their oom_adj, a
      setuid or daemon implementation would be complex.  The alternatives also
      have much higher overhead.
      
      This patch updated from original patch based on feedback from David
      Rientjes.
      
      Signed-off-by: default avatarMandeep Singh Baines <msb@chromium.org>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dabb16f6
  16. Jan 13, 2011
  17. Jan 07, 2011
    • Nick Piggin's avatar
      fs: provide rcu-walk aware permission i_ops · b74c79e9
      Nick Piggin authored
      
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      b74c79e9
    • Nick Piggin's avatar
      fs: rcu-walk aware d_revalidate method · 34286d66
      Nick Piggin authored
      
      Require filesystems be aware of .d_revalidate being called in rcu-walk
      mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
      -ECHILD from all implementations.
      
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      34286d66
    • Nick Piggin's avatar
      fs: dcache reduce branches in lookup path · fb045adb
      Nick Piggin authored
      
      Reduce some branches and memory accesses in dcache lookup by adding dentry
      flags to indicate common d_ops are set, rather than having to check them.
      This saves a pointer memory access (dentry->d_op) in common path lookup
      situations, and saves another pointer load and branch in cases where we
      have d_op but not the particular operation.
      
      Patched with:
      
      git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
      
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      fb045adb
    • Nick Piggin's avatar
      fs: change d_delete semantics · fe15ce44
      Nick Piggin authored
      
      Change d_delete from a dentry deletion notification to a dentry caching
      advise, more like ->drop_inode. Require it to be constant and idempotent,
      and not take d_lock. This is how all existing filesystems use the callback
      anyway.
      
      This makes fine grained dentry locking of dput and dentry lru scanning
      much simpler.
      
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      fe15ce44
  18. Dec 06, 2010
    • Eric W. Biederman's avatar
      Revert "vfs: show unreachable paths in getcwd and proc" · 7b2a69ba
      Eric W. Biederman authored
      
      Because it caused a chroot ttyname regression in 2.6.36.
      
      As of 2.6.36 ttyname does not work in a chroot.  It has already been
      reported that screen breaks, and for me this breaks an automated
      distribution testsuite, that I need to preserve the ability to run the
      existing binaries on for several more years.  glibc 2.11.3 which has a
      fix for this is not an option.
      
      The root cause of this breakage is:
      
          commit 8df9d1a4
          Author: Miklos Szeredi <mszeredi@suse.cz>
          Date:   Tue Aug 10 11:41:41 2010 +0200
      
          vfs: show unreachable paths in getcwd and proc
      
          Prepend "(unreachable)" to path strings if the path is not reachable
          from the current root.
      
          Two places updated are
           - the return string from getcwd()
           - and symlinks under /proc/$PID.
      
          Other uses of d_path() are left unchanged (we know that some old
          software crashes if /proc/mounts is changed).
      
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      
      So remove the nice sounding, but ultimately ill advised change to how
      /proc/fd symlinks work.
      
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7b2a69ba
Loading