- Aug 06, 2011
-
-
Linus Torvalds authored
The CLOEXE bit is magical, and for performance (and semantic) reasons we don't actually maintain it in the file descriptor itself, but in a separate bit array. Which means that when we show f_flags, the CLOEXE status is shown incorrectly: we show the status not as it is now, but as it was when the file was opened. Fix that by looking up the bit properly in the 'fdt->close_on_exec' bit array. Uli needs this in order to re-implement the pfiles program: "For normal file descriptors (not sockets) this was the last piece of information which wasn't available. This is all part of my 'give Solaris users no reason to not switch' effort. I intend to offer the code to the util-linux-ng maintainers." Requested-by:
Ulrich Drepper <drepper@akkadia.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
WARN_ONCE() is very annoying, in that it shows the stack trace that we don't care about at all, and also triggers various user-level "kernel oopsed" logic that we really don't care about. And it's not like the user can do anything about the applications (sshd) in question, it's a distro issue. Requested-by: Andi Kleen <andi@firstfloor.org> (and many others) Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jul 26, 2011
-
-
Vasiliy Kulikov authored
If an inode's mode permits opening /proc/PID/io and the resulting file descriptor is kept across execve() of a setuid or similar binary, the ptrace_may_access() check tries to prevent using this fd against the task with escalated privileges. Unfortunately, there is a race in the check against execve(). If execve() is processed after the ptrace check, but before the actual io information gathering, io statistics will be gathered from the privileged process. At least in theory this might lead to gathering sensible information (like ssh/ftp password length) that wouldn't be available otherwise. Holding task->signal->cred_guard_mutex while gathering the io information should protect against the race. The order of locking is similar to the one inside of ptrace_attach(): first goes cred_guard_mutex, then lock_task_sighand(). Signed-off-by:
Vasiliy Kulikov <segoon@openwall.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
David Rientjes authored
/proc/pid/oom_adj is deprecated and scheduled for removal in August 2012 according to Documentation/feature-removal-schedule.txt. This patch makes the warning more verbose by making it appear as a more serious problem (the presence of a stack trace and being multiline should attract more attention) so that applications still using the old interface can get fixed. Very popular users of the old interface have been converted since the oom killer rewrite has been introduced. udevd switched to the /proc/pid/oom_score_adj interface for v162, kde switched in 4.6.1, and opensshd switched in 5.7p1. At the start of 2012, this should be changed into a WARN() to emit all such incidents and then finally remove the tunable in August 2012 as scheduled. Signed-off-by:
David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jul 21, 2011
-
-
Kay Sievers authored
Moving the event counter into the dynamically allocated 'struc seq_file' allows poll() support without the need to allocate its own tracking structure. All current users are switched over to use the new counter. Requested-by:
Andrew Morton <akpm@linux-foundation.org> Acked-by:
NeilBrown <neilb@suse.de> Tested-by:
Lucas De Marchi <lucas.demarchi@profusion.mobi> Signed-off-by:
Kay Sievers <kay.sievers@vrfy.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Jul 20, 2011
-
-
Al Viro authored
not used by the instances anymore. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
redundant; all callers get it duplicated in mask & MAY_NOT_BLOCK and none of them removes that bit. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
its value depends only on inode and does not change; we might as well store it in ->i_op->check_acl and be done with that. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Jun 28, 2011
-
-
Vasiliy Kulikov authored
/proc/PID/io may be used for gathering private information. E.g. for openssh and vsftpd daemons wchars/rchars may be used to learn the precise password length. Restrict it to processes being able to ptrace the target process. ptrace_may_access() is needed to prevent keeping open file descriptor of "io" file, executing setuid binary and gathering io information of the setuid'ed process. Signed-off-by:
Vasiliy Kulikov <segoon@openwall.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jun 22, 2011
-
-
Tejun Heo authored
tracehook.h is on the way out. Rename tracehook_tracer_task() to ptrace_parent() and move it from tracehook.h to ptrace.h. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: John Johansen <john.johansen@canonical.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by:
Oleg Nesterov <oleg@redhat.com>
-
- Jun 20, 2011
-
-
Al Viro authored
nothing blocking except generic_permission() Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- May 27, 2011
-
-
Chris Metcalf authored
This change introduces a few of the less controversial /proc and /proc/sys interfaces for tile, along with sysfs attributes for various things that were originally proposed as /proc/tile files. It also adjusts the "hardwall" proc API. Arnd Bergmann reviewed the initial arch/tile submission, which included a complete set of all the /proc/tile and /proc/sys/tile knobs that we had added in a somewhat ad hoc way during initial development, and provided feedback on where most of them should go. One knob turned out to be similar enough to the existing /proc/sys/debug/exception-trace that it was re-implemented to use that model instead. Another knob was /proc/tile/grid, which reported the "grid" dimensions of a tile chip (e.g. 8x8 processors = 64-core chip). Arnd suggested looking at sysfs for that, so this change moves that information to a pair of sysfs attributes (chip_width and chip_height) in the /sys/devices/system/cpu directory. We also put the "chip_serial" and "chip_revision" information from our old /proc/tile/board file as attributes in /sys/devices/system/cpu. Other information collected via hypervisor APIs is now placed in /sys/hypervisor. We create a /sys/hypervisor/type file (holding the constant string "tilera") to be parallel with the Xen use of /sys/hypervisor/type holding "xen". We create three top-level files, "version" (the hypervisor's own version), "config_version" (the version of the configuration file), and "hvconfig" (the contents of the configuration file). The remaining information from our old /proc/tile/board and /proc/tile/switch files becomes an attribute group appearing under /sys/hypervisor/board/. Finally, after some feedback from Arnd Bergmann for the previous version of this patch, the /proc/tile/hardwall file is split up into two conceptual parts. First, a directory /proc/tile/hardwall/ which contains one file per active hardwall, each file named after the hardwall's ID and holding a cpulist that says which cpus are enclosed by the hardwall. Second, a /proc/PID file "hardwall" that is either empty (for non-hardwall-using processes) or contains the hardwall ID. Finally, this change pushes the /proc/sys/tile/unaligned_fixup/ directory, with knobs controlling the kernel code for handling the fixup of unaligned exceptions. Reviewed-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Chris Metcalf <cmetcalf@tilera.com>
-
KOSAKI Motohiro authored
It whould be better if put check_mem_permission after __get_free_page in mem_write, to be same as function mem_read. Hugh Dickins explained the reason. check_mem_permission gets a reference to the mm. If we __get_free_page after check_mem_permission, imagine what happens if the system is out of memory, and the mm we're looking at is selected for killing by the OOM killer: while we wait in __get_free_page for more memory, no memory is freed from the selected mm because it cannot reach exit_mmap while we hold that reference. Reported-by:
Jovi Zhang <bookjovi@gmail.com> Signed-off-by:
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by:
Hugh Dickins <hughd@google.com> Reviewed-by:
Stephen Wilson <wilsons@start.ca> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Alexey Dobriyan authored
Convert fs/proc/ from strict_strto*() to kstrto*() functions. Signed-off-by:
Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Jiri Slaby authored
Setup and cleanup of mm_struct->exe_file is currently done in fs/proc/. This was because exe_file was needed only for /proc/<pid>/exe. Since we will need the exe_file functionality also for core dumps (so core name can contain full binary path), built this functionality always into the kernel. To achieve that move that out of proc FS to the kernel/ where in fact it should belong. By doing that we can make dup_mm_exe_file static. Also we can drop linux/proc_fs.h inclusion in fs/exec.c and kernel/fork.c. Signed-off-by:
Jiri Slaby <jslaby@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- May 10, 2011
-
-
Eric W. Biederman authored
Create files under /proc/<pid>/ns/ to allow controlling the namespaces of a process. This addresses three specific problems that can make namespaces hard to work with. - Namespaces require a dedicated process to pin them in memory. - It is not possible to use a namespace unless you are the child of the original creator. - Namespaces don't have names that userspace can use to talk about them. The namespace files under /proc/<pid>/ns/ can be opened and the file descriptor can be used to talk about a specific namespace, and to keep the specified namespace alive. A namespace can be kept alive by either holding the file descriptor open or bind mounting the file someplace else. aka: mount --bind /proc/self/ns/net /some/filesystem/path mount --bind /proc/self/fd/<N> /some/filesystem/path This allows namespaces to be named with userspace policy. It requires additional support to make use of these filedescriptors and that will be comming in the following patches. Acked-by:
Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by:
Eric W. Biederman <ebiederm@xmission.com>
-
- Apr 18, 2011
-
-
Linus Torvalds authored
Rather than pass in some random truncated offset to the pid-related functions, check that the offset is in range up-front. This is just cleanup, the previous commit fixed the real problem. Cc: stable@kernel.org Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Mar 31, 2011
-
-
Lucas De Marchi authored
Fixes generated by 'codespell' and manually reviewed. Signed-off-by:
Lucas De Marchi <lucas.demarchi@profusion.mobi>
-
- Mar 24, 2011
-
-
Jovi Zhang authored
[root@wei 1]# cat /proc/1/mem cat: /proc/1/mem: No such process error code -ESRCH is wrong in this situation. Return -EPERM instead. Signed-off-by:
Jovi Zhang <bookjovi@gmail.com> Reviewed-by:
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Konstantin Khlebnikov authored
This file is readable for the task owner. Hide kernel addresses from unprivileged users, leave them function names and offsets. Signed-off-by:
Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by:
Kees Cook <kees.cook@canonical.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Mar 23, 2011
-
-
Al Viro authored
All of those are rw-r--r-- and all are broken for suid - if you open a file before the target does suid-root exec, you'll be still able to access it. For personality it's not a big deal, but for syscall and stack it's a real problem. Fix: check that task is tracable for you at the time of read(). Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Stephen Wilson authored
With recent changes there is no longer a security hazard with writing to /proc/pid/mem. Remove the #ifdef. Signed-off-by:
Stephen Wilson <wilsons@start.ca> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Stephen Wilson authored
This change allows us to take advantage of access_remote_vm(), which in turn eliminates a security issue with the mem_write() implementation. The previous implementation of mem_write() was insecure since the target task could exec a setuid-root binary between the permission check and the actual write. Holding a reference to the target mm_struct eliminates this vulnerability. Signed-off-by:
Stephen Wilson <wilsons@start.ca> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Stephen Wilson authored
Avoid a potential race when task exec's and we get a new ->mm but check against the old credentials in ptrace_may_access(). Holding of the mutex is implemented by factoring out the body of the code into a helper function __check_mem_permission(). Performing this factorization now simplifies upcoming changes and minimizes churn in the diff's. Signed-off-by:
Stephen Wilson <wilsons@start.ca> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Stephen Wilson authored
This change makes mem_write() observe the same constraints as mem_read(). This is particularly important for mem_write as an accidental leak of the fd across an exec could result in arbitrary modification of the target process' memory. IOW, /proc/pid/mem is implicitly close-on-exec. Signed-off-by:
Stephen Wilson <wilsons@start.ca> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
same as for environ, except that we didn't do any checks to prevent access after suid execve Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Switch to mm_for_maps(). Maybe we ought to make it r--r--r--, since we do checks on IO anyway... Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
just use mm_for_maps() Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Mar 10, 2011
-
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Jan 14, 2011
-
-
Mandeep Singh Baines authored
We'd like to be able to oom_score_adj a process up/down as it enters/leaves the foreground. Currently, it is not possible to oom_adj down without CAP_SYS_RESOURCE. This patch allows a task to decrease its oom_score_adj back to the value that a CAP_SYS_RESOURCE thread set it to or its inherited value at fork. Assuming the thread that has forked it has oom_score_adj of 0, each process could decrease it back from 0 upon activation unless a CAP_SYS_RESOURCE thread elevated it to something higher. Alternative considered: * a setuid binary * a daemon with CAP_SYS_RESOURCE Since you don't wan't all processes to be able to reduce their oom_adj, a setuid or daemon implementation would be complex. The alternatives also have much higher overhead. This patch updated from original patch based on feedback from David Rientjes. Signed-off-by:
Mandeep Singh Baines <msb@chromium.org> Acked-by:
David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ying Han <yinghan@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jan 13, 2011
-
-
Jovi Zhang authored
single_open()'s third argument is for copying into seq_file->private. Use that, rather than open-coding it. Signed-off-by:
Jovi Zhang <bookjovi@gmail.com> Acked-by:
David Rientjes <rientjes@google.com> Acked-by:
Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by:
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Alexey Dobriyan authored
For string without format specifiers, use seq_puts(). For seq_printf("\n"), use seq_putc('\n'). text data bss dec hex filename 61866 488 112 62466 f402 fs/proc/proc.o 61729 488 112 62329 f379 fs/proc/proc.o ---------------------------------------------------- -139 Signed-off-by:
Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Joe Perches authored
Use temporary lr for struct latency_record for improved readability and fewer columns used. Removed trailing space from output. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by:
Joe Perches <joe@perches.com> Cc: Jiri Kosina <trivial@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jan 07, 2011
-
-
Nick Piggin authored
Signed-off-by:
Nick Piggin <npiggin@kernel.dk>
-
Nick Piggin authored
Require filesystems be aware of .d_revalidate being called in rcu-walk mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning -ECHILD from all implementations. Signed-off-by:
Nick Piggin <npiggin@kernel.dk>
-
Nick Piggin authored
Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i Signed-off-by:
Nick Piggin <npiggin@kernel.dk>
-
Nick Piggin authored
Change d_delete from a dentry deletion notification to a dentry caching advise, more like ->drop_inode. Require it to be constant and idempotent, and not take d_lock. This is how all existing filesystems use the callback anyway. This makes fine grained dentry locking of dput and dentry lru scanning much simpler. Signed-off-by:
Nick Piggin <npiggin@kernel.dk>
-
- Dec 06, 2010
-
-
Eric W. Biederman authored
Because it caused a chroot ttyname regression in 2.6.36. As of 2.6.36 ttyname does not work in a chroot. It has already been reported that screen breaks, and for me this breaks an automated distribution testsuite, that I need to preserve the ability to run the existing binaries on for several more years. glibc 2.11.3 which has a fix for this is not an option. The root cause of this breakage is: commit 8df9d1a4 Author: Miklos Szeredi <mszeredi@suse.cz> Date: Tue Aug 10 11:41:41 2010 +0200 vfs: show unreachable paths in getcwd and proc Prepend "(unreachable)" to path strings if the path is not reachable from the current root. Two places updated are - the return string from getcwd() - and symlinks under /proc/$PID. Other uses of d_path() are left unchanged (we know that some old software crashes if /proc/mounts is changed). Signed-off-by:
Miklos Szeredi <mszeredi@suse.cz> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> So remove the nice sounding, but ultimately ill advised change to how /proc/fd symlinks work. Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-