- Jan 26, 2016
-
-
Thomas Gleixner authored
Sasha reported a lockdep splat about a potential deadlock between RCU boosting rtmutex and the posix timer it_lock. CPU0 CPU1 rtmutex_lock(&rcu->rt_mutex) spin_lock(&rcu->rt_mutex.wait_lock) local_irq_disable() spin_lock(&timer->it_lock) spin_lock(&rcu->mutex.wait_lock) --> Interrupt spin_lock(&timer->it_lock) This is caused by the following code sequence on CPU1 rcu_read_lock() x = lookup(); if (x) spin_lock_irqsave(&x->it_lock); rcu_read_unlock(); return x; We could fix that in the posix timer code by keeping rcu read locked across the spinlocked and irq disabled section, but the above sequence is common and there is no reason not to support it. Taking rt_mutex.wait_lock irq safe prevents the deadlock. Reported-by:
Sasha Levin <sasha.levin@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
-
- Jan 21, 2016
-
-
Jann Horn authored
By checking the effective credentials instead of the real UID / permitted capabilities, ensure that the calling process actually intended to use its credentials. To ensure that all ptrace checks use the correct caller credentials (e.g. in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS flag), use two new flags and require one of them to be set. The problem was that when a privileged task had temporarily dropped its privileges, e.g. by calling setreuid(0, user_uid), with the intent to perform following syscalls with the credentials of a user, it still passed ptrace access checks that the user would not be able to pass. While an attacker should not be able to convince the privileged task to perform a ptrace() syscall, this is a problem because the ptrace access check is reused for things in procfs. In particular, the following somewhat interesting procfs entries only rely on ptrace access checks: /proc/$pid/stat - uses the check for determining whether pointers should be visible, useful for bypassing ASLR /proc/$pid/maps - also useful for bypassing ASLR /proc/$pid/cwd - useful for gaining access to restricted directories that contain files with lax permissions, e.g. in this scenario: lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar drwx------ root root /root drwxr-xr-x root root /root/foobar -rw-r--r-- root root /root/foobar/secret Therefore, on a system where a root-owned mode 6755 binary changes its effective credentials as described and then dumps a user-specified file, this could be used by an attacker to reveal the memory layout of root's processes or reveal the contents of files he is not allowed to access (through /proc/$pid/cwd). [akpm@linux-foundation.org: fix warning] Signed-off-by:
Jann Horn <jann@thejh.net> Acked-by:
Kees Cook <keescook@chromium.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jan 16, 2016
-
-
Dominik Dingel authored
During Jason's work with postcopy migration support for s390 a problem regarding gmap faults was discovered. The gmap code will call fixup_user_fault which will end up always in handle_mm_fault. Till now we never cared about retries, but as the userfaultfd code kind of relies on it. this needs some fix. This patchset does not take care of the futex code. I will now look closer at this. This patch (of 2): With the introduction of userfaultfd, kvm on s390 needs fixup_user_fault to pass in FAULT_FLAG_ALLOW_RETRY and give feedback if during the faulting we ever unlocked mmap_sem. This patch brings in the logic to handle retries as well as it cleans up the current documentation. fixup_user_fault was not having the same semantics as filemap_fault. It never indicated if a retry happened and so a caller wasn't able to handle that case. So we now changed the behaviour to always retry a locked mmap_sem. Signed-off-by:
Dominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-by:
Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: "Jason J. Herne" <jjherne@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric B Munson <emunson@akamai.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
With new THP refcounting, we don't need tricks to stabilize huge page. If we've got reference to tail page, it can't split under us. This patch effectively reverts a5b338f2 ("thp: update futex compound knowledge"). Signed-off-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by:
Sasha Levin <sasha.levin@oracle.com> Tested-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by:
Jerome Marchand <jmarchan@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Tested-by:
Artem Savkov <artem.savkov@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Dec 20, 2015
-
-
Darren Hart authored
While reviewing Michael Kerrisk's recent futex manpage update, I noticed that we allow the FUTEX_CLOCK_REALTIME flag for FUTEX_WAIT_BITSET but not for FUTEX_WAIT. FUTEX_WAIT is treated as a simple version for FUTEX_WAIT_BITSET internally (with a bitmask of FUTEX_BITSET_MATCH_ANY). As such, I cannot come up with a reason for this exclusion for FUTEX_WAIT. This change does modify the behavior of the futex syscall, changing a call with FUTEX_WAIT | FUTEX_CLOCK_REALTIME from returning -ENOSYS, to be equivalent to FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME with a bitset of FUTEX_BITSET_MATCH_ANY. Reported-by:
Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by:
Darren Hart <dvhart@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Link: http://lkml.kernel.org/r/9f3bdc116d79d23f5ee72ceb9a2a857f5ff8fa29.1450474525.git.dvhart@linux.intel.com Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
out_unlock: does not only drop the locks, it also drops the refcount on the pi_state. Really intuitive. Move the label after the put_pi_state() call and use 'break' in the error handling path of the requeue loop. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <darren@dvhart.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Bhuvanesh_Surachari@mentor.com Cc: Andy Lowe <Andy_Lowe@mentor.com> Link: http://lkml.kernel.org/r/20151219200607.526665141@linutronix.de Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
In the error handling cases we neither have pi_state nor a reference to it. Remove the pointless code. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <darren@dvhart.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Bhuvanesh_Surachari@mentor.com Cc: Andy Lowe <Andy_Lowe@mentor.com> Link: http://lkml.kernel.org/r/20151219200607.432780944@linutronix.de Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
Documentation of the pi_state refcounting in the requeue code is non existent. Add it. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <darren@dvhart.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Bhuvanesh_Surachari@mentor.com Cc: Andy Lowe <Andy_Lowe@mentor.com> Link: http://lkml.kernel.org/r/20151219200607.335938312@linutronix.de Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
free_pi_state() is confusing as it is in fact only freeing/caching the pi state when the last reference is gone. Rename it to put_pi_state() which reflects better what it is doing. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <darren@dvhart.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Bhuvanesh_Surachari@mentor.com Cc: Andy Lowe <Andy_Lowe@mentor.com> Link: http://lkml.kernel.org/r/20151219200607.259636467@linutronix.de Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
If the proxy lock in the requeue loop acquires the rtmutex for a waiter then it acquired also refcount on the pi_state related to the futex, but the waiter side does not drop the reference count. Add the missing free_pi_state() call. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <darren@dvhart.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Bhuvanesh_Surachari@mentor.com Cc: Andy Lowe <Andy_Lowe@mentor.com> Link: http://lkml.kernel.org/r/20151219200607.178132067@linutronix.de Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
-
- Oct 04, 2015
-
-
Viresh Kumar authored
Its a bit odd that debugfs_create_bool() takes 'u32 *' as an argument, when all it needs is a boolean pointer. It would be better to update this API to make it accept 'bool *' instead, as that will make it more consistent and often more convenient. Over that bool takes just a byte. That required updates to all user sites as well, in the same commit updating the API. regmap core was also using debugfs_{read|write}_file_bool(), directly and variable types were updated for that to be bool as well. Signed-off-by:
Viresh Kumar <viresh.kumar@linaro.org> Acked-by:
Mark Brown <broonie@kernel.org> Acked-by:
Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Sep 22, 2015
-
-
Rasmus Villemoes authored
futex_hash() references two global variables: the base pointer futex_queues and the size of the array futex_hashsize. The latter is marked __read_mostly, while the former is not, so they are likely to end up very far from each other. This means that futex_hash() is likely to encounter two cache misses. We could mark futex_queues as __read_mostly as well, but that doesn't guarantee they'll end up next to each other (and even if they do, they may still end up in different cache lines). So put the two variables in a small singleton struct with sufficient alignment and mark that as __read_mostly. Signed-off-by:
Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: kbuild test robot <fengguang.wu@intel.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/1441834601-13633-1-git-send-email-linux@rasmusvillemoes.dk Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
- Jul 20, 2015
-
-
kbuild test robot authored
Signed-off-by:
Fengguang Wu <fengguang.wu@intel.com> Cc: kbuild-all@01.org Cc: tipbuild@zytor.com Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <darren@dvhart.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Brian Silverman <bsilver16384@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Davidlohr Bueso authored
Although futexes are well known for being a royal pita, we really have very little debugging capabilities - except for relying on tglx's eye half the time. By simply making use of the existing fault-injection machinery, we can improve this situation, allowing generating artificial uaddress faults and deadlock scenarios. Of course, when this is disabled in production systems, the overhead for failure checks is practically zero -- so this is very cheap at the same time. Future work would be nice to now enhance trinity to make use of this. There is a special tunable 'ignore-private', which can filter out private futexes. Given the tsk->make_it_fail filter and this option, pi futexes can be narrowed down pretty closely. Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <darren@dvhart.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Link: http://lkml.kernel.org/r/1435645562-975-3-git-send-email-dave@stgolabs.net Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Davidlohr Bueso authored
... serves a bit better to clarify between blocking and non-blocking code paths. Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <darren@dvhart.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Link: http://lkml.kernel.org/r/1435645562-975-2-git-send-email-dave@stgolabs.net Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
- Jun 19, 2015
-
-
Sebastian Andrzej Siewior authored
wake_futex_pi() wakes the task before releasing the hash bucket lock (HB). The first thing the woken up task usually does is to acquire the lock which requires the HB lock. On SMP Systems this leads to blocking on the HB lock which is released by the owner shortly after. This patch rearranges the unlock path by first releasing the HB lock and then waking up the task. [ tglx: Fixed up the rtmutex unlock path ] Originally-from: Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Link: http://lkml.kernel.org/r/20150617083350.GA2433@linutronix.de Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
- May 19, 2015
-
-
Peter Zijlstra authored
Since set_mb() is really about an smp_mb() -- not a IO/DMA barrier like mb() rename it to match the recent smp_load_acquire() and smp_store_release(). Suggested-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- May 08, 2015
-
-
Davidlohr Bueso authored
Given the overall futex architecture, any chance of reducing hb->lock contention is welcome. In this particular case, using wake-queues to enable lockless wakeups addresses very much real world performance concerns, even cases of soft-lockups in cases of large amounts of blocked tasks (which is not hard to find in large boxes, using but just a handful of futex). At the lowest level, this patch can reduce latency of a single thread attempting to acquire hb->lock in highly contended scenarios by a up to 2x. At lower counts of nr_wake there are no regressions, confirming, of course, that the wake_q handling overhead is practically non existent. For instance, while a fair amount of variation, the extended pef-bench wakeup benchmark shows for a 20 core machine the following avg per-thread time to wakeup its share of tasks: nr_thr ms-before ms-after 16 0.0590 0.0215 32 0.0396 0.0220 48 0.0417 0.0182 64 0.0536 0.0236 80 0.0414 0.0097 96 0.0672 0.0152 Naturally, this can cause spurious wakeups. However there is no core code that cannot handle them afaict, and furthermore tglx does have the point that other events can already trigger them anyway. Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Mason <clm@fb.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: George Spelvin <linux@horizon.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1430494072-30283-3-git-send-email-dave@stgolabs.net Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Apr 22, 2015
-
-
Thomas Gleixner authored
The check for hrtimer_active() after starting the timer is pointless. If the timer is inactive it has expired already and therefor the task pointer is already NULL. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Acked-by:
Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20150414203502.985825453@linutronix.de Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
- Feb 18, 2015
-
-
Oleg Nesterov authored
attach_to_pi_owner() checks p->mm to prevent attaching to kthreads and this looks doubly wrong: 1. It should actually check PF_KTHREAD, kthread can do use_mm(). 2. If this task is not kthread and it is actually the lock owner we can wrongly return -EPERM instead of -ESRCH or retry-if-EAGAIN. And note that this wrong EPERM is the likely case unless the exiting task is (auto)reaped quickly, we check ->mm before PF_EXITING. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Darren Hart <darren@dvhart.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mateusz Guzik <mguzik@redhat.com> Link: http://lkml.kernel.org/r/20150202140536.GA26406@redhat.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Feb 13, 2015
-
-
Andy Lutomirski authored
If an attacker can cause a controlled kernel stack overflow, overwriting the restart block is a very juicy exploit target. This is because the restart_block is held in the same memory allocation as the kernel stack. Moving the restart block to struct task_struct prevents this exploit by making the restart_block harder to locate. Note that there are other fields in thread_info that are also easy targets, at least on some architectures. It's also a decent simplification, since the restart code is more or less identical on all architectures. [james.hogan@imgtec.com: metag: align thread_info::supervisor_stack] Signed-off-by:
Andy Lutomirski <luto@amacapital.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: David Miller <davem@davemloft.net> Acked-by:
Richard Weinberger <richard@nod.at> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Steven Miao <realmz6@gmail.com> Cc: Mark Salter <msalter@redhat.com> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chen Liqin <liqin.linux@gmail.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by:
James Hogan <james.hogan@imgtec.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jan 19, 2015
-
-
Michael Kerrisk authored
This patch fixes two separate buglets in calls to futex_lock_pi(): * Eliminate unused 'detect' argument * Change unused 'timeout' argument of FUTEX_TRYLOCK_PI to NULL The 'detect' argument of futex_lock_pi() seems never to have been used (when it was included with the initial PI mutex implementation in Linux 2.6.18, all checks against its value were disabled by ANDing against 0 (i.e., if (detect... && 0)), and with commit 778e9a9c, any mention of this argument in futex_lock_pi() went way altogether. Its presence now serves only to confuse readers of the code, by giving the impression that the futex() FUTEX_LOCK_PI operation actually does use the 'val' argument. This patch removes the argument. The futex_lock_pi() call that corresponds to FUTEX_TRYLOCK_PI includes 'timeout' as one of its arguments. This misleads the reader into thinking that the FUTEX_TRYLOCK_PI operation does employ timeouts for some sensible purpose; but it does not. Indeed, it cannot, because the checks at the start of sys_futex() exclude FUTEX_TRYLOCK_PI from the set of operations that do copy_from_user() on the timeout argument. So, in the FUTEX_TRYLOCK_PI futex_lock_pi() call it would be simplest to change 'timeout' to 'NULL'. This patch does that. Signed-off-by:
Michael Kerrisk <mtk.manpages@gmail.com> Reviewed-by:
Darren Hart <darren@dvhart.com> Link: http://lkml.kernel.org/r/54B96646.8010200@gmail.com Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
- Oct 26, 2014
-
-
Brian Silverman authored
free_pi_state and exit_pi_state_list both clean up futex_pi_state's. exit_pi_state_list takes the hb lock first, and most callers of free_pi_state do too. requeue_pi doesn't, which means free_pi_state can free the pi_state out from under exit_pi_state_list. For example: task A | task B exit_pi_state_list | pi_state = | curr->pi_state_list->next | | futex_requeue(requeue_pi=1) | // pi_state is the same as | // the one in task A | free_pi_state(pi_state) | list_del_init(&pi_state->list) | kfree(pi_state) list_del_init(&pi_state->list) | Move the free_pi_state calls in requeue_pi to before it drops the hb locks which it's already holding. [ tglx: Removed a pointless free_pi_state() call and the hb->lock held debugging. The latter comes via a seperate patch ] Signed-off-by:
Brian Silverman <bsilver16384@gmail.com> Cc: austin.linux@gmail.com Cc: darren@dvhart.com Cc: peterz@infradead.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1414282837-23092-1-git-send-email-bsilver16384@gmail.com Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Davidlohr Bueso authored
Update our documentation as of fix 76835b0e (futex: Ensure get_futex_key_refs() always implies a barrier). Explicitly state that we don't do key referencing for private futexes. Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Cc: Matteo Franchin <Matteo.Franchin@arm.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by:
Catalin Marinas <catalin.marinas@arm.com> Link: http://lkml.kernel.org/r/1414121220.817.0.camel@linux-t7sj.site Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
- Oct 18, 2014
-
-
Catalin Marinas authored
Commit b0c29f79 (futexes: Avoid taking the hb->lock if there's nothing to wake up) changes the futex code to avoid taking a lock when there are no waiters. This code has been subsequently fixed in commit 11d4616b (futex: revert back to the explicit waiter counting code). Both the original commit and the fix-up rely on get_futex_key_refs() to always imply a barrier. However, for private futexes, none of the cases in the switch statement of get_futex_key_refs() would be hit and the function completes without a memory barrier as required before checking the "waiters" in futex_wake() -> hb_waiters_pending(). The consequence is a race with a thread waiting on a futex on another CPU, allowing the waker thread to read "waiters == 0" while the waiter thread to have read "futex_val == locked" (in kernel). Without this fix, the problem (user space deadlocks) can be seen with Android bionic's mutex implementation on an arm64 multi-cluster system. Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com> Reported-by:
Matteo Franchin <Matteo.Franchin@arm.com> Fixes: b0c29f79 (futexes: Avoid taking the hb->lock if there's nothing to wake up) Acked-by:
Davidlohr Bueso <dave@stgolabs.net> Tested-by:
Mike Galbraith <umgwanakikbuti@gmail.com> Cc: <stable@vger.kernel.org> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Sep 12, 2014
-
-
Thomas Gleixner authored
futex_wait_requeue_pi() calls futex_wait_setup(). If futex_wait_setup() succeeds it returns with hb->lock held and preemption disabled. Now the sanity check after this does: if (match_futex(&q.key, &key2)) { ret = -EINVAL; goto out_put_keys; } which releases the keys but does not release hb->lock. So we happily return to user space with hb->lock held and therefor preemption disabled. Unlock hb->lock before taking the exit route. Reported-by:
Dave "Trinity" Jones <davej@redhat.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Darren Hart <dvhart@linux.intel.com> Reviewed-by:
Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409112318500.4178@nanos Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
- Jun 21, 2014
-
-
Thomas Gleixner authored
futex_lock_pi_atomic() is a maze of retry hoops and loops. Reduce it to simple and understandable states: First step is to lookup existing waiters (state) in the kernel. If there is an existing waiter, validate it and attach to it. If there is no existing waiter, check the user space value If the TID encoded in the user space value is 0, take over the futex preserving the owner died bit. If the TID encoded in the user space value is != 0, lookup the owner task, validate it and attach to it. Reduces text size by 128 bytes on x8664. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Kees Cook <kees@outflux.net> Cc: wad@chromium.org Cc: Darren Hart <darren@dvhart.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1406131137020.5170@nanos Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
We want to be a bit more clever in futex_lock_pi_atomic() and separate the possible states. Split out the code which attaches the first waiter to the owner into a separate function. No functional change. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Darren Hart <darren@dvhart.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Kees Cook <kees@outflux.net> Cc: wad@chromium.org Link: http://lkml.kernel.org/r/20140611204237.271300614@linutronix.de Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
We want to be a bit more clever in futex_lock_pi_atomic() and separate the possible states. Split out the waiter verification into a separate function. No functional change. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Darren Hart <darren@dvhart.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Kees Cook <kees@outflux.net> Cc: wad@chromium.org Link: http://lkml.kernel.org/r/20140611204237.180458410@linutronix.de Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
No point in open coding the same function again. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Darren Hart <darren@dvhart.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Kees Cook <kees@outflux.net> Cc: wad@chromium.org Link: http://lkml.kernel.org/r/20140611204237.092947239@linutronix.de Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
The kernel tries to atomically unlock the futex without checking whether there is kernel state associated to the futex. So if user space manipulated the user space value, this will leave kernel internal state around associated to the owner task. For robustness sake, lookup first whether there are waiters on the futex. If there are waiters, wake the top priority waiter with all the proper sanity checks applied. If there are no waiters, do the atomic release. We do not have to preserve the waiters bit in this case, because a potentially incoming waiter is blocked on the hb->lock and will acquire the futex atomically. We neither have to preserve the owner died bit. The caller is the owner and it was supposed to cleanup the mess. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <darren@dvhart.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Kees Cook <kees@outflux.net> Cc: wad@chromium.org Link: http://lkml.kernel.org/r/20140611204237.016987332@linutronix.de Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
The deadlock logic is only required for futexes. Remove the extra arguments for the public functions and also for the futex specific ones which get always called with deadlock detection enabled. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Steven Rostedt <rostedt@goodmis.org>
-
- Jun 05, 2014
-
-
Thomas Gleixner authored
The current implementation of lookup_pi_state has ambigous handling of the TID value 0 in the user space futex. We can get into the kernel even if the TID value is 0, because either there is a stale waiters bit or the owner died bit is set or we are called from the requeue_pi path or from user space just for fun. The current code avoids an explicit sanity check for pid = 0 in case that kernel internal state (waiters) are found for the user space address. This can lead to state leakage and worse under some circumstances. Handle the cases explicit: Waiter | pi_state | pi->owner | uTID | uODIED | ? [1] NULL | --- | --- | 0 | 0/1 | Valid [2] NULL | --- | --- | >0 | 0/1 | Valid [3] Found | NULL | -- | Any | 0/1 | Invalid [4] Found | Found | NULL | 0 | 1 | Valid [5] Found | Found | NULL | >0 | 1 | Invalid [6] Found | Found | task | 0 | 1 | Valid [7] Found | Found | NULL | Any | 0 | Invalid [8] Found | Found | task | ==taskTID | 0/1 | Valid [9] Found | Found | task | 0 | 0 | Invalid [10] Found | Found | task | !=taskTID | 0/1 | Invalid [1] Indicates that the kernel can acquire the futex atomically. We came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit. [2] Valid, if TID does not belong to a kernel thread. If no matching thread is found then it indicates that the owner TID has died. [3] Invalid. The waiter is queued on a non PI futex [4] Valid state after exit_robust_list(), which sets the user space value to FUTEX_WAITERS | FUTEX_OWNER_DIED. [5] The user space value got manipulated between exit_robust_list() and exit_pi_state_list() [6] Valid state after exit_pi_state_list() which sets the new owner in the pi_state but cannot access the user space value. [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set. [8] Owner and user space value match [9] There is no transient state which sets the user space TID to 0 except exit_robust_list(), but this is indicated by the FUTEX_OWNER_DIED bit. See [4] [10] There is no transient state which leaves owner and user space TID out of sync. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: Darren Hart <dvhart@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Thomas Gleixner authored
If the owner died bit is set at futex_unlock_pi, we currently do not cleanup the user space futex. So the owner TID of the current owner (the unlocker) persists. That's observable inconsistant state, especially when the ownership of the pi state got transferred. Clean it up unconditionally. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: Darren Hart <dvhart@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Thomas Gleixner authored
We need to protect the atomic acquisition in the kernel against rogue user space which sets the user space futex to 0, so the kernel side acquisition succeeds while there is existing state in the kernel associated to the real owner. Verify whether the futex has waiters associated with kernel state. If it has, return -EINVAL. The state is corrupted already, so no point in cleaning it up. Subsequent calls will fail as well. Not our problem. [ tglx: Use futex_top_waiter() and explain why we do not need to try restoring the already corrupted user space state. ] Signed-off-by:
Darren Hart <dvhart@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: stable@vger.kernel.org Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Thomas Gleixner authored
futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1) If uaddr == uaddr2, then we have broken the rule of only requeueing from a non-pi futex to a pi futex with this call. If we attempt this, then dangling pointers may be left for rt_waiter resulting in an exploitable condition. This change brings futex_requeue() in line with futex_wait_requeue_pi() which performs the same check as per commit 6f7b0a2a ("futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()") [ tglx: Compare the resulting keys as well, as uaddrs might be different depending on the mapping ] Fixes CVE-2014-3153. Reported-by: Pinkie Pie Signed-off-by:
Will Drewry <wad@chromium.org> Signed-off-by:
Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Darren Hart <dvhart@linux.intel.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- May 19, 2014
-
-
Thomas Gleixner authored
We happily allow userspace to declare a random kernel thread to be the owner of a user space PI futex. Found while analysing the fallout of Dave Jones syscall fuzzer. We also should validate the thread group for private futexes and find some fast way to validate whether the "alleged" owner has RW access on the file which backs the SHM, but that's a separate issue. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Dave Jones <davej@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <darren@dvhart.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Clark Williams <williams@redhat.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Carlos ODonell <carlos@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20140512201701.194824402@linutronix.de Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
-
Thomas Gleixner authored
Dave Jones trinity syscall fuzzer exposed an issue in the deadlock detection code of rtmutex: http://lkml.kernel.org/r/20140429151655.GA14277@redhat.com That underlying issue has been fixed with a patch to the rtmutex code, but the futex code must not call into rtmutex in that case because - it can detect that issue early - it avoids a different and more complex fixup for backing out If the user space variable got manipulated to 0x80000000 which means no lock holder, but the waiters bit set and an active pi_state in the kernel is found we can figure out the recursive locking issue by looking at the pi_state owner. If that is the current task, then we can safely return -EDEADLK. The check should have been added in commit 59fa6245 (futex: Handle futex_pi OWNER_DIED take over correctly) already, but I did not see the above issue caused by user space manipulation back then. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Dave Jones <davej@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <darren@dvhart.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Clark Williams <williams@redhat.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Carlos ODonell <carlos@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20140512201701.097349971@linutronix.de Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
-
- Apr 18, 2014
-
-
Peter Zijlstra authored
Mostly scripted conversion of the smp_mb__* barriers. Signed-off-by:
Peter Zijlstra <peterz@infradead.org> Acked-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Apr 13, 2014
-
-
Davidlohr Bueso authored
Commits 11d4616b ("futex: revert back to the explicit waiter counting code") and 69cd9eba ("futex: avoid race between requeue and wake") changed some of the finer details of how we think about futexes. One was a late fix and the other a consequence of overlooking the whole requeuing logic. The first change caused our documentation to be incorrect, and the second made us aware that we need to explicitly add more details to it. Signed-off-by:
Davidlohr Bueso <davidlohr@hp.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-