- Apr 08, 2021
-
-
Sami Tolvanen authored
This change adds support for Clang's forward-edge Control Flow Integrity (CFI) checking. With CONFIG_CFI_CLANG, the compiler injects a runtime check before each indirect function call to ensure the target is a valid function with the correct static type. This restricts possible call targets and makes it more difficult for an attacker to exploit bugs that allow the modification of stored function pointers. For more details, see:

  https://clang.llvm.org/docs/ControlFlowIntegrity.html

Clang requires CONFIG_LTO_CLANG to be enabled with CFI to gain visibility to possible call targets. Kernel modules are supported with Clang's cross-DSO CFI mode, which allows checking between independently compiled components.

With CFI enabled, the compiler injects a __cfi_check() function into the kernel and each module for validating local call targets. For cross-module calls that cannot be validated locally, the compiler calls the global __cfi_slowpath_diag() function, which determines the target module and calls the correct __cfi_check() function. This patch includes a slowpath implementation that uses __module_address() to resolve call targets, and with CONFIG_CFI_CLANG_SHADOW enabled, a shadow map that speeds up module look-ups by ~3x.

Clang implements indirect call checking using jump tables and offers two methods of generating them. With canonical jump tables, the compiler renames each address-taken function to <function>.cfi and points the original symbol to a jump table entry, which passes __cfi_check() validation. This isn't compatible with stand-alone assembly code, which the compiler doesn't instrument, and would result in indirect calls to assembly code failing. Therefore, we default to using non-canonical jump tables instead, where the compiler generates a local jump table entry <function>.cfi_jt for each address-taken function, and replaces all references to the function with the address of the jump table entry.

Note that because non-canonical jump table addresses are local to each component, they break cross-module function address equality. Specifically, the address of a global function will be different in each module, as it's replaced with the address of a local jump table entry. If this address is passed to a different module, it won't match the address of the same function taken there. This may break code that relies on comparing addresses passed from other components.

CFI checking can be disabled in a function with the __nocfi attribute. Additionally, CFI can be disabled for an entire compilation unit by filtering out CC_FLAGS_CFI.

By default, CFI failures result in a kernel panic to stop a potential exploit. CONFIG_CFI_PERMISSIVE enables a permissive mode, where the kernel prints out a rate-limited warning instead, and allows execution to continue. This option is helpful for locating type mismatches, but should only be enabled during development.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210408182843.1754385-2-samitolvanen@google.com
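
A minimal, hypothetical sketch (not part of the patch) of the address-equality caveat and the __nocfi escape hatch described above; some_global_func is an assumed exported symbol used only for illustration:

  #include <linux/module.h>

  extern void some_global_func(void);   /* hypothetical exported function */

  static bool is_same_func(void (*fp)(void))
  {
          /*
           * With CONFIG_CFI_CLANG and non-canonical jump tables, both sides
           * are addresses of per-module <function>.cfi_jt entries, so this
           * comparison can be false for a pointer taken in another module.
           */
          return fp == some_global_func;
  }

  /* CFI checking can be disabled for an individual function: */
  static void __nocfi call_unchecked(void (*fp)(void))
  {
          fp();   /* no CFI check is injected before this indirect call */
  }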
-
- Apr 01, 2021
-
-
Steven Rostedt (VMware) authored
Commit cbc3b92c fixed an issue to modify the macros of the stack trace event so that user space could parse it properly. Originally the stack trace format to user space showed that the called stack was a dynamic array. But it is not actually a dynamic array, in the way that other dynamic event arrays worked, and this broke user space parsing for it. The update was to make the array look to have 8 entries in it. Helper functions were added to make it parse it correctly, as the stack was dynamic, but was determined by the size of the event stored.

Although this fixed user space on how it read the event, it changed the internal structure used for the stack trace event. It changed the array size from [0] to [8] (added 8 entries). This increased the size of the stack trace event by 8 words. The size reserved on the ring buffer was the size of the stack trace event plus the number of stack entries found in the stack trace. That commit caused the amount to be 8 more than what was needed because it did not expect the caller field to have any size. This produced 8 entries of garbage (and reading random data) from the stack trace event:

           <idle>-0       [002] d...  1976396.837549: <stack trace>
   => trace_event_raw_event_sched_switch
   => __traceiter_sched_switch
   => __schedule
   => schedule_idle
   => do_idle
   => cpu_startup_entry
   => secondary_startup_64_no_verify
   => 0xc8c5e150ffff93de
   => 0xffff93de
   => 0
   => 0
   => 0xc8c5e17800000000
   => 0x1f30affff93de
   => 0x00000004
   => 0x200000000

Instead, subtract the size of the caller field from the size of the event to make sure that only the amount needed to store the stack trace is reserved.

Link: https://lore.kernel.org/lkml/your-ad-here.call-01617191565-ext-9692@work.hours/
Cc: stable@vger.kernel.org
Fixes: cbc3b92c ("tracing: Set kernel_stack's caller size properly")
Reported-by: Vasily Gorbik <gor@linux.ibm.com>
Tested-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
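
A hedged illustration of the over-reservation, with the structure layout simplified and the names assumed from the changelog rather than copied from the tree:

  #include <stddef.h>

  /* Simplified stand-in for the stack trace event payload: */
  struct stack_entry {
          int             size;
          unsigned long   caller[8];      /* was caller[0] before cbc3b92c */
  };

  static size_t stack_event_reserve_size(int nr_entries)
  {
          /*
           * sizeof(struct stack_entry) now includes eight caller words, so
           * adding nr_entries words on top reserved eight words too many.
           * The fix subtracts the static caller[] size so only the captured
           * entries are reserved.
           */
          return sizeof(struct stack_entry)
                 - sizeof(((struct stack_entry *)0)->caller)
                 + nr_entries * sizeof(unsigned long);
  }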
-
- Mar 30, 2021
-
-
Steven Rostedt (VMware) authored
It is possible that on error pg->size can be zero when getting its order, which would return a -1 value. It is dangerous to pass in an order of -1 to free_pages(). Check if order is greater than or equal to zero before calling free_pages().

Link: https://lore.kernel.org/lkml/20210330093916.432697c7@gandalf.local.home/

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
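
A hedged sketch of the added guard; the surrounding error path in kernel/trace/ftrace.c is assumed:

  /* order may be -1 here if pg->size was zero on the error path. */
  if (order >= 0)
          free_pages((unsigned long)pg->records, order);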
-
- Mar 27, 2021
-
-
Jens Axboe authored
This reverts commit 4db4b1a0. The IO threads allow and handle SIGSTOP now, so don't special case them anymore in task_set_jobctl_pending().

Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This reverts commit 15b2219f. Before IO threads accepted signals, the freezer using fake signals to wake up an IO thread would cause them to loop without any way to clear the pending signal. That is no longer the case, so stop special casing PF_IO_WORKER in the freezer.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This reverts commit 6fb8f43c. The IO threads do allow signals now, including SIGSTOP, and we can allow ptrace attach. Attaching won't reveal anything interesting for the IO threads, but it will allow eg gdb to attach to a task with io_urings and IO threads without complaining. And once attached, it will allow the usual introspection into regular threads.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This reverts commit 5be28c8f. IO threads now take signals just fine, so there's no reason to limit them specifically. Revert the change that prevented that from happening.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This is racy - move the blocking into when the task is created and we're marking it as PF_IO_WORKER anyway. The IO threads are now prepared to handle signals like SIGSTOP as well, so clear that from the mask to allow proper stopping of IO threads.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Mar 26, 2021
-
-
Jens Axboe authored
Right now we're never calling get_signal() from PF_IO_WORKER threads, but in preparation for doing so, don't handle a fatal signal for them. The workers have state they need to clean up when exiting, so just return instead of calling do_exit() on their behalf. The threads themselves will detect a fatal signal and do proper shutdown.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Mar 25, 2021
-
-
Nick Desaulniers authored
LLVM changed the expected function signatures for llvm_gcda_start_file() and llvm_gcda_emit_function() in the clang-11 release. Users of clang-11 or newer may have noticed their kernels failing to boot due to a panic when enabling CONFIG_GCOV_KERNEL=y and CONFIG_GCOV_PROFILE_ALL=y. Fix up the function signatures so calling these functions doesn't panic the kernel.

Link: https://reviews.llvm.org/rGcdd683b516d147925212724b09ec6fb792a40041
Link: https://reviews.llvm.org/rG13a633b438b6500ecad9e4f936ebadf3411d0f44
Link: https://lkml.kernel.org/r/20210312224132.3413602-2-ndesaulniers@google.com
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reported-by: Prasad Sodagudi <psodagud@quicinc.com>
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Fangrui Song <maskray@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Cc: <stable@vger.kernel.org> [5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
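
A hedged, approximate sketch (not the verbatim patch) of how kernel/gcov/clang.c can key the expected signature off the clang version; the pre-clang-11 variant passed the version as a 4-byte string, the newer one as a u32, and current_info is the per-file gcov state assumed from that file:

  #if CONFIG_CLANG_VERSION < 110000
  void llvm_gcda_start_file(const char *orig_filename, const char version[4],
                            u32 checksum)
  {
          current_info->filename = orig_filename;
          memcpy(&current_info->version, version, sizeof(current_info->version));
          current_info->checksum = checksum;
  }
  #else
  void llvm_gcda_start_file(const char *orig_filename, u32 version, u32 checksum)
  {
          current_info->filename = orig_filename;
          current_info->version = version;
          current_info->checksum = checksum;
  }
  #endif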
-
- Mar 23, 2021
-
-
Lukasz Luba authored
The debugfs directory '/sys/kernel/debug/energy_model' is needed before the Energy Model registration can happen. With the recent change in the debugfs subsystem it's not allowed to create this directory at an early stage (core_initcall), so creating it there would fail.

Postpone the creation of the EM debug dir to a later stage: fs_initcall. It should be safe since all clients (CPUFreq drivers, Devfreq drivers) will be initialized in later stages. The custom debug log below prints the time of creation of the EM debug dir at fs_initcall and successful registration of EMs at later stages.

  [    1.505717] energy_model: creating rootdir
  [    3.698307] cpu cpu0: EM: created perf domain
  [    3.709022] cpu cpu1: EM: created perf domain

Fixes: 56348560 ("debugfs: do not attempt to create a new file before the filesystem is initalized")
Reported-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
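
A hedged sketch of the initcall change in kernel/power/energy_model.c, assuming the directory is still created from an __init function:

  static struct dentry *rootdir;

  static int __init em_debug_init(void)
  {
          /* Create /sys/kernel/debug/energy_model directory */
          rootdir = debugfs_create_dir("energy_model", NULL);

          return 0;
  }
  /* was core_initcall(em_debug_init); debugfs is not ready that early anymore */
  fs_initcall(em_debug_init);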
-
- Mar 21, 2021
-
-
Eric W. Biederman authored
Just like we don't allow normal signals to IO threads, don't deliver a STOP to a task that has PF_IO_WORKER set. The IO threads don't take signals in general, and have no means of flushing out a stop either.

Longer term, we may want to look into allowing stop of these threads, as it relates to eg process freezing. For now, this prevents a spin issue if a SIGSTOP is delivered to the parent task.

Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-
Jens Axboe authored
They don't take signals individually, and even if they share signals with the parent task, don't allow them to be delivered through the worker thread. Linux does allow this kind of behavior for regular threads, but it's really a compatibility thing that we need not care about for the IO threads.

Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Mar 20, 2021
-
-
Thomas Gleixner authored
With interrupt force threading all device interrupt handlers are invoked from kernel threads. Contrary to hard interrupt context the invocation only disables bottom halves, but not interrupts. This was an oversight back then because any code like this will have an issue:

  thread(irq_A)
    irq_handler(A)
      spin_lock(&foo->lock);

        interrupt(irq_B)
          irq_handler(B)
            spin_lock(&foo->lock);

This has been triggered with networking (NAPI vs. hrtimers) and console drivers where printk() happens from an interrupt which interrupted the force threaded handler.

Now people noticed and started to change the spin_lock() in the handler to spin_lock_irqsave(), which affects performance, or to add IRQF_NOTHREAD to the interrupt request, which in turn breaks RT.

Fix the root cause and not the symptom and disable interrupts before invoking the force threaded handler, which preserves the regular semantics and the usefulness of interrupt force threading as a general debugging tool.

For non-RT this is not changing much, except that during the execution of the threaded handler interrupts are delayed until the handler returns. Vs. scheduling and softirq processing there is no difference.

For RT kernels there is no issue.

Fixes: 8d32a307 ("genirq: Provide forced interrupt threading")
Reported-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Johan Hovold <johan@kernel.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20210317143859.513307808@linutronix.de
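
A hedged sketch of the core of the fix as it could look in the force-threaded handler wrapper (irq_forced_thread_fn()-style code), disabling hard interrupts on non-RT around the handler:

  local_bh_disable();
  if (!IS_ENABLED(CONFIG_PREEMPT_RT))
          local_irq_disable();

  /* a plain spin_lock() in the handler now keeps its hard-irq semantics */
  ret = action->thread_fn(action->irq, action->dev_id);

  if (!IS_ENABLED(CONFIG_PREEMPT_RT))
          local_irq_enable();
  local_bh_enable();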
-
- Mar 19, 2021
-
-
Zqiang authored
The syzbot reported a memleak as follows:

  BUG: memory leak
  unreferenced object 0xffff888101b41d00 (size 120):
    comm "kworker/u4:0", pid 8, jiffies 4294944270 (age 12.780s)
    backtrace:
      [<ffffffff8125dc56>] alloc_pid+0x66/0x560
      [<ffffffff81226405>] copy_process+0x1465/0x25e0
      [<ffffffff81227943>] kernel_clone+0xf3/0x670
      [<ffffffff812281a1>] kernel_thread+0x61/0x80
      [<ffffffff81253464>] call_usermodehelper_exec_work
      [<ffffffff81253464>] call_usermodehelper_exec_work+0xc4/0x120
      [<ffffffff812591c9>] process_one_work+0x2c9/0x600
      [<ffffffff81259ab9>] worker_thread+0x59/0x5d0
      [<ffffffff812611c8>] kthread+0x178/0x1b0
      [<ffffffff8100227f>] ret_from_fork+0x1f/0x30

  unreferenced object 0xffff888110ef5c00 (size 232):
    comm "kworker/u4:0", pid 8414, jiffies 4294944270 (age 12.780s)
    backtrace:
      [<ffffffff8154a0cf>] kmem_cache_zalloc
      [<ffffffff8154a0cf>] __alloc_file+0x1f/0xf0
      [<ffffffff8154a809>] alloc_empty_file+0x69/0x120
      [<ffffffff8154a8f3>] alloc_file+0x33/0x1b0
      [<ffffffff8154ab22>] alloc_file_pseudo+0xb2/0x140
      [<ffffffff81559218>] create_pipe_files+0x138/0x2e0
      [<ffffffff8126c793>] umd_setup+0x33/0x220
      [<ffffffff81253574>] call_usermodehelper_exec_async+0xb4/0x1b0
      [<ffffffff8100227f>] ret_from_fork+0x1f/0x30

After the UMD process exits, the pipe_to_umh/pipe_from_umh and tgid need to be released.

Fixes: d71fa5c9 ("bpf: Add kernel module with user mode driver that populates bpffs.")
Reported-by: <syzbot+44908bb56d2bfe56b28e@syzkaller.appspotmail.com>
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210317030915.2865-1-qiang.zhang@windriver.com
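
A hedged sketch of the kind of cleanup the changelog calls for; the umd_info field names come from the changelog and include/linux/usermode_driver.h, but the exact hook (a subprocess_info cleanup callback) is an assumption:

  static void umd_cleanup(struct subprocess_info *info)
  {
          struct umd_info *umd_info = info->data;

          /* release the pipes and tgid if the UMD process did not start */
          if (info->retval) {
                  fput(umd_info->pipe_to_umh);
                  fput(umd_info->pipe_from_umh);
                  put_pid(umd_info->tgid);
                  umd_info->tgid = NULL;
          }
  }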
-
Peter Zijlstra authored
Sites that match init_section_contains() get marked as INIT. For built-in code init_sections contains both __init and __exit text. OTOH kernel_text_address() only explicitly includes __init text (and there are no __exit text markers).

Match what jump_label already does and ignore the warning for INIT sites. Also see the excellent changelog for commit 8f35eaa5 ("jump_label: Don't warn on __exit jump entries").

Fixes: 9183c3f9 ("static_call: Add inline static call infrastructure")
Reported-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lkml.kernel.org/r/20210318113610.739542434@infradead.org
-
Peter Zijlstra authored
The intent is to avoid writing init code after init (because the text might have been freed). The code is needlessly different between jump_label and static_call and not obviously correct.

The existing code relies on the fact that the module loader clears the init layout, such that within_module_init() always fails, while jump_label relies on the module state which is more obvious and matches the kernel logic.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lkml.kernel.org/r/20210318113610.636651340@infradead.org
-
Peter Zijlstra authored
It turns out that static_call_set_init() does not preserve the other flags; IOW, it clears TAIL if it was set.

Fixes: 9183c3f9 ("static_call: Add inline static call infrastructure")
Reported-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lkml.kernel.org/r/20210318113610.519406371@infradead.org
-
- Mar 18, 2021
-
-
Josef Bacik authored
This reverts commit d60cd063. This patch causes a panic when rebooting my Dell Poweredge r440. I do not have the full panic log as it's lost at that stage of the reboot and I do not have a serial console. Reverting this patch makes my system able to reboot again.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- Mar 17, 2021
-
-
Alexei Starovoitov authored
The fexit/fmod_ret programs can be attached to kernel functions that can sleep. The synchronize_rcu_tasks() will not wait for such tasks to complete. In such a case the trampoline image will be freed and when the task wakes up the return IP will point to freed memory, causing the crash.

Solve this by adding percpu_ref_get/put for the duration of the trampoline and separating the trampoline vs its image life times. The "half page" optimization has to be removed, since the first_half->second_half->first_half transition cannot be guaranteed to complete in deterministic time. Every trampoline update becomes a new image.

The image with fmod_ret or fexit progs will be freed via percpu_ref_kill and call_rcu_tasks. Together they will wait for the original function and trampoline asm to complete. The trampoline is patched from nop to jmp to skip fexit progs. They are freed independently from the trampoline. The image with fentry progs only will be freed via call_rcu_tasks_trace+call_rcu_tasks which will wait for both sleepable and non-sleepable progs to complete.

Fixes: fec56f58 ("bpf: Introduce BPF trampoline")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Paul E. McKenney <paulmck@kernel.org> # for RCU
Link: https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com
-
Piotr Krysiuk authored
Given we know the max possible value of ptr_limit at the time of retrieving the latter, add basic assertions, so that the verifier can bail out if anything looks odd and reject the program. Nothing triggered this so far, but it also does not hurt to have these.

Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
-
Piotr Krysiuk authored
Instead of having the mov32 with aux->alu_limit - 1 immediate, move this operation to retrieve_ptr_limit() instead to simplify the logic and to allow for subsequent sanity boundary checks inside retrieve_ptr_limit(). This avoids in future that at the time of the verifier masking rewrite we'd run into an underflow which would not sign extend due to the nature of mov32 instruction.

Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
-
Piotr Krysiuk authored
retrieve_ptr_limit() computes the ptr_limit for registers with stack and map_value type. ptr_limit is the size of the memory area that is still valid / in-bounds from the point of the current position and direction of the operation (add / sub). This size will later be used for masking the operation such that attempting out-of-bounds access in the speculative domain is redirected to remain within the bounds of the current map value.

When masking to the right the size is correct, however, when masking to the left, the size is off-by-one which would lead to an incorrect mask and thus incorrect arithmetic operation in the non-speculative domain. Piotr found that if the resulting alu_limit value is zero, then the BPF_MOV32_IMM() from the fixup_bpf_calls() rewrite will end up loading 0xffffffff into AX instead of sign-extending to the full 64 bit range, and as a result, this allows abuse for executing speculatively out-of-bounds loads against 4GB window of address space and thus extracting the contents of kernel memory via side-channel.

Fixes: 979d63d5 ("bpf: prevent out of bounds speculation on pointer arithmetic")
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
-
Piotr Krysiuk authored
The purpose of this patch is to streamline error propagation and in particular to propagate retrieve_ptr_limit() errors for pointer types that are not defining a ptr_limit such that register-based alu ops against these types can be rejected.

The main rationale is that a gap has been identified by Piotr in the existing protection against speculatively out-of-bounds loads, for example, in case of ctx pointers, unprivileged programs can still perform pointer arithmetic. This can be abused to execute speculatively out-of-bounds loads without restrictions and thus extract contents of kernel memory. Fix this by rejecting unprivileged programs that attempt any pointer arithmetic on unprotected pointer types. The two affected ones are pointer to ctx as well as pointer to map. Field access to a modified ctx' pointer is rejected at a later point in time in the verifier, and 7c696732 ("bpf: Permit map_ptr arithmetic with opcode add and offset 0") is only relevant for root-only use cases. Risk of unprivileged program breakage is considered very low.

Fixes: 7c696732 ("bpf: Permit map_ptr arithmetic with opcode add and offset 0")
Fixes: b2157399 ("bpf: prevent out-of-bounds speculation")
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
-
Waiman Long authored
The use_ww_ctx flag is passed to mutex_optimistic_spin(), but the function doesn't use it. The frequent use of the (use_ww_ctx && ww_ctx) combination is repetitive. In fact, ww_ctx should not be used at all if !use_ww_ctx.

Simplify ww_mutex code by dropping use_ww_ctx from mutex_optimistic_spin() and clearing ww_ctx if !use_ww_ctx. In this way, we can replace (use_ww_ctx && ww_ctx) by just (ww_ctx).

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Link: https://lore.kernel.org/r/20210316153119.13802-2-longman@redhat.com
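
A hedged before/after sketch of the simplification in __mutex_lock_common()-style code:

  /* before: the flag and the pointer had to be checked together everywhere */
  if (use_ww_ctx && ww_ctx)
          ww_mutex_set_context_fastpath(ww, ww_ctx);

  /* after: normalise once on entry ... */
  if (!use_ww_ctx)
          ww_ctx = NULL;

  /* ... and every later check tests only the pointer */
  if (ww_ctx)
          ww_mutex_set_context_fastpath(ww, ww_ctx);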
-
- Mar 16, 2021
-
-
Alexei Starovoitov authored
The following sequence of commands:

  register_ftrace_direct(ip, addr1);
  modify_ftrace_direct(ip, addr1, addr2);
  unregister_ftrace_direct(ip, addr2);

will cause the kernel to warn:

  [   30.179191] WARNING: CPU: 2 PID: 1961 at kernel/trace/ftrace.c:5223 unregister_ftrace_direct+0x130/0x150
  [   30.180556] CPU: 2 PID: 1961 Comm: test_progs W O 5.12.0-rc2-00378-g86bc10a0a711-dirty #3246
  [   30.182453] RIP: 0010:unregister_ftrace_direct+0x130/0x150

When modify_ftrace_direct() changes the addr from old to new it should update the addr stored in ftrace_direct_funcs. Otherwise the final unregister_ftrace_direct() won't find the address and will cause the splat.

Fixes: 0567d680 ("ftrace: Add modify_ftrace_direct()")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/bpf/20210316195815.34714-1-alexei.starovoitov@gmail.com
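
The triggering sequence from the changelog, written out as a hedged usage sketch (ip, addr1 and addr2 are assumed to be a valid ftrace location and two trampoline addresses):

  int err;

  err = register_ftrace_direct(ip, addr1);               /* direct call at ip -> addr1 */
  if (!err)
          err = modify_ftrace_direct(ip, addr1, addr2);  /* must also update ftrace_direct_funcs */
  if (!err)
          err = unregister_ftrace_direct(ip, addr2);     /* looks up addr2; warned before the fix */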
-
Oleg Nesterov authored
Preparation for fixing get_nr_restart_syscall() on X86 for COMPAT.

Add a new helper which sets restart_block->fn and calls a dummy arch_set_restart_data() helper.

Fixes: 609c19a3 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210201174641.GA17871@redhat.com
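
A hedged sketch of such a helper, roughly as it could live in include/linux/thread_info.h; the -ERESTART_RESTARTBLOCK return value is an assumption about the common call pattern:

  #ifndef arch_set_restart_data
  #define arch_set_restart_data(restart) do { } while (0)
  #endif

  static inline long set_restart_fn(struct restart_block *restart,
                                    long (*fn)(struct restart_block *))
  {
          restart->fn = fn;
          arch_set_restart_data(restart); /* no-op unless the arch overrides it */
          return -ERESTART_RESTARTBLOCK;
  }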
-
Andy Shevchenko authored
Fix typos in kernel doc, otherwise validation script complains:

  .../irq_sim.c:170: warning: Function parameter or member 'fwnode' not described in 'irq_domain_create_sim'
  .../irq_sim.c:170: warning: Excess function parameter 'fnode' description in 'irq_domain_create_sim'
  .../irq_sim.c:240: warning: Function parameter or member 'fwnode' not described in 'devm_irq_domain_create_sim'
  .../irq_sim.c:240: warning: Excess function parameter 'fnode' description in 'devm_irq_domain_create_sim'

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210302161453.28540-1-andriy.shevchenko@linux.intel.com
-
- Mar 14, 2021
-
-
Alexey Dobriyan authored
Doing a

  prctl(PR_SET_MM, PR_SET_MM_AUXV, addr, 1);

will copy 1 byte from userspace to a (quite big) on-stack array and then stash everything to mm->saved_auxv. The AT_NULL terminator will be inserted at the very end. The /proc/*/auxv handler will find that AT_NULL terminator and copy the original stack contents to userspace.

This devious scheme requires CAP_SYS_RESOURCE.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
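
A hedged userspace illustration of the interface being described (requires CAP_SYS_RESOURCE; constants come from <linux/prctl.h>):

  #include <sys/prctl.h>
  #include <linux/prctl.h>

  int main(void)
  {
          unsigned long one_byte = 0;

          /* len == 1: only a single byte is copied in before the data is
           * stashed to mm->saved_auxv. */
          return prctl(PR_SET_MM, PR_SET_MM_AUXV, (unsigned long)&one_byte, 1, 0);
  }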
-
- Mar 13, 2021
-
-
Fenghua Yu authored
When a new mm is created, its PASID should be cleared, i.e. the PASID is initialized to its init state 0 on both ARM and X86.

This patch was part of the series introducing mm->pasid, but got lost along the way [1]. It still makes sense to have it, because each address space has a different PASID. And the IOMMU code in iommu_sva_alloc_pasid() expects the pasid field of a new mm struct to be cleared.

[1] https://lore.kernel.org/linux-iommu/YDgh53AcQHT+T3L0@otcwcpicx3.sc.intel.com/

Link: https://lkml.kernel.org/r/20210302103837.2562625-1-jean-philippe@linaro.org
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
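
A hedged sketch of the kind of initialization added to mm_init() in kernel/fork.c; the INIT_PASID value (0) and the config guard are assumptions consistent with the changelog:

  static void mm_init_pasid(struct mm_struct *mm)
  {
  #ifdef CONFIG_IOMMU_SUPPORT
          mm->pasid = INIT_PASID; /* 0: the init state on both ARM and X86 */
  #endif
  }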
-
Jens Axboe authored
With the freezer using the proper signaling to notify us of when it's time to freeze a thread, we can re-enable normal freezer usage for the IO threads. Ensure that SQPOLL, io-wq, and the io-wq manager call try_to_freeze() appropriately, and remove the default setting of PF_NOFREEZE from create_io_thread().

Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Don't send fake signals to PF_IO_WORKER threads, they don't accept signals. Just treat them like kthreads in this regard, all they need is a wakeup as no forced kernel/user transition is needed.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
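
A hedged sketch of the wake-up decision in freeze_task()-style code after this change:

  /* IO workers are treated like kthreads: plain wakeup, no fake signal. */
  if (p->flags & (PF_KTHREAD | PF_IO_WORKER))
          wake_up_state(p, TASK_INTERRUPTIBLE);
  else
          fake_signal_wake_up(p);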
-
- Mar 10, 2021
-
-
Jens Axboe authored
The io-wq threads were already marked as no-freeze, but the manager was not. On resume, we perpetually have signal_pending() being true, and hence the manager will loop and spin 100% of the time.

Just mark the tasks created by create_io_thread() as PF_NOFREEZE by default, and remove any knowledge of it in io-wq and io_uring.

Reported-by: Kevin Locke <kevin@kevinlocke.name>
Tested-by: Kevin Locke <kevin@kevinlocke.name>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Mar 08, 2021
-
-
Greg Kroah-Hartman authored
There's no need to keep around a dentry pointer to a simple file that debugfs itself can look up when we need to remove it from the system. So simplify the code by deleting the variable and cleaning up the logic around the debugfs file.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/YCvYV53ZdzQSWY6w@kroah.com
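
A hedged sketch of the pattern that makes the cached dentry unnecessary; domain_dir is the assumed parent directory:

  static void debugfs_remove_domain_dir(struct irq_domain *d)
  {
          /* Look the file up by name at removal time instead of caching a dentry. */
          debugfs_remove(debugfs_lookup(d->name, domain_dir));
  }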
-
Tal Lossos authored
bpf_fd_inode_storage_lookup_elem() returned NULL when getting a bad FD, which caused -ENOENT in bpf_map_copy_value(). An -EBADF error is better than -ENOENT for a bad FD.

The patch was partially contributed by CyberArk Software, Inc.

Fixes: 8ea63684 ("bpf: Implement bpf_local_storage for inodes")
Signed-off-by: Tal Lossos <tallossos@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210307120948.61414-1-tallossos@gmail.com
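
A hedged sketch of the change inside bpf_fd_inode_storage_lookup_elem(); the surrounding code is abbreviated and assumed:

  fd = *(int *)key;
  f = fget_raw(fd);
  if (!f)
          return ERR_PTR(-EBADF); /* was: return NULL, which surfaced as -ENOENT */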
-
Alexei Starovoitov authored
The syzbot got an FD of vmlinux BTF and passed it into map_create, which caused a crash in btf_type_id_size() when it tried to access resolved_ids. The vmlinux BTF doesn't have 'resolved_ids' and 'resolved_sizes' initialized to save memory. To avoid such issues disallow using vmlinux BTF in prog_load and map_create commands.

Fixes: 53297220 ("bpf: Assign ID to vmlinux BTF and return extra info for BTF in GET_OBJ_INFO")
Reported-by: <syzbot+8bab8ed346746e7540e8@syzkaller.appspotmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210307225248.79031-1-alexei.starovoitov@gmail.com
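
A hedged sketch of the rejection in map_create()/bpf_prog_load()-style code; the exact error code is an assumption:

  if (btf_is_kernel(btf)) {
          /* vmlinux BTF lacks resolved_ids/resolved_sizes; refuse it here. */
          btf_put(btf);
          return -EACCES;
  }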
-
Anna-Maria Behnsen authored
hrtimer_force_reprogram() and hrtimer_interrupt() invoke __hrtimer_get_next_event() to find the earliest expiry time of hrtimer bases. __hrtimer_get_next_event() does not update cpu_base::[softirq_]_expires_next to preserve reprogramming logic. That needs to be done at the callsites.

hrtimer_force_reprogram() updates cpu_base::softirq_expires_next only when the first expiring timer is a softirq timer and the soft interrupt is not activated. That's wrong because cpu_base::softirq_expires_next is left stale when the first expiring timer of all bases is a timer which expires in hard interrupt context. hrtimer_interrupt() does never update cpu_base::softirq_expires_next which is wrong too.

That becomes a problem when clock_settime() sets CLOCK_REALTIME forward and the first soft expiring timer is in the CLOCK_REALTIME_SOFT base. Setting CLOCK_REALTIME forward moves the clock MONOTONIC based expiry time of that timer before the stale cpu_base::softirq_expires_next.

cpu_base::softirq_expires_next is cached to make the check for raising the soft interrupt fast. In the above case the soft interrupt won't be raised until clock monotonic reaches the stale cpu_base::softirq_expires_next value. That's incorrect, but what's worse is that if the softirq timer becomes the first expiring timer of all clock bases after the hard expiry timer has been handled the reprogramming of the clockevent from hrtimer_interrupt() will result in an interrupt storm. That happens because the reprogramming does not use cpu_base::softirq_expires_next, it uses __hrtimer_get_next_event() which returns the actual expiry time. Once clock MONOTONIC reaches cpu_base::softirq_expires_next the soft interrupt is raised and the storm subsides.

Change the logic in hrtimer_force_reprogram() to evaluate the soft and hard bases separately, update softirq_expires_next and handle the case when a soft expiring timer is the first of all bases by comparing the expiry times and updating the required cpu base fields. Split this functionality into a separate function to be able to use it in hrtimer_interrupt() as well without copy paste.

Fixes: 5da70160 ("hrtimer: Implement support for softirq based hrtimers")
Reported-by: Mikael Beckius <mikael.beckius@windriver.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mikael Beckius <mikael.beckius@windriver.com>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210223160240.27518-1-anna-maria@linutronix.de
-
- Mar 06, 2021
-
-
Kan Liang authored
Sometimes the PMU internal buffers have to be flushed for per-CPU events during a context switch, e.g., large PEBS. Otherwise, the perf tool may report samples in locations that do not belong to the process where the samples are processed in, because PEBS does not tag samples with PID/TID.

The current code only flushes the buffers for a per-task event. It doesn't check a per-CPU event.

Add a new event state flag, PERF_ATTACH_SCHED_CB, to indicate that the PMU internal buffers have to be flushed for this event during a context switch. Add sched_cb_entry and perf_sched_cb_usages back to track the PMU/cpuctx which is required to be flushed. Only need to invoke the sched_task() for per-CPU events in this patch. The per-task events have been handled in perf_event_context_sched_in/out already.

Fixes: 9c964efa ("perf/x86/intel: Drain the PEBS buffer during context switches")
Reported-by: Gabriel Marin <gmx@google.com>
Originally-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20201130193842.10569-1-kan.liang@linux.intel.com
-
Peter Zijlstra authored
Provided the target address of a R_X86_64_PC32 relocation is aligned, the low two bits should be invariant between the relative and absolute value.

Turns out the address is not aligned and things go sideways; ensure we transfer the bits in the absolute form when fixing up the key address.

Fixes: 73f44fe1 ("static_call: Allow module use without exposing static_call_key")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20210225220351.GE4746@worktop.programming.kicks-ass.net
-
Mathieu Desnoyers authored
The function sync_runqueues_membarrier_state() should copy the membarrier state from the @mm received as parameter to each runqueue currently running tasks using that mm.

However, the use of smp_call_function_many() skips the current runqueue, which is unintended. Replace by a call to on_each_cpu_mask().

Fixes: 227a4aad ("sched/membarrier: Fix p->mm->membarrier_state racy load")
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org # 5.4.x+
Link: https://lore.kernel.org/r/74F1E842-4A84-47BF-B6C2-5407DFDD4A4A@gmail.com
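
A hedged sketch of the substitution in sync_runqueues_membarrier_state(); on_each_cpu_mask() also runs the function on the local CPU when it is set in the mask:

  /* before: skips the local runqueue */
  smp_call_function_many(tmpmask, ipi_sync_rq_state, mm, 1);

  /* after: includes the current CPU if it is set in tmpmask */
  on_each_cpu_mask(tmpmask, ipi_sync_rq_state, mm, true);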
-