- Dec 22, 2020
-
-
Andrey Konovalov authored
This is a preparatory commit for the upcoming addition of a new hardware tag-based (MTE-based) KASAN mode. Hardware tag-based KASAN won't use kasan_depth. Only define and use it when one of the software KASAN modes are enabled. No functional changes for software modes. Link: https://lkml.kernel.org/r/e16f15aeda90bc7fb4dfc2e243a14b74cc5c8219.1606161801.git.andreyknvl@google.com Signed-off-by:
Andrey Konovalov <andreyknvl@google.com> Signed-off-by:
Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by:
Catalin Marinas <catalin.marinas@arm.com> Reviewed-by:
Alexander Potapenko <glider@google.com> Tested-by:
Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Marco Elver <elver@google.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Dec 15, 2020
-
-
Vlastimil Babka authored
Patch series "cleanup page poisoning", v3. I have identified a number of issues and opportunities for cleanup with CONFIG_PAGE_POISON and friends: - interaction with init_on_alloc and init_on_free parameters depends on the order of parameters (Patch 1) - the boot time enabling uses static key, but inefficienty (Patch 2) - sanity checking is incompatible with hibernation (Patch 3) - CONFIG_PAGE_POISONING_NO_SANITY can be removed now that we have init_on_free (Patch 4) - CONFIG_PAGE_POISONING_ZERO can be most likely removed now that we have init_on_free (Patch 5) This patch (of 5): Enabling page_poison=1 together with init_on_alloc=1 or init_on_free=1 produces a warning in dmesg that page_poison takes precedence. However, as these warnings are printed in early_param handlers for init_on_alloc/free, they are not printed if page_poison is enabled later on the command line (handlers are called in the order of their parameters), or when init_on_alloc/free is always enabled by the respective config option - before the page_poison early param handler is called, it is not considered to be enabled. This is inconsistent. We can remove the dependency on order by making the init_on_* parameters only set a boolean variable, and postponing the evaluation after all early params have been processed. Introduce a new init_mem_debugging_and_hardening() function for that, and move the related debug_pagealloc processing there as well. As a result init_mem_debugging_and_hardening() knows always accurately if init_on_* and/or page_poison options were enabled. Thus we can also optimize want_init_on_alloc() and want_init_on_free(). We don't need to check page_poisoning_enabled() there, we can instead not enable the init_on_* static keys at all, if page poisoning is enabled. This results in a simpler and more effective code. Link: https://lkml.kernel.org/r/20201113104033.22907-1-vbabka@suse.cz Link: https://lkml.kernel.org/r/20201113104033.22907-2-vbabka@suse.cz Signed-off-by:
Vlastimil Babka <vbabka@suse.cz> Reviewed-by:
David Hildenbrand <david@redhat.com> Reviewed-by:
Mike Rapoport <rppt@linux.ibm.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mateusz Nosek <mateusznosek0@gmail.com> Cc: Laura Abbott <labbott@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Lin Feng authored
In the booting phase if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set, we have following callchain: start_kernel ... mm_init mem_init memblock_free_all reset_all_zones_managed_pages free_low_memory_core_early ... buffer_init nr_free_buffer_pages zone->managed_pages ... rest_init kernel_init kernel_init_freeable page_alloc_init_late kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid); wait_for_completion(&pgdat_init_all_done_comp); ... files_maxfiles_init It's clear that buffer_init depends on zone->managed_pages, but it's reset in reset_all_zones_managed_pages after that pages are readded into zone->managed_pages, but when buffer_init runs this process is half done and most of them will finally be added till deferred_init_memmap done. In large memory couting of nr_free_buffer_pages drifts too much, also drifting from kernels to kernels on same hardware. Fix is simple, it delays buffer_init run till deferred_init_memmap all done. But as corrected by this patch, max_buffer_heads becomes very large, the value is roughly as many as 4 times of totalram_pages, formula: max_buffer_heads = nrpages * (10%) * (PAGE_SIZE / sizeof(struct buffer_head)); Say in a 64GB memory box we have 16777216 pages, then max_buffer_heads turns out to be roughly 67,108,864. In common cases, should a buffer_head be mapped to one page/block(4KB)? So max_buffer_heads never exceeds totalram_pages. IMO it's likely to make buffer_heads_over_limit bool value alwasy false, then make codes 'if (buffer_heads_over_limit)' test in vmscan unnecessary. So this patch will change the original behavior related to buffer_heads_over_limit in vmscan since we used a half done value of zone->managed_pages before, or should we use a smaller factor(<10%) in previous formula. akpm: I think this is OK - the max_buffer_heads code is only needed on highmem machines, to prevent ZONE_NORMAL from being consumed by large amounts of buffer_heads attached to highmem pagecache. This problem will not occur on 64-bit machines, so this feature's non-functionality on such machines is a feature, not a bug. Link: https://lkml.kernel.org/r/20201123110500.103523-1-linf@wangsu.com Signed-off-by:
Lin Feng <linf@wangsu.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Zhenhua Huang authored
Page owner of pages used by page owner itself used is missing on arm32 targets. The reason is dummy_handle and failure_handle is not initialized correctly. Buddy allocator is used to initialize these two handles. However, buddy allocator is not ready when page owner calls it. This change fixed that by initializing page owner after buddy initialization. The working flow before and after this change are: original logic: 1. allocated memory for page_ext(using memblock). 2. invoke the init callback of page_ext_ops like page_owner(using buddy allocator). 3. initialize buddy. after this change: 1. allocated memory for page_ext(using memblock). 2. initialize buddy. 3. invoke the init callback of page_ext_ops like page_owner(using buddy allocator). with the change, failure/dummy_handle can get its correct value and page owner output for example has the one for page owner itself: Page allocated via order 2, mask 0x6202c0(GFP_USER|__GFP_NOWARN), pid 1006, ts 67278156558 ns PFN 543776 type Unmovable Block 531 type Unmovable Flags 0x0() init_page_owner+0x28/0x2f8 invoke_init_callbacks_flatmem+0x24/0x34 start_kernel+0x33c/0x5d8 Link: https://lkml.kernel.org/r/1603104925-5888-1-git-send-email-zhenhuah@codeaurora.org Signed-off-by:
Zhenhua Huang <zhenhuah@codeaurora.org> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Dec 11, 2020
-
-
Arnd Bergmann authored
There is only one function in init/initramfs.c that is in the .text section, and it is marked __weak. When building with clang-12 and the integrated assembler, this leads to a bug with recordmcount: ./scripts/recordmcount "init/initramfs.o" Cannot find symbol for section 2: .text. init/initramfs.o: failed I'm not quite sure what exactly goes wrong, but I notice that this function is only ever called from an __init function, and normally inlined. Marking it __init as well is clearly correct and it leads to recordmcount no longer complaining. Link: https://lkml.kernel.org/r/20201204165742.3815221-1-arnd@kernel.org Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Barret Rhoden <brho@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Dec 10, 2020
-
-
Eric W. Biederman authored
Recently syzbot reported[0] that there is a deadlock amongst the users of exec_update_mutex. The problematic lock ordering found by lockdep was: perf_event_open (exec_update_mutex -> ovl_i_mutex) chown (ovl_i_mutex -> sb_writes) sendfile (sb_writes -> p->lock) by reading from a proc file and writing to overlayfs proc_pid_syscall (p->lock -> exec_update_mutex) While looking at possible solutions it occured to me that all of the users and possible users involved only wanted to state of the given process to remain the same. They are all readers. The only writer is exec. There is no reason for readers to block on each other. So fix this deadlock by transforming exec_update_mutex into a rw_semaphore named exec_update_lock that only exec takes for writing. Cc: Jann Horn <jannh@google.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Bernd Edlinger <bernd.edlinger@hotmail.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christopher Yeoh <cyeoh@au1.ibm.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Sargun Dhillon <sargun@sargun.me> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Fixes: eea96732 ("exec: Add exec_update_mutex to replace cred_guard_mutex") [0] https://lkml.kernel.org/r/00000000000063640c05ade8e3de@google.com Reported-by:
<syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com> Link: https://lkml.kernel.org/r/87ft4mbqen.fsf@x220.int.ebiederm.org Signed-off-by:
Eric W. Biederman <ebiederm@xmission.com>
-
- Dec 01, 2020
-
-
Christoph Hellwig authored
Instead of having two structures that represent each block device with different life time rules, merge them into a single one. This also greatly simplifies the reference counting rules, as we can use the inode reference count as the main reference count for the new struct block_device, with the device model reference front ending it for device model interaction. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Just use the bd_partno field in struct block_device everywhere. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Move the partition_meta_info to struct block_device in preparation for killing struct hd_struct. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Avoid a totally pointless goto label, and use the same style of comparism for both helpers. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Hannes Reinecke <hare@suse.de> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The code in devt_from_partuuid is very convoluted. Refactor a bit by sanitizing the goto and variable name usage. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Hannes Reinecke <hare@suse.de> Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Split each case into a self-contained helper, and move the block dependent code entirely under the pre-existing #ifdef CONFIG_BLOCK. This allows to remove the blk_lookup_devt stub in genhd.h. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Hannes Reinecke <hare@suse.de> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Nathan Chancellor authored
ld.lld 10.0.1 spews a bunch of various warnings about .rela sections, along with a few others. Newer versions of ld.lld do not have these warnings. As a result, do not add '--orphan-handling=warn' to LDFLAGS_vmlinux if ld.lld's version is not new enough. Link: https://github.com/ClangBuiltLinux/linux/issues/1187 Link: https://github.com/ClangBuiltLinux/linux/issues/1193 Reported-by:
Arvind Sankar <nivedita@alum.mit.edu> Reported-by:
kernelci.org bot <bot@kernelci.org> Reported-by:
Mark Brown <broonie@kernel.org> Reviewed-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Nathan Chancellor <natechancellor@gmail.com> Reviewed-by:
Nick Desaulniers <ndesaulniers@google.com> Signed-off-by:
Masahiro Yamada <masahiroy@kernel.org>
-
Nathan Chancellor authored
Currently, '--orphan-handling=warn' is spread out across four different architectures in their respective Makefiles, which makes it a little unruly to deal with in case it needs to be disabled for a specific linker version (in this case, ld.lld 10.0.1). To make it easier to control this, hoist this warning into Kconfig and the main Makefile so that disabling it is simpler, as the warning will only be enabled in a couple places (main Makefile and a couple of compressed boot folders that blow away LDFLAGS_vmlinx) and making it conditional is easier due to Kconfig syntax. One small additional benefit of this is saving a call to ld-option on incremental builds because we will have already evaluated it for CONFIG_LD_ORPHAN_WARN. To keep the list of supported architectures the same, introduce CONFIG_ARCH_WANT_LD_ORPHAN_WARN, which an architecture can select to gain this automatically after all of the sections are specified and size asserted. A special thanks to Kees Cook for the help text on this config. Link: https://github.com/ClangBuiltLinux/linux/issues/1187 Acked-by:
Kees Cook <keescook@chromium.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Reviewed-by:
Nick Desaulniers <ndesaulniers@google.com> Tested-by:
Nick Desaulniers <ndesaulniers@google.com> Signed-off-by:
Nathan Chancellor <natechancellor@gmail.com> Signed-off-by:
Masahiro Yamada <masahiroy@kernel.org>
-
Masami Hiramatsu authored
Load the size and the checksum fields in the footer as le32 instead of u32. This will allow us to apply bootconfig to the cross build initrd without caring the endianness. Link: https://lkml.kernel.org/r/160583934457.547349.10504070298990791074.stgit@devnote2 Reported-by:
Steven Rostedt <rostedt@goodmis.org> Suggested-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
-
- Nov 20, 2020
-
-
Heiko Carstens authored
While allmodconfig and allyesconfig build for s390 there are also various bots running compile tests with randconfig, where PCI is disabled. This reveals that a lot of drivers should actually depend on HAS_IOMEM. Adding this to each device driver would be a never ending story, therefore just disable COMPILE_TEST for s390. The reasoning is more or less the same as described in commit bc083a64 ("init/Kconfig: make COMPILE_TEST depend on !UML"). Reported-by:
kernel test robot <lkp@intel.com> Suggested-by:
Arnd Bergmann <arnd@kernel.org> Signed-off-by:
Heiko Carstens <hca@linux.ibm.com>
-
Petr Mladek authored
stdin, stdout, and stderr standard I/O stream are created for the init process. They are not available when there is no console registered for /dev/console. It might lead to a crash when the init process tries to use them, see the commit 48021f98 ("printk: handle blank console arguments passed in."). Normally, ttySX and ttyX consoles are used as a fallback when no consoles are defined via the command line, device tree, or SPCR. But there will be no console registered when an invalid console name is configured or when the configured consoles do not exist on the system. Users even try to avoid the console intentionally, for example, by using console="" or console=null. It is used on production systems where the serial port or terminal are not visible to users. Pushing messages to these consoles would just unnecessary slowdown the system. Make sure that stdin, stdout, stderr, and /dev/console are always available by a fallback to the existing ttynull driver. It has been implemented for exactly this purpose but it was used only when explicitly configured. Reviewed-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by:
Guenter Roeck <linux@roeck-us.net> Tested-by:
Guenter Roeck <linux@roeck-us.net> Acked-by:
Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by:
Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20201111135450.11214-2-pmladek@suse.com
-
- Nov 13, 2020
-
-
Masami Hiramatsu authored
Since Grub may align the size of initrd to 4 if user pass initrd from cpio, we have to check the preceding 3 bytes as well. Link: https://lkml.kernel.org/r/160520205132.303174.4876760192433315429.stgit@devnote2 Cc: stable@vger.kernel.org Fixes: 85c46b78 ("bootconfig: Add bootconfig magic word for indicating bootconfig explicitly") Reported-by:
Chen Yu <yu.chen.surf@gmail.com> Tested-by:
Chen Yu <yu.chen.surf@gmail.com> Signed-off-by:
Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
-
- Nov 03, 2020
-
-
Paul Menzel authored
Currently, LOG_BUF_SHIFT defaults to 17, which is 2 ^ 17 bytes = 128 KB, and LOG_CPU_MAX_BUF_SHIFT defaults to 12, which is 2 ^ 12 bytes = 4 KB. Half of 128 KB is 64 KB, so more than 16 CPUs are required for the value to be used, as then the sum of contributions is greater than 64 KB for the first time. My guess is, that the description was written with the configuration values used in the SUSE in mind. Fixes: 23b2899f ("printk: allow increasing the ring buffer depending on the number of CPUs") Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: linux-kernel@vger.kernel.org Signed-off-by:
Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by:
Petr Mladek <pmladek@suse.com> Signed-off-by:
Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200811092924.6256-1-pmenzel@molgen.mpg.de
-
- Oct 09, 2020
-
-
Brendan Higgins authored
Although we have not seen any actual examples where KUnit doesn't work because it runs in the late init phase of the kernel, it has been a concern for some time that this could potentially be an issue in the future. So, remove KUnit from init calls entirely, instead call directly from kernel_init() so that KUnit runs after late init. Co-developed-by:
Alan Maguire <alan.maguire@oracle.com> Signed-off-by:
Alan Maguire <alan.maguire@oracle.com> Signed-off-by:
Brendan Higgins <brendanhiggins@google.com> Reviewed-by:
Stephen Boyd <sboyd@kernel.org> Reviewed-by:
Kees Cook <keescook@chromium.org> Reviewed-by:
Luis Chamberlain <mcgrof@kernel.org> Signed-off-by:
Shuah Khan <skhan@linuxfoundation.org>
-
- Oct 01, 2020
-
-
Jens Axboe authored
Grab actual references to the files_struct. To avoid circular references issues due to this, we add a per-task note that keeps track of what io_uring contexts a task has used. When the tasks execs or exits its assigned files, we cancel requests based on this tracking. With that, we can grab proper references to the files table, and no longer need to rely on stashing away ring_fd and ring_file to check if the ring_fd may have been closed. Cc: stable@vger.kernel.org # v5.5+ Reviewed-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Sep 24, 2020
-
-
Stephen Kitt authored
nommu-mmap.rst was moved to Documentation/admin-guide/mm; this patch updates the remaining stale references to Documentation/mm. Fixes: 800c02f5 ("docs: move nommu-mmap.txt to admin-guide and rename to ReST") Signed-off-by:
Stephen Kitt <steve@sk2.org> Link: https://lore.kernel.org/r/20200812092230.27541-1-steve@sk2.org Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
- Sep 19, 2020
-
-
Jason Yan authored
This eliminates the following sparse warning: init/main.c:306:6: warning: symbol 'xbc_namebuf' was not declared. Should it be static? Link: https://lkml.kernel.org/r/20200915070324.2239473-1-yanaijie@huawei.com Reported-by:
Hulk Robot <hulkci@huawei.com> Acked-by:
Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by:
Jason Yan <yanaijie@huawei.com> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
-
- Sep 18, 2020
-
-
Masami Hiramatsu authored
Since kprobe_event= cmdline option allows user to put kprobes on the functions in initmem, kprobe has to make such probes gone after boot. Currently the probes on the init functions in modules will be handled by module callback, but the kernel init text isn't handled. Without this, kprobes may access non-exist text area to disable or remove it. Link: https://lkml.kernel.org/r/159972810544.428528.1839307531600646955.stgit@devnote2 Fixes: 970988e1 ("tracing/kprobe: Add kprobe_event= boot parameter") Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Signed-off-by:
Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
-
- Sep 08, 2020
-
-
John Ogness authored
The .bss section for the h8300 is relatively small. A value of CONFIG_LOG_BUF_SHIFT that is larger than 19 will create a static printk ringbuffer that is too large. Limit the range appropriately for the H8300. Reported-by:
kernel test robot <lkp@intel.com> Signed-off-by:
John Ogness <john.ogness@linutronix.de> Reviewed-by:
Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by:
Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200812073122.25412-1-john.ogness@linutronix.de
-
- Sep 04, 2020
-
-
Barret Rhoden authored
init_stat() returns 0 on success, same as vfs_lstat(). When it replaced vfs_lstat(), the '!' was dropped. Fixes: 716308a5 ("init: add an init_stat helper") Signed-off-by:
Barret Rhoden <brho@google.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Sep 01, 2020
-
-
Shaokun Zhang authored
Fix up one typo: CONFIG_BOOTCONFIG -> CONFIG_BOOT_CONFIG Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by:
Shaokun Zhang <zhangshaokun@hisilicon.com> Acked-by:
Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by:
Jiri Kosina <jkosina@suse.cz>
-
- Aug 28, 2020
-
-
Alexei Starovoitov authored
Introduce sleepable BPF programs that can request such property for themselves via BPF_F_SLEEPABLE flag at program load time. In such case they will be able to use helpers like bpf_copy_from_user() that might sleep. At present only fentry/fexit/fmod_ret and lsm programs can request to be sleepable and only when they are attached to kernel functions that are known to allow sleeping. The non-sleepable programs are relying on implicit rcu_read_lock() and migrate_disable() to protect life time of programs, maps that they use and per-cpu kernel structures used to pass info between bpf programs and the kernel. The sleepable programs cannot be enclosed into rcu_read_lock(). migrate_disable() maps to preempt_disable() in non-RT kernels, so the progs should not be enclosed in migrate_disable() as well. Therefore rcu_read_lock_trace is used to protect the life time of sleepable progs. There are many networking and tracing program types. In many cases the 'struct bpf_prog *' pointer itself is rcu protected within some other kernel data structure and the kernel code is using rcu_dereference() to load that program pointer and call BPF_PROG_RUN() on it. All these cases are not touched. Instead sleepable bpf programs are allowed with bpf trampoline only. The program pointers are hard-coded into generated assembly of bpf trampoline and synchronize_rcu_tasks_trace() is used to protect the life time of the program. The same trampoline can hold both sleepable and non-sleepable progs. When rcu_read_lock_trace is held it means that some sleepable bpf program is running from bpf trampoline. Those programs can use bpf arrays and preallocated hash/lru maps. These map types are waiting on programs to complete via synchronize_rcu_tasks_trace(); Updates to trampoline now has to do synchronize_rcu_tasks_trace() and synchronize_rcu_tasks() to wait for sleepable progs to finish and for trampoline assembly to finish. This is the first step of introducing sleepable progs. Eventually dynamically allocated hash maps can be allowed and networking program types can become sleepable too. Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Acked-by:
Andrii Nakryiko <andriin@fb.com> Acked-by:
KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200827220114.69225-3-alexei.starovoitov@gmail.com
-
- Aug 20, 2020
-
-
Alexei Starovoitov authored
Add kernel module with user mode driver that populates bpffs with BPF iterators. $ mount bpffs /my/bpffs/ -t bpf $ ls -la /my/bpffs/ total 4 drwxrwxrwt 2 root root 0 Jul 2 00:27 . drwxr-xr-x 19 root root 4096 Jul 2 00:09 .. -rw------- 1 root root 0 Jul 2 00:27 maps.debug -rw------- 1 root root 0 Jul 2 00:27 progs.debug The user mode driver will load BPF Type Formats, create BPF maps, populate BPF maps, load two BPF programs, attach them to BPF iterators, and finally send two bpf_link IDs back to the kernel. The kernel will pin two bpf_links into newly mounted bpffs instance under names "progs.debug" and "maps.debug". These two files become human readable. $ cat /my/bpffs/progs.debug id name attached 11 dump_bpf_map bpf_iter_bpf_map 12 dump_bpf_prog bpf_iter_bpf_prog 27 test_pkt_access 32 test_main test_pkt_access test_pkt_access 33 test_subprog1 test_pkt_access_subprog1 test_pkt_access 34 test_subprog2 test_pkt_access_subprog2 test_pkt_access 35 test_subprog3 test_pkt_access_subprog3 test_pkt_access 36 new_get_skb_len get_skb_len test_pkt_access 37 new_get_skb_ifindex get_skb_ifindex test_pkt_access 38 new_get_constant get_constant test_pkt_access The BPF program dump_bpf_prog() in iterators.bpf.c is printing this data about all BPF programs currently loaded in the system. This information is unstable and will change from kernel to kernel as ".debug" suffix conveys. Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200819042759.51280-4-alexei.starovoitov@gmail.com
-
- Aug 19, 2020
-
-
Kirill Tkhai authored
Switch over uts namespaces to use the newly introduced common lifetime counter. Currently every namespace type has its own lifetime counter which is stored in the specific namespace struct. The lifetime counters are used identically for all namespaces types. Namespaces may of course have additional unrelated counters and these are not altered. This introduces a common lifetime counter into struct ns_common. The ns_common struct encompasses information that all namespaces share. That should include the lifetime counter since its common for all of them. It also allows us to unify the type of the counters across all namespaces. Most of them use refcount_t but one uses atomic_t and at least one uses kref. Especially the last one doesn't make much sense since it's just a wrapper around refcount_t since 2016 and actually complicates cleanup operations by having to use container_of() to cast the correct namespace struct out of struct ns_common. Having the lifetime counter for the namespaces in one place reduces maintenance cost. Not just because after switching all namespaces over we will have removed more code than we added but also because the logic is more easily understandable and we indicate to the user that the basic lifetime requirements for all namespaces are currently identical. Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by:
Kees Cook <keescook@chromium.org> Acked-by:
Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/159644978167.604812.1773586504374412107.stgit@localhost.localdomain Signed-off-by:
Christian Brauner <christian.brauner@ubuntu.com>
-
- Aug 07, 2020
-
-
Andrey Konovalov authored
This patch prepares Software Tag-Based KASAN for stack tagging support. With stack tagging enabled, KASAN tags stack variable in each function in its prologue. In start_kernel() stack variables get tagged before KASAN is enabled via setup_arch()->kasan_init(). As the result the tags for start_kernel()'s stack variables end up in the temporary shadow memory. Later when KASAN gets enabled, switched to normal shadow, and starts checking tags, this leads to false-positive reports, as proper tags are missing in normal shadow. Disable KASAN instrumentation for start_kernel(). Also disable it for arm64's setup_arch() as a precaution (it doesn't have any stack variables right now). [andreyknvl@google.com: reorder attributes for start_kernel()] Link: http://lkml.kernel.org/r/26fb6165a17abcf61222eda5184c030fb6b133d1.1596544734.git.andreyknvl@google.com Signed-off-by:
Andrey Konovalov <andreyknvl@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Elena Petrova <lenaptr@google.com> Cc: Marco Elver <elver@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Walter Wu <walter-zh.wu@mediatek.com> Cc: Ard Biesheuvel <ardb@kernel.org> Link: http://lkml.kernel.org/r/55d432671a92e931ab8234b03dc36b14d4c21bfb.1596199677.git.andreyknvl@google.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Kees Cook authored
Patch series "mm: Expand CONFIG_SLAB_FREELIST_HARDENED to include SLAB" In reviewing Vlastimil Babka's latest slub debug series, I realized[1] that several checks under CONFIG_SLAB_FREELIST_HARDENED weren't being applied to SLAB. Fix this by expanding the Kconfig coverage, and adding a simple double-free test for SLAB. This patch (of 2): Include SLAB caches when performing kmem_cache pointer verification. A defense against such corruption[1] should be applied to all the allocators. With this added, the "SLAB_FREE_CROSS" and "SLAB_FREE_PAGE" LKDTM tests now pass on SLAB: lkdtm: Performing direct entry SLAB_FREE_CROSS lkdtm: Attempting cross-cache slab free ... ------------[ cut here ]------------ cache_from_obj: Wrong slab cache. lkdtm-heap-b but object is from lkdtm-heap-a WARNING: CPU: 2 PID: 2195 at mm/slab.h:530 kmem_cache_free+0x8d/0x1d0 ... lkdtm: Performing direct entry SLAB_FREE_PAGE lkdtm: Attempting non-Slab slab free ... ------------[ cut here ]------------ virt_to_cache: Object is not a Slab page! WARNING: CPU: 1 PID: 2202 at mm/slab.h:489 kmem_cache_free+0x196/0x1d0 Additionally clean up neighboring Kconfig entries for clarity, readability, and redundant option removal. [1] https://github.com/ThomasKing2014/slides/raw/master/Building%20universal%20Android%20rooting%20with%20a%20type%20confusion%20vulnerability.pdf Fixes: 598a0717 ("mm/slab: validate cache membership under freelist hardening") Signed-off-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Popov <alex.popov@linux.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Garrett <mjg59@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Link: http://lkml.kernel.org/r/20200625215548.389774-1-keescook@chromium.org Link: http://lkml.kernel.org/r/20200625215548.389774-2-keescook@chromium.org Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Aug 05, 2020
-
-
Christoph Hellwig authored
Add a simple helper to grab a reference to a file and install it at the next available fd, and switch the early init code over to it. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Aug 04, 2020
-
-
Masami Hiramatsu authored
Since the parse_args() stops parsing at '--', bootconfig_params() will never get the '--' as param and initargs_found never be true. In the result, if we pass some init arguments via the bootconfig, those are always appended to the kernel command line with '--' even if the kernel command line already has '--'. To fix this correctly, check the return value of parse_args() and set initargs_found true if the return value is not an error but a valid address. Link: https://lkml.kernel.org/r/159650953285.270383.14822353843556363851.stgit@devnote2 Fixes: f61872bb ("bootconfig: Use parse_args() to find bootconfig and '--'") Cc: stable@vger.kernel.org Reported-by:
Arvind Sankar <nivedita@alum.mit.edu> Suggested-by:
Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by:
Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Stafford Horne authored
When booting on 32-bit machines (seen on OpenRISC) I saw this warning with CONFIG_DEBUG_MUTEXES turned on. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at kernel/locking/mutex.c:1242 __mutex_unlock_slowpath+0x328/0x3ec DEBUG_LOCKS_WARN_ON(__owner_task(owner) != current) Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-rc1-simple-smp-00005-g2864e2171db4-dirty #179 Call trace: [<(ptrval)>] dump_stack+0x34/0x48 [<(ptrval)>] __warn+0x104/0x158 [<(ptrval)>] ? __mutex_unlock_slowpath+0x328/0x3ec [<(ptrval)>] warn_slowpath_fmt+0x7c/0x94 [<(ptrval)>] __mutex_unlock_slowpath+0x328/0x3ec [<(ptrval)>] mutex_unlock+0x18/0x28 [<(ptrval)>] __cpuhp_setup_state_cpuslocked.part.0+0x29c/0x2f4 [<(ptrval)>] ? page_alloc_cpu_dead+0x0/0x30 [<(ptrval)>] ? start_kernel+0x0/0x684 [<(ptrval)>] __cpuhp_setup_state+0x4c/0x5c [<(ptrval)>] page_alloc_init+0x34/0x68 [<(ptrval)>] ? start_kernel+0x1a0/0x684 [<(ptrval)>] ? early_init_dt_scan_nodes+0x60/0x70 irq event stamp: 0 I traced this to kernel/locking/mutex.c storing 3 bits of MUTEX_FLAGS in the task_struct pointer (mutex.owner). There is a comment saying that task_structs are always aligned to L1_CACHE_BYTES. This is not true for the init_task. On 64-bit machines this is not a problem because symbol addresses are naturally aligned to 64-bits providing 3 bits for MUTEX_FLAGS. Howerver, for 32-bit machines the symbol address only has 2 bits available. Fix this by setting init_task alignment to at least L1_CACHE_BYTES. Signed-off-by:
Stafford Horne <shorne@gmail.com> Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org>
-
- Jul 31, 2020
-
-
Nick Terrell authored
- Add the zstd and zstd22 cmds to scripts/Makefile.lib - Add the HAVE_KERNEL_ZSTD and KERNEL_ZSTD options Architecture specific support is still needed for decompression. Signed-off-by:
Nick Terrell <terrelln@fb.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Tested-by:
Sedat Dilek <sedat.dilek@gmail.com> Reviewed-by:
Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200730190841.2071656-4-nickrterrell@gmail.com
-
Christoph Hellwig authored
Add a simple helper to set timestamps with a kernel space file name and switch the early init code over to it. Signed-off-by:
Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
Add a simple helper to stat with a kernel space file name and switch the early init code over to it. Signed-off-by:
Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
Add a simple helper to mknod with a kernel space file name and switch the early init code over to it. Remove the now unused ksys_mknod. Signed-off-by:
Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
Add a simple helper to mkdir with a kernel space file name and switch the early init code over to it. Remove the now unused ksys_mkdir. Signed-off-by:
Christoph Hellwig <hch@lst.de>
-