- May 21, 2021
-
-
Jan Beulich authored
xen_setup_gdt(), via xen_load_gdt_boot(), wants to adjust page tables. For this to work when NX is not available, x86_configure_nx() needs to be called first. [jgross] Note that this is a revert of 36104cb9 ("x86/xen: Delay get_cpu_cap until stack canary is established"), which is possible now that we no longer support running as PV guest in 32-bit mode. Cc: <stable.vger.kernel.org> # 5.9 Fixes: 36104cb9 ("x86/xen: Delay get_cpu_cap until stack canary is established") Reported-by:
Olaf Hering <olaf@aepfle.de> Signed-off-by:
Jan Beulich <jbeulich@suse.com> Reviewed-by:
Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/12a866b0-9e89-59f7-ebeb-a2a6cec0987a@suse.com Signed-off-by:
Juergen Gross <jgross@suse.com>
-
- May 14, 2021
-
-
Stefano Stabellini authored
xen_swiotlb_init calls swiotlb_late_init_with_tbl, which fails with -ENOMEM if the swiotlb has already been initialized. Add an explicit check io_tlb_default_mem != NULL at the beginning of xen_swiotlb_init. If the swiotlb is already initialized print a warning and return -EEXIST. On x86, the error propagates. On ARM, we don't actually need a special swiotlb buffer (yet), any buffer would do. So ignore the error and continue. CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by:
Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by:
Boris Ostrovsky <boris.ostrvsky@oracle.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210512201823.1963-3-sstabellini@kernel.org Signed-off-by:
Juergen Gross <jgross@suse.com>
-
Christoph Hellwig authored
Although SWIOTLB_NO_FORCE is meant to allow later calls to swiotlb_init, today dma_direct_map_page returns error if SWIOTLB_NO_FORCE. For now, without a larger overhaul of SWIOTLB_NO_FORCE, the best we can do is to avoid setting SWIOTLB_NO_FORCE in mem_init when we know that it is going to be required later (e.g. Xen requires it). CC: boris.ostrovsky@oracle.com CC: jgross@suse.com CC: catalin.marinas@arm.com CC: will@kernel.org CC: linux-arm-kernel@lists.infradead.org Fixes: 2726bf3f ("swiotlb: Make SWIOTLB_NO_FORCE perform no allocation") Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by:
Juergen Gross <jgross@suse.com> Acked-by:
Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210512201823.1963-2-sstabellini@kernel.org Signed-off-by:
Juergen Gross <jgross@suse.com>
-
Stefano Stabellini authored
Move xen_swiotlb_detect to a static inline function to make it available to !CONFIG_XEN builds. CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by:
Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20210512201823.1963-1-sstabellini@kernel.org Signed-off-by:
Juergen Gross <jgross@suse.com>
-
- May 07, 2021
-
-
Masahiro Yamada authored
The section "19) Editor modelines and other cruft" in Documentation/process/coding-style.rst clearly says, "Do not include any of these in source files." I recently receive a patch to explicitly add a new one. Let's do treewide cleanups, otherwise some people follow the existing code and attempt to upstream their favoriate editor setups. It is even nicer if scripts/checkpatch.pl can check it. If we like to impose coding style in an editor-independent manner, I think editorconfig (patch [1]) is a saner solution. [1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/ Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org Signed-off-by:
Masahiro Yamada <masahiroy@kernel.org> Acked-by:
Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> [auxdisplay] Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Maninder Singh authored
In case of a use after free kernel oops, the freeing path of the object is required to debug futher. In most of cases the object address is present in one of the registers. Thus check the register's address and if it belongs to slab, print its alloc and free path. e.g. in the below issue register r6 belongs to slab, and a use after free issue occurred on one of its dereferenced values: Unable to handle kernel paging request at virtual address 6b6b6b6f .... pc : [<c0538afc>] lr : [<c0465674>] psr: 60000013 sp : c8927d40 ip : ffffefff fp : c8aa8020 r10: c8927e10 r9 : 00000001 r8 : 00400cc0 r7 : 00000000 r6 : c8ab0180 r5 : c1804a80 r4 : c8aa8008 r3 : c1a5661c r2 : 00000000 r1 : 6b6b6b6b r0 : c139bf48 ..... Register r6 information: slab kmalloc-64 start c8ab0140 data offset 64 pointer offset 0 size 64 allocated at meminfo_proc_show+0x40/0x4fc meminfo_proc_show+0x40/0x4fc seq_read_iter+0x18c/0x4c4 proc_reg_read_iter+0x84/0xac generic_file_splice_read+0xe8/0x17c splice_direct_to_actor+0xb8/0x290 do_splice_direct+0xa0/0xe0 do_sendfile+0x2d0/0x438 sys_sendfile64+0x12c/0x140 ret_fast_syscall+0x0/0x58 0xbeeacde4 Free path: meminfo_proc_show+0x5c/0x4fc seq_read_iter+0x18c/0x4c4 proc_reg_read_iter+0x84/0xac generic_file_splice_read+0xe8/0x17c splice_direct_to_actor+0xb8/0x290 do_splice_direct+0xa0/0xe0 do_sendfile+0x2d0/0x438 sys_sendfile64+0x12c/0x140 ret_fast_syscall+0x0/0x58 0xbeeacde4 Link: https://lkml.kernel.org/r/1615891032-29160-3-git-send-email-maninder1.s@samsung.com Co-developed-by:
Vaneet Narang <v.narang@samsung.com> Signed-off-by:
Vaneet Narang <v.narang@samsung.com> Signed-off-by:
Maninder Singh <maninder1.s@samsung.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
David Hildenbrand authored
Since /dev/kmem has been removed, let's remove the xlate_dev_kmem_ptr() leftovers. Link: https://lkml.kernel.org/r/20210324102351.6932-3-david@redhat.com Signed-off-by:
David Hildenbrand <david@redhat.com> Acked-by:
Geert Uytterhoeven <geert@linux-m68k.org> Acked-by:
Michal Hocko <mhocko@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Brian Cain <bcain@codeaurora.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Hildenbrand <david@redhat.com> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Pierre Morel <pmorel@linux.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
David Hildenbrand authored
Patch series "drivers/char: remove /dev/kmem for good". Exploring /dev/kmem and /dev/mem in the context of memory hot(un)plug and memory ballooning, I started questioning the existence of /dev/kmem. Comparing it with the /proc/kcore implementation, it does not seem to be able to deal with things like a) Pages unmapped from the direct mapping (e.g., to be used by secretmem) -> kern_addr_valid(). virt_addr_valid() is not sufficient. b) Special cases like gart aperture memory that is not to be touched -> mem_pfn_is_ram() Unless I am missing something, it's at least broken in some cases and might fault/crash the machine. Looks like its existence has been questioned before in 2005 and 2010 [1], after ~11 additional years, it might make sense to revive the discussion. CONFIG_DEVKMEM is only enabled in a single defconfig (on purpose or by mistake?). All distributions disable it: in Ubuntu it has been disabled for more than 10 years, in Debian since 2.6.31, in Fedora at least starting with FC3, in RHEL starting with RHEL4, in SUSE starting from 15sp2, and OpenSUSE has it disabled as well. 1) /dev/kmem was popular for rootkits [2] before it got disabled basically everywhere. Ubuntu documents [3] "There is no modern user of /dev/kmem any more beyond attackers using it to load kernel rootkits.". RHEL documents in a BZ [5] "it served no practical purpose other than to serve as a potential security problem or to enable binary module drivers to access structures/functions they shouldn't be touching" 2) /proc/kcore is a decent interface to have a controlled way to read kernel memory for debugging puposes. (will need some extensions to deal with memory offlining/unplug, memory ballooning, and poisoned pages, though) 3) It might be useful for corner case debugging [1]. KDB/KGDB might be a better fit, especially, to write random memory; harder to shoot yourself into the foot. 4) "Kernel Memory Editor" [4] hasn't seen any updates since 2000 and seems to be incompatible with 64bit [1]. For educational purposes, /proc/kcore might be used to monitor value updates -- or older kernels can be used. 5) It's broken on arm64, and therefore, completely disabled there. Looks like it's essentially unused and has been replaced by better suited interfaces for individual tasks (/proc/kcore, KDB/KGDB). Let's just remove it. [1] https://lwn.net/Articles/147901/ [2] https://www.linuxjournal.com/article/10505 [3] https://wiki.ubuntu.com/Security/Features#A.2Fdev.2Fkmem_disabled [4] https://sourceforge.net/projects/kme/ [5] https://bugzilla.redhat.com/show_bug.cgi?id=154796 Link: https://lkml.kernel.org/r/20210324102351.6932-1-david@redhat.com Link: https://lkml.kernel.org/r/20210324102351.6932-2-david@redhat.com Signed-off-by:
David Hildenbrand <david@redhat.com> Acked-by:
Michal Hocko <mhocko@suse.com> Acked-by:
Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Alexander A. Klimov" <grandmaster@al2klimov.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brian Cain <bcain@codeaurora.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: Corentin Labbe <clabbe@baylibre.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Gregory Clement <gregory.clement@bootlin.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Hillf Danton <hdanton@sina.com> Cc: huang ying <huang.ying.caritas@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: James Troup <james.troup@canonical.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kairui Song <kasong@redhat.com> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Liviu Dudau <liviu.dudau@arm.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@kernel.org> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: openrisc@lists.librecores.org Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Mackerras <paulus@samba.org> Cc: "Pavel Machek (CIP)" <pavel@denx.de> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Pierre Morel <pmorel@linux.ibm.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Rich Felker <dalias@libc.org> Cc: Robert Richter <rric@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Cc: sparclinux@vger.kernel.org Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Theodore Dubois <tblodt@icloud.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: William Cohen <wcohen@redhat.com> Cc: Xiaoming Ni <nixiaoming@huawei.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Yury Norov authored
m68k and sh include bitmap/{find,le}.h prior to ffs/fls headers. New fast-path implementation in find.h requires ffs/fls. Reordering the headers inclusion sequence helps to prevent compile-time implicit function declaration error. [yury.norov@gmail.com: h8300: rearrange headers inclusion order in asm/bitops] Link: https://lkml.kernel.org/r/20210406183625.794227-1-yury.norov@gmail.com Link: https://lkml.kernel.org/r/20210401003153.97325-5-yury.norov@gmail.com Signed-off-by:
Yury Norov <yury.norov@gmail.com> Acked-by:
Geert Uytterhoeven <geert@linux-m68k.org> Acked-by:
Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by:
Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by:
Guenter Roeck <linux@roeck-us.net> Cc: Alexey Klimov <aklimov@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Sterba <dsterba@suse.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Jianpeng Ma <jianpeng.ma@intel.com> Cc: Joe Perches <joe@perches.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Rich Felker <dalias@libc.org> Cc: Stefano Brivio <sbrivio@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Wolfram Sang <wsa+renesas@sang-engineering.com> Cc: Yoshinori Sato <ysato@users.osdn.me> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Randy Dunlap authored
Fix "no previous prototype" W=1 warnings from the kernel test robot: arch/alpha/lib/csum_partial_copy.c:349:1: error: no previous prototype for 'csum_and_copy_from_user' [-Werror=missing-prototypes] 349 | csum_and_copy_from_user(const void __user *src, void *dst, int len) | ^~~~~~~~~~~~~~~~~~~~~~~ arch/alpha/lib/csum_partial_copy.c:358:1: error: no previous prototype for 'csum_partial_copy_nocheck' [-Werror=missing-prototypes] 358 | csum_partial_copy_nocheck(const void *src, void *dst, int len) | ^~~~~~~~~~~~~~~~~~~~~~~~~ Link: https://lkml.kernel.org/r/20210425235749.19113-1-rdunlap@infradead.org Fixes: 808b49da ("alpha: turn csum_partial_copy_from_user() into csum_and_copy_from_user()") Signed-off-by:
Randy Dunlap <rdunlap@infradead.org> Reported-by:
kernel test robot <lkp@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Randy Dunlap authored
'make ARCH=alpha W=1' reports a couple of old-style function definitions with missing parameter list, so fix those. arch/alpha/kernel/pc873xx.c: In function 'pc873xx_get_base': arch/alpha/kernel/pc873xx.c:16:21: warning: old-style function definition [-Wold-style-definition] 16 | unsigned int __init pc873xx_get_base() arch/alpha/kernel/pc873xx.c: In function 'pc873xx_get_model': arch/alpha/kernel/pc873xx.c:21:14: warning: old-style function definition [-Wold-style-definition] 21 | char *__init pc873xx_get_model() Link: https://lkml.kernel.org/r/20210421061312.30097-1-rdunlap@infradead.org Signed-off-by:
Randy Dunlap <rdunlap@infradead.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- May 06, 2021
-
-
Rouven Czerwinski authored
Since commit 79b1feba ("RISC-V: Setup exception vector early") exception vectors are setup early and the handle_exception symbol from the asm files is no longer referenced in traps.c. Remove it. Signed-off-by:
Rouven Czerwinski <rouven@czerwinskis.de> Signed-off-by:
Palmer Dabbelt <palmerdabbelt@google.com>
-
Geert Uytterhoeven authored
The various uses of protect_kernel_linear_mapping_text_rodata() are not consistent: - Its definition depends on "64BIT && !XIP_KERNEL", - Its forward declaration depends on MMU, - Its single caller depends on "STRICT_KERNEL_RWX && 64BIT && MMU && !XIP_KERNEL". Fix this by settling on the dependencies of the caller, which can be simplified as STRICT_KERNEL_RWX depends on "MMU && !XIP_KERNEL". Provide a dummy definition, as the caller is protected by "IS_ENABLED(CONFIG_STRICT_KERNEL_RWX)" instead of "#ifdef CONFIG_STRICT_KERNEL_RWX". Signed-off-by:
Geert Uytterhoeven <geert+renesas@glider.be> Tested-by:
Alexandre Ghiti <alex@ghiti.fr> Signed-off-by:
Palmer Dabbelt <palmerdabbelt@google.com>
-
Vincent Chen authored
The corresponding hardware issues of CONFIG_ERRATA_SIFIVE_CIP_453 and CONFIG_ERRATA_SIFIVE_CIP_1200 only exist in the SiFive 64bit CPU cores. Therefore, these two errata are required only if CONFIG_64BIT=y Reported-by:
kernel test robot <lkp@intel.com> Signed-off-by:
Vincent Chen <vincent.chen@sifive.com> Fixes: bff3ff52 ("riscv: sifive: Apply errata "cip-1200" patch") Fixes: 800149a7 ("riscv: sifive: Apply errata "cip-453" patch") Signed-off-by:
Palmer Dabbelt <palmerdabbelt@google.com>
-
Geert Uytterhoeven authored
When the kernel mapping was moved outside of the linear mapping, the kernel memory reservation was increased, to take into account mapping granularity. However, this is done unconditionally, regardless of whether the kernel memory is mapped read-only or not. If this extension is not needed, up to 2 MiB may be lost, which has a big impact on e.g. Canaan K210 (64-bit nommu) platforms with only 8 MiB of RAM. Reclaim the lost memory by only extending the reserved region when needed, i.e. depending on a simplified version of the conditional logic around the call to protect_kernel_linear_mapping_text_rodata(). Fixes: 2bfc6cd8 ("riscv: Move kernel mapping outside of linear mapping") Signed-off-by:
Geert Uytterhoeven <geert+renesas@glider.be> Tested-by:
Alexandre Ghiti <alex@ghiti.fr> Signed-off-by:
Palmer Dabbelt <palmerdabbelt@google.com>
-
Suravee Suthikulpanit authored
On certain AMD platforms, when the IOMMU performance counter source (csource) field is zero, power-gating for the counter is enabled, which prevents write access and returns zero for read access. This can cause invalid perf result especially when event multiplexing is needed (i.e. more number of events than available counters) since the current logic keeps track of the previously read counter value, and subsequently re-program the counter to continue counting the event. With power-gating enabled, we cannot gurantee successful re-programming of the counter. Workaround this issue by : 1. Modifying the ordering of setting/reading counters and enabing/ disabling csources to only access the counter when the csource is set to non-zero. 2. Since AMD IOMMU PMU does not support interrupt mode, the logic can be simplified to always start counting with value zero, and accumulate the counter value when stopping without the need to keep track and reprogram the counter with the previously read counter value. This has been tested on systems with and without power-gating. Fixes: 994d6608 ("iommu/amd: Remove performance counter pre-initialization test") Suggested-by:
Alexander Monakov <amonakov@ispras.ru> Signed-off-by:
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210504065236.4415-1-suravee.suthikulpanit@amd.com
-
Shaokun Zhang authored
Commit af391b15 ("arm64: kernel: rename __cpu_suspend to keep it aligned with arm") has used @index instead of @arg, but the comment is stale, update it. Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by:
Shaokun Zhang <zhangshaokun@hisilicon.com> Reviewed-by:
Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/1620280462-21937-1-git-send-email-zhangshaokun@hisilicon.com Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com>
-
- May 05, 2021
-
-
Stefan Metzmacher authored
As io_threads are fully set up USER threads it's clearer to separate the code path from the KTHREAD logic. The only remaining difference to user space threads is that io_threads never return to user space again. Instead they loop within the given worker function. The fact that they never return to user space means they don't have an user space thread stack. In order to indicate that to tools like gdb we reset the stack and instruction pointers to 0. This allows gdb attach to user space processes using io-uring, which like means that they have io_threads, without printing worrying message like this: warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386 warning: Architecture rejected target-supplied description The output will be something like this: (gdb) info threads Id Target Id Frame * 1 LWP 4863 "io_uring-cp-for" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38 2 LWP 4864 "iou-mgr-4863" 0x0000000000000000 in ?? () 3 LWP 4865 "iou-wrk-4863" 0x0000000000000000 in ?? () (gdb) thread 3 [Switching to thread 3 (LWP 4865)] #0 0x0000000000000000 in ?? () (gdb) bt #0 0x0000000000000000 in ?? () Backtrace stopped: Cannot access memory at address 0x0 Fixes: 4727dc20 ("arch: setup PF_IO_WORKER threads like PF_KTHREAD") Link: https://lore.kernel.org/io-uring/044d0bad-6888-a211-e1d3-159a4aeed52d@polymtl.ca/T/#m1bbf5727e3d4e839603f6ec7ed79c7eebfba6267 Signed-off-by:
Stefan Metzmacher <metze@samba.org> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: Jens Axboe <axboe@kernel.dk> cc: Andy Lutomirski <luto@kernel.org> cc: linux-kernel@vger.kernel.org cc: io-uring@vger.kernel.org cc: x86@kernel.org Link: https://lore.kernel.org/r/20210505110310.237537-1-metze@samba.org Reviewed-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Sean Christopherson authored
Move the enter/exit logic in {svm,vmx}_vcpu_enter_exit() to common helpers. Opportunistically update the somewhat stale comment about the updates needing to occur immediately after VM-Exit. No functional change intended. Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210505002735.1684165-9-seanjc@google.com
-
Wanpeng Li authored
Defer the call to account guest time until after servicing any IRQ(s) that happened in the guest or immediately after VM-Exit. Tick-based accounting of vCPU time relies on PF_VCPU being set when the tick IRQ handler runs, and IRQs are blocked throughout the main sequence of vcpu_enter_guest(), including the call into vendor code to actually enter and exit the guest. This fixes a bug where reported guest time remains '0', even when running an infinite loop in the guest: https://bugzilla.kernel.org/show_bug.cgi?id=209831 Fixes: 87fa7f3e ("x86/kvm: Move context tracking where it belongs") Suggested-by:
Thomas Gleixner <tglx@linutronix.de> Co-developed-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Wanpeng Li <wanpengli@tencent.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210505002735.1684165-4-seanjc@google.com
-
Lai Jiangshan authored
In VMX, the host NMI handler needs to be invoked after NMI VM-Exit. Before commit 1a5488ef ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn"), this was done by INTn ("int $2"). But INTn microcode is relatively expensive, so the commit reworked NMI VM-Exit handling to invoke the kernel handler by function call. But this missed a detail. The NMI entry point for direct invocation is fetched from the IDT table and called on the kernel stack. But on 64-bit the NMI entry installed in the IDT expects to be invoked on the IST stack. It relies on the "NMI executing" variable on the IST stack to work correctly, which is at a fixed position in the IST stack. When the entry point is unexpectedly called on the kernel stack, the RSP-addressed "NMI executing" variable is obviously also on the kernel stack and is "uninitialized" and can cause the NMI entry code to run in the wrong way. Provide a non-ist entry point for VMX which shares the C-function with the regular NMI entry and invoke the new asm entry point instead. On 32-bit this just maps to the regular NMI entry point as 32-bit has no ISTs and is not affected. [ tglx: Made it independent for backporting, massaged changelog ] Fixes: 1a5488ef ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn") Signed-off-by:
Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Lai Jiangshan <laijs@linux.alibaba.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87r1imi8i1.ffs@nanos.tec.linutronix.de
-
Sean Christopherson authored
Drop write_tsc() and write_rdtscp_aux(); the former has no users, and the latter has only a single user and is slightly misleading since the only in-kernel consumer of MSR_TSC_AUX is RDPID, not RDTSCP. No functional change intended. Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210504225632.1532621-3-seanjc@google.com
-
Sean Christopherson authored
Initialize MSR_TSC_AUX with CPU node information if RDTSCP or RDPID is supported. This fixes a bug where vdso_read_cpunode() will read garbage via RDPID if RDPID is supported but RDTSCP is not. While no known CPU supports RDPID but not RDTSCP, both Intel's SDM and AMD's APM allow for RDPID to exist without RDTSCP, e.g. it's technically a legal CPU model for a virtual machine. Note, technically MSR_TSC_AUX could be initialized if and only if RDPID is supported since RDTSCP is currently not used to retrieve the CPU node. But, the cost of the superfluous WRMSR is negigible, whereas leaving MSR_TSC_AUX uninitialized is just asking for future breakage if someone decides to utilize RDTSCP. Fixes: a582c540 ("x86/vdso: Use RDPID in preference to LSL when available") Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210504225632.1532621-2-seanjc@google.com
-
Andi Kleen authored
const variable must be initconst, not initdata. Signed-off-by:
Andi Kleen <andi@firstfloor.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210425211229.3157674-1-ak@linux.intel.com
-
Alexey Dobriyan authored
Both instructions aren't used by kernel. Signed-off-by:
Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/YIHHYNKbiSf5N7+o@localhost.localdomain
-
Wan Jiabing authored
Signed-off-by:
Wan Jiabing <wanjiabing@vivo.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210427063835.9039-1-wanjiabing@vivo.com
-
Oscar Salvador authored
Enable arm64 platform to use the MHP_MEMMAP_ON_MEMORY feature. Link: https://lkml.kernel.org/r/20210421102701.25051-9-osalvador@suse.de Signed-off-by:
Oscar Salvador <osalvador@suse.de> Reviewed-by:
David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Oscar Salvador authored
Enable x86_64 platform to use the MHP_MEMMAP_ON_MEMORY feature. Link: https://lkml.kernel.org/r/20210421102701.25051-8-osalvador@suse.de Signed-off-by:
Oscar Salvador <osalvador@suse.de> Reviewed-by:
David Hildenbrand <david@redhat.com> Acked-by:
Michal Hocko <mhocko@suse.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Anshuman Khandual authored
HAVE_ARCH_TRANSPARENT_HUGEPAGE has duplicate definitions on platforms that subscribe it. Drop these reduntant definitions and instead just select it on applicable platforms. Link: https://lkml.kernel.org/r/1617259448-22529-7-git-send-email-anshuman.khandual@arm.com Signed-off-by:
Anshuman Khandual <anshuman.khandual@arm.com> Acked-by:
Arnd Bergmann <arnd@arndb.de> Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc] Cc: Russell King <linux@armlinux.org.uk> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Anshuman Khandual authored
ARCH_ENABLE_SPLIT_PMD_PTLOCKS has duplicate definitions on platforms that subscribe it. Drop these redundant definitions and instead just select it on applicable platforms. Link: https://lkml.kernel.org/r/1617259448-22529-6-git-send-email-anshuman.khandual@arm.com Signed-off-by:
Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390] Cc: Will Deacon <will@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Anshuman Khandual authored
ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION configs have duplicate definitions on platforms that subscribe them. Drop these reduntant definitions and instead just select them appropriately. [akpm@linux-foundation.org: s/x86_64/X86_64/, per Oscar] Link: https://lkml.kernel.org/r/1617259448-22529-5-git-send-email-anshuman.khandual@arm.com Signed-off-by:
Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Will Deacon <will@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Anshuman Khandual authored
ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE] configs have duplicate definitions on platforms that subscribe them. Instead, just make them generic options which can be selected on applicable platforms. Link: https://lkml.kernel.org/r/1617259448-22529-4-git-send-email-anshuman.khandual@arm.com Signed-off-by:
Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390] Cc: Will Deacon <will@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Anshuman Khandual authored
SYS_SUPPORTS_HUGETLBFS config has duplicate definitions on platforms that subscribe it. Instead, just make it a generic option which can be selected on applicable platforms. Also rename it as ARCH_SUPPORTS_HUGETLBFS instead. This reduces code duplication and makes it cleaner. Link: https://lkml.kernel.org/r/1617259448-22529-3-git-send-email-anshuman.khandual@arm.com Signed-off-by:
Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> [riscv] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Anshuman Khandual authored
Patch series "mm: some config cleanups", v2. This series contains config cleanup patches which reduces code duplication across platforms and also improves maintainability. There is no functional change intended with this series. This patch (of 6): ARCH_HAS_CACHE_LINE_SIZE config has duplicate definitions on platforms that subscribe it. Instead, just make it a generic option which can be selected on applicable platforms. This change reduces code duplication and makes it cleaner. Link: https://lkml.kernel.org/r/1617259448-22529-1-git-send-email-anshuman.khandual@arm.com Link: https://lkml.kernel.org/r/1617259448-22529-2-git-send-email-anshuman.khandual@arm.com Signed-off-by:
Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc] Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Saravanan D authored
To help with debugging the sluggishness caused by TLB miss/reload, we introduce monotonic hugepage [direct mapped] split event counts since system state: SYSTEM_RUNNING to be displayed as part of /proc/vmstat in x86 servers The lifetime split event information will be displayed at the bottom of /proc/vmstat .... swap_ra 0 swap_ra_hit 0 direct_map_level2_splits 94 direct_map_level3_splits 4 nr_unstable 0 .... One of the many lasting sources of direct hugepage splits is kernel tracing (kprobes, tracepoints). Note that the kernel's code segment [512 MB] points to the same physical addresses that have been already mapped in the kernel's direct mapping range. Source : Documentation/x86/x86_64/mm.rst When we enable kernel tracing, the kernel has to modify attributes/permissions of the text segment hugepages that are direct mapped causing them to split. Kernel's direct mapped hugepages do not coalesce back after split and remain in place for the remainder of the lifetime. An instance of direct page splits when we turn on dynamic kernel tracing .... cat /proc/vmstat | grep -i direct_map_level direct_map_level2_splits 784 direct_map_level3_splits 12 bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @ [pid, comm] = count(); }' cat /proc/vmstat | grep -i direct_map_level direct_map_level2_splits 789 direct_map_level3_splits 12 .... Link: https://lkml.kernel.org/r/20210218235744.1040634-1-saravanand@fb.com Signed-off-by:
Saravanan D <saravanand@fb.com> Acked-by:
Tejun Heo <tj@kernel.org> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Acked-by:
Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Axel Rasmussen authored
Patch series "userfaultfd: add minor fault handling", v9. Overview ======== This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS. When enabled (via the UFFDIO_API ioctl), this feature means that any hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also* get events for "minor" faults. By "minor" fault, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s) (shared memory). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea is, userspace resolves the fault by either a) doing nothing if the contents are already correct, or b) updating the underlying contents using the second, non-UFFD mapping (via memcpy/memset or similar, or something fancier like RDMA, or etc...). In either case, userspace issues UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". Use Case ======== Consider the use case of VM live migration (e.g. under QEMU/KVM): 1. While a VM is still running, we copy the contents of its memory to a target machine. The pages are populated on the target by writing to the non-UFFD mapping, using the setup described above. The VM is still running (and therefore its memory is likely changing), so this may be repeated several times, until we decide the target is "up to date enough". 2. We pause the VM on the source, and start executing on the target machine. During this gap, the VM's user(s) will *see* a pause, so it is desirable to minimize this window. 3. Between the last time any page was copied from the source to the target, and when the VM was paused, the contents of that page may have changed - and therefore the copy we have on the target machine is out of date. Although we can keep track of which pages are out of date, for VMs with large amounts of memory, it is "slow" to transfer this information to the target machine. We want to resume execution before such a transfer would complete. 4. So, the guest begins executing on the target machine. The first time it touches its memory (via the UFFD-registered mapping), userspace wants to intercept this fault. Userspace checks whether or not the page is up to date, and if not, copies the updated page from the source machine, via the non-UFFD mapping. Finally, whether a copy was performed or not, userspace issues a UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". We don't have to do all of the final updates on-demand. The userfaultfd manager can, in the background, also copy over updated pages once it receives the map of which pages are up-to-date or not. Interaction with Existing APIs ============================== Because this is a feature, a registered VMA could potentially receive both missing and minor faults. I spent some time thinking through how the existing API interacts with the new feature: UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not allocate a new page. If UFFDIO_CONTINUE is used on a non-minor fault: - For non-shared memory or shmem, -EINVAL is returned. - For hugetlb, -EFAULT is returned. UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults. Without modifications, the existing codepath assumes a new page needs to be allocated. This is okay, since userspace must have a second non-UFFD-registered mapping anyway, thus there isn't much reason to want to use these in any case (just memcpy or memset or similar). - If UFFDIO_COPY is used on a minor fault, -EEXIST is returned. - If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case). - UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns -ENOENT in that case (regardless of the kind of fault). Future Work =========== This series only supports hugetlbfs. I have a second series in flight to support shmem as well, extending the functionality. This series is more mature than the shmem support at this point, and the functionality works fully on hugetlbfs, so this series can be merged first and then shmem support will follow. This patch (of 6): This feature allows userspace to intercept "minor" faults. By "minor" faults, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. This commit adds the new registration mode, and sets the relevant flag on the VMAs being registered. In the hugetlb fault path, if we find that we have huge_pte_none(), but find_lock_page() does indeed find an existing page, then we have a "minor" fault, and if the VMA has the userfaultfd registration flag, we call into userfaultfd to handle it. This is implemented as a new registration mode, instead of an API feature. This is because the alternative implementation has significant drawbacks [1]. However, doing it this was requires we allocate a VM_* flag for the new registration mode. On 32-bit systems, there are no unused bits, so this feature is only supported on architectures with CONFIG_ARCH_USES_HIGH_VMA_FLAGS. When attempting to register a VMA in MINOR mode on 32-bit architectures, we return -EINVAL. [1] https://lore.kernel.org/patchwork/patch/1380226/ [peterx@redhat.com: fix minor fault page leak] Link: https://lkml.kernel.org/r/20210322175132.36659-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com Signed-off-by:
Axel Rasmussen <axelrasmussen@google.com> Reviewed-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Steven Price <steven.price@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Anshuman Khandual authored
HUGETLB_PAGE_SIZE_VARIABLE need not be defined for each individual platform subscribing it. Instead just make it generic. Link: https://lkml.kernel.org/r/1614914928-22039-1-git-send-email-anshuman.khandual@arm.com Signed-off-by:
Anshuman Khandual <anshuman.khandual@arm.com> Suggested-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Christoph Hellwig <hch@lst.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Xu authored
Huge pmd sharing could bring problem to userfaultfd. The thing is that userfaultfd is running its logic based on the special bits on page table entries, however the huge pmd sharing could potentially share page table entries for different address ranges. That could cause issues on either: - When sharing huge pmd page tables for an uffd write protected range, the newly mapped huge pmd range will also be write protected unexpectedly, or, - When we try to write protect a range of huge pmd shared range, we'll first do huge_pmd_unshare() in hugetlb_change_protection(), however that also means the UFFDIO_WRITEPROTECT could be silently skipped for the shared region, which could lead to data loss. While at it, a few other things are done altogether: - Move want_pmd_share() from mm/hugetlb.c into linux/hugetlb.h, because that's definitely something that arch code would like to use too - ARM64 currently directly check against CONFIG_ARCH_WANT_HUGE_PMD_SHARE when trying to share huge pmd. Switch to the want_pmd_share() helper. - Move vma_shareable() from huge_pmd_share() into want_pmd_share(). [peterx@redhat.com: fix build with !ARCH_WANT_HUGE_PMD_SHARE] Link: https://lkml.kernel.org/r/20210310185359.88297-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20210218231202.15426-1-peterx@redhat.com Signed-off-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by:
Axel Rasmussen <axelrasmussen@google.com> Tested-by:
Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: David Rientjes <rientjes@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oliver Upton <oupton@google.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Price <steven.price@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Xu authored
Patch series "hugetlb: Disable huge pmd unshare for uffd-wp", v4. This series tries to disable huge pmd unshare of hugetlbfs backed memory for uffd-wp. Although uffd-wp of hugetlbfs is still during rfc stage, the idea of this series may be needed for multiple tasks (Axel's uffd minor fault series, and Mike's soft dirty series), so I picked it out from the larger series. This patch (of 4): It is a preparation work to be able to behave differently in the per architecture huge_pte_alloc() according to different VMA attributes. Pass it deeper into huge_pmd_share() so that we can avoid the find_vma() call. [peterx@redhat.com: build fix] Link: https://lkml.kernel.org/r/20210304164653.GB397383@xz-x1Link: https://lkml.kernel.org/r/20210218230633.15028-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20210218230633.15028-2-peterx@redhat.com Signed-off-by:
Peter Xu <peterx@redhat.com> Suggested-by:
Mike Kravetz <mike.kravetz@oracle.com> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: David Rientjes <rientjes@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oliver Upton <oupton@google.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Price <steven.price@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Mark Rutland authored
Zenghui reports that booting a kernel with "irqchip.gicv3_pseudo_nmi=1" on the command line hits a warning during kernel entry, due to the way we manipulate the PMR. Early in the entry sequence, we call lockdep_hardirqs_off() to inform lockdep that interrupts have been masked (as the HW sets DAIF wqhen entering an exception). Architecturally PMR_EL1 is not affected by exception entry, and we don't set GIC_PRIO_PSR_I_SET in the PMR early in the exception entry sequence, so early in exception entry the PMR can indicate that interrupts are unmasked even though they are masked by DAIF. If DEBUG_LOCKDEP is selected, lockdep_hardirqs_off() will check that interrupts are masked, before we set GIC_PRIO_PSR_I_SET in any of the exception entry paths, and hence lockdep_hardirqs_off() will WARN() that something is amiss. We can avoid this by consistently setting GIC_PRIO_PSR_I_SET during exception entry so that kernel code sees a consistent environment. We must also update local_daif_inherit() to undo this, as currently only touches DAIF. For other paths, local_daif_restore() will update both DAIF and the PMR. With this done, we can remove the existing special cases which set this later in the entry code. We always use (GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET) for consistency with local_daif_save(), as this will warn if it ever encounters (GIC_PRIO_IRQOFF | GIC_PRIO_PSR_I_SET), and never sets this itself. This matches the gic_prio_kentry_setup that we have to retain for ret_to_user. The original splat from Zenghui's report was: | DEBUG_LOCKS_WARN_ON(!irqs_disabled()) | WARNING: CPU: 3 PID: 125 at kernel/locking/lockdep.c:4258 lockdep_hardirqs_off+0xd4/0xe8 | Modules linked in: | CPU: 3 PID: 125 Comm: modprobe Tainted: G W 5.12.0-rc8+ #463 | Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 | pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO BTYPE=--) | pc : lockdep_hardirqs_off+0xd4/0xe8 | lr : lockdep_hardirqs_off+0xd4/0xe8 | sp : ffff80002a39bad0 | pmr_save: 000000e0 | x29: ffff80002a39bad0 x28: ffff0000de214bc0 | x27: ffff0000de1c0400 x26: 000000000049b328 | x25: 0000000000406f30 x24: ffff0000de1c00a0 | x23: 0000000020400005 x22: ffff8000105f747c | x21: 0000000096000044 x20: 0000000000498ef9 | x19: ffff80002a39bc88 x18: ffffffffffffffff | x17: 0000000000000000 x16: ffff800011c61eb0 | x15: ffff800011700a88 x14: 0720072007200720 | x13: 0720072007200720 x12: 0720072007200720 | x11: 0720072007200720 x10: 0720072007200720 | x9 : ffff80002a39bad0 x8 : ffff80002a39bad0 | x7 : ffff8000119f0800 x6 : c0000000ffff7fff | x5 : ffff8000119f07a8 x4 : 0000000000000001 | x3 : 9bcdab23f2432800 x2 : ffff800011730538 | x1 : 9bcdab23f2432800 x0 : 0000000000000000 | Call trace: | lockdep_hardirqs_off+0xd4/0xe8 | enter_from_kernel_mode.isra.5+0x7c/0xa8 | el1_abort+0x24/0x100 | el1_sync_handler+0x80/0xd0 | el1_sync+0x6c/0x100 | __arch_clear_user+0xc/0x90 | load_elf_binary+0x9fc/0x1450 | bprm_execve+0x404/0x880 | kernel_execve+0x180/0x188 | call_usermodehelper_exec_async+0xdc/0x158 | ret_from_fork+0x10/0x18 Fixes: 23529049 ("arm64: entry: fix non-NMI user<->kernel transitions") Fixes: 7cd1ea10 ("arm64: entry: fix non-NMI kernel<->kernel transitions") Fixes: f0cd5ac1 ("arm64: entry: fix NMI {user, kernel}->kernel transitions") Fixes: 2a9b3e6a ("arm64: entry: fix EL1 debug transitions") Link: https://lore.kernel.org/r/f4012761-026f-4e51-3a0c-7524e434e8b3@huawei.com Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Reported-by:
Zenghui Yu <yuzenghui@huawei.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Acked-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210428111555.50880-1-mark.rutland@arm.com Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com>
-