  1. May 21, 2021
  2. May 14, 2021
  3. May 07, 2021
    • treewide: remove editor modelines and cruft · fa60ce2c
      Masahiro Yamada authored
      The section "19) Editor modelines and other cruft" in
      Documentation/process/coding-style.rst clearly says, "Do not include any
      of these in source files."
      
      I recently received a patch that explicitly adds a new one.
      
      Let's do a treewide cleanup; otherwise, some people will follow the existing
      code and attempt to upstream their favorite editor setups.
      
      It is even nicer if scripts/checkpatch.pl can check it.
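
      A check of this kind could start from a simple line scan. The C sketch below is hypothetical and is not checkpatch.pl's actual rule (checkpatch.pl is written in Perl): looks_like_modeline() and its heuristics (a pair of Emacs "-*-" markers, an Emacs "Local Variables:" line, or a Vim "vim:"/"vi:"/"ex:" marker at the start of the line or after whitespace) are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* True if tok occurs in line at the very start or right after a space
 * or tab (a rough approximation of where Vim looks for modelines). */
static bool has_marker(const char *line, const char *tok)
{
	const char *p = line;

	while ((p = strstr(p, tok)) != NULL) {
		if (p == line || p[-1] == ' ' || p[-1] == '\t')
			return true;
		p++;
	}
	return false;
}

/* Hypothetical heuristic modeline detector, not checkpatch.pl's logic. */
bool looks_like_modeline(const char *line)
{
	const char *first = strstr(line, "-*-");

	/* Emacs first-line form, e.g. "-*- mode: c -*-": two markers. */
	if (first && strstr(first + 3, "-*-"))
		return true;

	/* Emacs "Local Variables:" block header. */
	if (strstr(line, "Local Variables:"))
		return true;

	/* Vim forms: "vim:", "vi:" or "ex:" at start or after whitespace. */
	return has_marker(line, "vim:") || has_marker(line, "vi:") ||
	       has_marker(line, "ex:");
}
```

      The whitespace requirement keeps ordinary words such as "index:" from matching "ex:".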
      
      If we want to impose coding style in an editor-independent manner, I think
      editorconfig (patch [1]) is a saner solution.
      
      [1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/
      
      Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Reviewed-by: Miguel Ojeda <ojeda@kernel.org>	[auxdisplay]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arm: print alloc free paths for address in registers · 5aa6b70e
      Maninder Singh authored
      In case of a use-after-free kernel oops, the freeing path of the object
      is required to debug further.  In most cases, the object address is
      present in one of the registers.
      
      Thus, check the address held in each register, and if it belongs to a
      slab object, print that object's allocation and free paths.
      
      For example, in the oops below, register r6 points into a slab object,
      and a use-after-free occurred on one of its dereferenced values:
      
        Unable to handle kernel paging request at virtual address 6b6b6b6f
        ....
        pc : [<c0538afc>]    lr : [<c0465674>]    psr: 60000013
        sp : c8927d40  ip : ffffefff  fp : c8aa8020
        r10: c8927e10  r9 : 00000001  r8 : 00400cc0
        r7 : 00000000  r6 : c8ab0180  r5 : c1804a80  r4 : c8aa8008
        r3 : c1a5661c  r2 : 00000000  r1 : 6b6b6b6b  r0 : c139bf48
        .....
        Register r6 information: slab kmalloc-64 start c8ab0140 data offset 64 pointer offset 0 size 64 allocated at meminfo_proc_show+0x40/0x4fc
            meminfo_proc_show+0x40/0x4fc
            seq_read_iter+0x18c/0x4c4
            proc_reg_read_iter+0x84/0xac
            generic_file_splice_read+0xe8/0x17c
            splice_direct_to_actor+0xb8/0x290
            do_splice_direct+0xa0/0xe0
            do_sendfile+0x2d0/0x438
            sys_sendfile64+0x12c/0x140
            ret_fast_syscall+0x0/0x58
            0xbeeacde4
         Free path:
            meminfo_proc_show+0x5c/0x4fc
            seq_read_iter+0x18c/0x4c4
            proc_reg_read_iter+0x84/0xac
            generic_file_splice_read+0xe8/0x17c
            splice_direct_to_actor+0xb8/0x290
            do_splice_direct+0xa0/0xe0
            do_sendfile+0x2d0/0x438
            sys_sendfile64+0x12c/0x140
            ret_fast_syscall+0x0/0x58
            0xbeeacde4
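
      The register check described above boils down to an address-range test. The sketch below is a userspace toy model, not the kernel's implementation: struct obj_info, find_obj() and report_regs() are hypothetical, and data_offset is assumed to be debug padding (such as a red zone) in front of the payload. The real patch relies on the slab allocator's own metadata and stored stack traces, which this sketch does not model.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record of one live slab object (toy model only). */
struct obj_info {
	const char *cache;	/* cache name, e.g. "kmalloc-64" */
	uint32_t start;		/* address where the object starts */
	uint32_t data_offset;	/* assumed debug padding before the payload */
	uint32_t size;		/* payload size in bytes */
};

/*
 * Return the object whose payload contains addr, or NULL if none does.
 * On a match, *ptr_off receives addr's offset from the payload start.
 */
const struct obj_info *find_obj(const struct obj_info *objs, size_t n,
				uint32_t addr, uint32_t *ptr_off)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t data = objs[i].start + objs[i].data_offset;

		if (addr >= data && addr - data < objs[i].size) {
			*ptr_off = addr - data;
			return &objs[i];
		}
	}
	return NULL;
}

/* Print one line per register that points into a known object and
 * return how many registers matched. */
int report_regs(const uint32_t *regs, size_t nregs,
		const struct obj_info *objs, size_t nobjs)
{
	int hits = 0;

	for (size_t r = 0; r < nregs; r++) {
		uint32_t off;
		const struct obj_info *o = find_obj(objs, nobjs, regs[r], &off);

		if (!o)
			continue;
		printf("Register r%zu information: slab %s start %08" PRIx32
		       " data offset %" PRIu32 " pointer offset %" PRIu32
		       " size %" PRIu32 "\n",
		       r, o->cache, o->start, o->data_offset, off, o->size);
		hits++;
	}
	return hits;
}
```

      Given the register values from the oops above and a single kmalloc-64 object starting at c8ab0140 with a 64-byte data offset, report_regs() prints one line, for r6, beginning "Register r6 information: slab kmalloc-64 start c8ab0140 data offset 64 pointer offset 0 size 64".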
      
      Link: https://lkml.kernel.org/r/1615891032-29160-3-git-send-email-maninder1.s@samsung.com
      Co-developed-by: Vaneet Narang <v.narang@samsung.com>
      Signed-off-by: Vaneet Narang <v.narang@samsung.com>
      Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove xlate_dev_kmem_ptr() · f2e762ba
      David Hildenbrand authored
      Since /dev/kmem has been removed, let's remove the xlate_dev_kmem_ptr()
      leftovers.
      
      Link: https://lkml.kernel.org/r/20210324102351.6932-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Palmer Dabbelt <palmerdabbelt@google.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Niklas Schnelle <schnelle@linux.ibm.com>
      Cc: Pierre Morel <pmorel@linux.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/char: remove /dev/kmem for good · bbcd53c9
      David Hildenbrand authored
      Patch series "drivers/char: remove /dev/kmem for good".
      
      Exploring /dev/kmem and /dev/mem in the context of memory hot(un)plug and
      memory ballooning, I started questioning the existence of /dev/kmem.
      
      Comparing it with the /proc/kcore implementation, it does not seem to be
      able to deal with things like
      
      a) Pages unmapped from the direct mapping (e.g., to be used by secretmem)
        -> kern_addr_valid(). virt_addr_valid() is not sufficient.
      
      b) Special cases like gart aperture memory that is not to be touched
        -> mem_pfn_is_ram()
      
      Unless I am missing something, it's at least broken in some cases and might
      fault/crash the machine.
      
      Looks like its existence has been questioned before, in 2005 and 2010 [1];
      after ~11 additional years, it might make sense to revive the discussion.
      
      CONFIG_DEVKMEM is only enabled in a single defconfig (on purpose or by
      mistake?).  All distributions disable it: in Ubuntu it has been disabled
      for more than 10 years, in Debian since 2.6.31, in Fedora at least
      starting with FC3, in RHEL starting with RHEL4, in SUSE starting from
      15sp2, and OpenSUSE has it disabled as well.
      
      1) /dev/kmem was popular for rootkits [2] before it got disabled
         basically everywhere. Ubuntu documents [3] "There is no modern user of
         /dev/kmem any more beyond attackers using it to load kernel rootkits."
         RHEL documents in a BZ [5] "it served no practical purpose other than to
         serve as a potential security problem or to enable binary module drivers
         to access structures/functions they shouldn't be touching".
      
      2) /proc/kcore is a decent interface to have a controlled way to read
         kernel memory for debugging purposes. (It will need some extensions to
         deal with memory offlining/unplug, memory ballooning, and poisoned
         pages, though.)
      
      3) It might be useful for corner case debugging [1]. KDB/KGDB might be a
         better fit, especially for writing random memory; it is harder to shoot
         yourself in the foot.
      
      4) "Kernel Memory Editor" [4] hasn't seen any updates since 2000 and seems
         to be incompatible with 64-bit [1]. For educational purposes,
         /proc/kcore might be used to monitor value updates -- or older
         kernels can be used.
      
      5) It's broken on arm64, and therefore, completely disabled there.
      
      Looks like it's essentially unused and has been replaced by better
      suited interfaces for individual tasks (/proc/kcore, KDB/KGDB). Let's
      just remove it.
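
      /proc/kcore, the replacement the message points to, presents kernel memory as an ELF core file, so standard tools such as gdb can read it. As a minimal sketch (is_elf64_core() and check_core_file() are hypothetical helpers), the code below only validates that ELF header; it assumes a 64-bit host whose byte order matches the file, which holds for /proc/kcore on that host. Reading /proc/kcore typically requires root.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if buf begins with a 64-bit ELF header whose type is
 * ET_CORE. Assumes the data uses the host's byte order.
 */
bool is_elf64_core(const unsigned char *buf, size_t len)
{
	Elf64_Ehdr eh;

	if (len < sizeof(eh))
		return false;
	memcpy(&eh, buf, sizeof(eh));	/* copy to avoid unaligned access */
	if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
		return false;
	if (eh.e_ident[EI_CLASS] != ELFCLASS64)
		return false;
	return eh.e_type == ET_CORE;
}

/*
 * Read the ELF header of path (e.g. "/proc/kcore").
 * Returns 1 for an ELF64 core file, 0 for anything else, and -1 if the
 * file cannot be opened or is too short (for example, without root).
 */
int check_core_file(const char *path)
{
	unsigned char hdr[sizeof(Elf64_Ehdr)];
	FILE *f = fopen(path, "rb");
	size_t n;

	if (!f)
		return -1;
	n = fread(hdr, 1, sizeof(hdr), f);
	fclose(f);
	if (n != sizeof(hdr))
		return -1;
	return is_elf64_core(hdr, n) ? 1 : 0;
}
```

      On a 64-bit machine, check_core_file("/proc/kcore") run with sufficient privileges would be expected to return 1; as an unprivileged user, fopen() typically fails and it returns -1.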
      
      [1] https://lwn.net/Articles/147901/
      [2] https://www.linuxjournal.com/article/10505
      [3] https://wiki.ubuntu.com/Security/Features#A.2Fdev.2Fkmem_disabled
      [4] https://sourceforge.net/projects/kme/
      [5] https://bugzilla.redhat.com/show_bug.cgi?id=154796
      
      Link: https://lkml.kernel.org/r/20210324102351.6932-1-david@redhat.com
      Link: https://lkml.kernel.org/r/20210324102351.6932-2-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Alexander A. Klimov" <grandmaster@al2klimov.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Corentin Labbe <clabbe@baylibre.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Gregory Clement <gregory.clement@bootlin.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: huang ying <huang.ying.caritas@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: James Troup <james.troup@canonical.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kairui Song <kasong@redhat.com>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Cc: Liviu Dudau <liviu.dudau@arm.com>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Niklas Schnelle <schnelle@linux.ibm.com>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
      Cc: openrisc@lists.librecores.org
      Cc: Palmer Dabbelt <palmerdabbelt@google.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Pavel Machek (CIP)" <pavel@denx.de>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Pierre Morel <pmorel@linux.ibm.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Cc: sparclinux@vger.kernel.org
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Cc: Theodore Dubois <tblodt@icloud.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: William Cohen <wcohen@redhat.com>
      Cc: Xiaoming Ni <nixiaoming@huawei.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arch: rearrange headers inclusion order in asm/bitops for m68k, sh and h8300 · bb8bc36e
      Yury Norov authored
      m68k and sh include bitmap/{find,le}.h prior to the ffs/fls headers.  The
      new fast-path implementation in find.h requires ffs/fls.  Reordering the
      header inclusion sequence helps prevent compile-time implicit function
      declaration errors.
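
      The ordering constraint can be shown in a single file. In this hypothetical sketch, toy__ffs() stands in for the ffs/fls headers and toy_find_first_bit() for the fast path in find.h; the names and the use of the GCC/Clang builtin __builtin_ctzl() are assumptions, not the kernel's code. If the two sections were swapped, the compiler would see a call to an undeclared toy__ffs(), which the kernel build turns into an error via -Werror=implicit-function-declaration.

```c
#include <stddef.h>

#define TOY_BITS_PER_LONG (8 * sizeof(unsigned long))

/* ---- stands in for the ffs/fls headers: must be seen first ---- */

/* Index of the least significant set bit; word must be non-zero. */
static inline unsigned long toy__ffs(unsigned long word)
{
	return (unsigned long)__builtin_ctzl(word);
}

/* ---- stands in for find.h: its fast path calls toy__ffs() ---- */

/* Index of the first set bit among the first size bits of addr,
 * or size if none of them is set. */
static inline unsigned long toy_find_first_bit(const unsigned long *addr,
					       unsigned long size)
{
	for (unsigned long i = 0; i * TOY_BITS_PER_LONG < size; i++) {
		if (addr[i]) {
			unsigned long bit = i * TOY_BITS_PER_LONG + toy__ffs(addr[i]);

			return bit < size ? bit : size;
		}
	}
	return size;
}
```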
      
      [yury.norov@gmail.com: h8300: rearrange headers inclusion order in asm/bitops]
        Link: https://lkml.kernel.org/r/20210406183625.794227-1-yury.norov@gmail.com
      
      Link: https://lkml.kernel.org/r/20210401003153.97325-5-yury.norov@gmail.com
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Alexey Klimov <aklimov@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: David Sterba <dsterba@suse.com>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Jianpeng Ma <jianpeng.ma@intel.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Stefano Brivio <sbrivio@redhat.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
      Cc: Yoshinori Sato <ysato@users.osdn.me>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • alpha: csum_partial_copy.c: add function prototypes from <net/checksum.h> · 0214967a
      Randy Dunlap authored
      Fix "no previous prototype" W=1 warnings from the kernel test robot:
      
        arch/alpha/lib/csum_partial_copy.c:349:1: error: no previous prototype for 'csum_and_copy_from_user' [-Werror=missing-prototypes]
        349 | csum_and_copy_from_user(const void __user *src, void *dst, int len)
            | ^~~~~~~~~~~~~~~~~~~~~~~
        arch/alpha/lib/csum_partial_copy.c:358:1: error: no previous prototype for 'csum_partial_copy_nocheck' [-Werror=missing-prototypes]
        358 | csum_partial_copy_nocheck(const void *src, void *dst, int len)
            | ^~~~~~~~~~~~~~~~~~~~~~~~~
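
      -Wmissing-prototypes fires when a non-static function is defined without a prototype already in scope; per the subject line, the fix makes the prototypes from <net/checksum.h> visible to csum_partial_copy.c. A minimal sketch of the pattern, where toy_copy_and_sum() is a hypothetical stand-in that copies a buffer and returns a plain byte sum (not the real checksum algorithm):

```c
#include <stddef.h>
#include <string.h>

/* ---- stands in for the header that carries the prototype ---- */
unsigned int toy_copy_and_sum(const void *src, void *dst, int len);

/* ---- the .c file: a prototype is now in scope, so the definition
 * below no longer triggers -Wmissing-prototypes ---- */

/* Copy len bytes from src to dst and return the sum of those bytes. */
unsigned int toy_copy_and_sum(const void *src, void *dst, int len)
{
	const unsigned char *s = src;
	unsigned int sum = 0;

	if (len <= 0)
		return 0;
	memcpy(dst, src, (size_t)len);
	for (int i = 0; i < len; i++)
		sum += s[i];
	return sum;
}
```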
      
      Link: https://lkml.kernel.org/r/20210425235749.19113-1-rdunlap@infradead.org
      Fixes: 808b49da ("alpha: turn csum_partial_copy_from_user() into csum_and_copy_from_user()")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: kernel test robot <lkp@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • alpha: eliminate old-style function definitions · 543203d2
      Randy Dunlap authored
      'make ARCH=alpha W=1' reports a couple of old-style function
      definitions with empty parameter lists instead of (void), so fix those.
      
        arch/alpha/kernel/pc873xx.c: In function 'pc873xx_get_base':
        arch/alpha/kernel/pc873xx.c:16:21: warning: old-style function definition [-Wold-style-definition]
         16 | unsigned int __init pc873xx_get_base()
      
        arch/alpha/kernel/pc873xx.c: In function 'pc873xx_get_model':
        arch/alpha/kernel/pc873xx.c:21:14: warning: old-style function definition [-Wold-style-definition]
         21 | char *__init pc873xx_get_model()
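
      The difference between the two forms, sketched with hypothetical stand-ins for the pc873xx helpers (the names and the base value are illustrative only). Before C23, an empty parameter list in a definition leaves the parameters unspecified, so the compiler cannot check calls against it; (void) states explicitly that there are none.

```c
/*
 * Old-style definition (what -Wold-style-definition flags):
 *
 *     unsigned int toy_get_base() { ... }
 *
 * Prototype form used below: "(void)" means "takes no parameters".
 * Names and values are hypothetical.
 */
static unsigned int toy_base = 0x100;

unsigned int toy_get_base(void)
{
	return toy_base;
}

const char *toy_get_model(void)
{
	return "toy-model";
}
```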
      
      Link: https://lkml.kernel.org/r/20210421061312.30097-1-rdunlap@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. May 06, 2021
  5. May 05, 2021
    • x86/process: setup io_threads more like normal user space threads · 50b7b6f2
      Stefan Metzmacher authored
      As io_threads are fully set up USER threads, it's clearer to separate
      their code path from the KTHREAD logic.
      
      The only remaining difference to user space threads is that io_threads
      never return to user space again. Instead, they loop within the given
      worker function.
      
      The fact that they never return to user space means they don't have a
      user space thread stack. In order to indicate that to tools like gdb, we
      reset the stack and instruction pointers to 0.
      
      This allows gdb to attach to user space processes using io-uring (which
      likely means that they have io_threads) without printing worrying
      messages like this:
      
        warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386
      
        warning: Architecture rejected target-supplied description
      
      The output will be something like this:
      
        (gdb) info threads
          Id   Target Id                  Frame
        * 1    LWP 4863 "io_uring-cp-for" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
          2    LWP 4864 "iou-mgr-4863"    0x0000000000000000 in ?? ()
          3    LWP 4865 "iou-wrk-4863"    0x0000000000000000 in ?? ()
        (gdb) thread 3
        [Switching to thread 3 (LWP 4865)]
        #0  0x0000000000000000 in ?? ()
        (gdb) bt
        #0  0x0000000000000000 in ?? ()
        Backtrace stopped: Cannot access memory at address 0x0
      
      Fixes: 4727dc20 ("arch: setup PF_IO_WORKER threads like PF_KTHREAD")
      Link: https://lore.kernel.org/io-uring/044d0bad-6888-a211-e1d3-159a4aeed52d@polymtl.ca/T/#m1bbf5727e3d4e839603f6ec7ed79c7eebfba6267
      Signed-off-by: Stefan Metzmacher <metze@samba.org>
      cc: Linus Torvalds <torvalds@linux-foundation.org>
      cc: Jens Axboe <axboe@kernel.dk>
      cc: Andy Lutomirski <luto@kernel.org>
      cc: linux-kernel@vger.kernel.org
      cc: io-uring@vger.kernel.org
      cc: x86@kernel.org
      Link: https://lore.kernel.org/r/20210505110310.237537-1-metze@samba.org
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • KVM: x86: Consolidate guest enter/exit logic to common helpers · bc908e09
      Sean Christopherson authored
      
      Move the enter/exit logic in {svm,vmx}_vcpu_enter_exit() to common
      helpers.  Opportunistically update the somewhat stale comment about the
      updates needing to occur immediately after VM-Exit.
      
      No functional change intended.
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20210505002735.1684165-9-seanjc@google.com
    • KVM: x86: Defer vtime accounting 'til after IRQ handling · 16045714
      Wanpeng Li authored
      Defer the call to account guest time until after servicing any IRQ(s)
      that happened in the guest or immediately after VM-Exit.  Tick-based
      accounting of vCPU time relies on PF_VCPU being set when the tick IRQ
      handler runs, and IRQs are blocked throughout the main sequence of
      vcpu_enter_guest(), including the call into vendor code to actually
      enter and exit the guest.
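
      The ordering problem can be illustrated with a toy model. Everything here is hypothetical (struct toy_cpu, in_guest standing in for PF_VCPU, exactly one pending tick per guest run); it is not KVM code. Because IRQs stay disabled while the guest runs, the tick is only serviced afterwards, so it is charged to the guest only if the flag is still set at that point.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state for the toy model. */
struct toy_cpu {
	bool in_guest;		/* stands in for PF_VCPU */
	bool tick_pending;	/* tick raised while IRQs were disabled */
	unsigned int guest_ticks;
	unsigned int host_ticks;
};

/* Service a pending tick, charging it according to the current flag. */
static void toy_service_irqs(struct toy_cpu *c)
{
	if (!c->tick_pending)
		return;
	c->tick_pending = false;
	if (c->in_guest)
		c->guest_ticks++;
	else
		c->host_ticks++;
}

/*
 * One guest run during which a tick fires. defer_accounting selects
 * whether the flag is cleared after (true) or before (false) pending
 * IRQs are serviced.
 */
void toy_run_guest(struct toy_cpu *c, bool defer_accounting)
{
	c->in_guest = true;
	c->tick_pending = true;		/* tick arrives inside the guest */

	if (!defer_accounting)
		c->in_guest = false;	/* old order: exit accounting first */
	toy_service_irqs(c);		/* IRQs re-enabled here */
	if (defer_accounting)
		c->in_guest = false;	/* new order: after IRQ handling */
}
```

      With the old order every tick lands in host_ticks, which matches the symptom of reported guest time staying at 0.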
      
      This fixes a bug where reported guest time remains '0', even when
      running an infinite loop in the guest:
      
        https://bugzilla.kernel.org/show_bug.cgi?id=209831
      
      Fixes: 87fa7f3e ("x86/kvm: Move context tracking where it belongs")
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Co-developed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210505002735.1684165-4-seanjc@google.com
    • KVM/VMX: Invoke NMI non-IST entry instead of IST entry · a217a659
      Lai Jiangshan authored
      
      In VMX, the host NMI handler needs to be invoked after NMI VM-Exit.
      Before commit 1a5488ef ("KVM: VMX: Invoke NMI handler via indirect
      call instead of INTn"), this was done by INTn ("int $2"). But INTn
      microcode is relatively expensive, so the commit reworked NMI VM-Exit
      handling to invoke the kernel handler by function call.
      
      But this missed a detail. The NMI entry point for direct invocation is
      fetched from the IDT table and called on the kernel stack.  But on 64-bit
      the NMI entry installed in the IDT expects to be invoked on the IST stack.
      It relies on the "NMI executing" variable on the IST stack to work
      correctly, which is at a fixed position in the IST stack.  When the entry
      point is unexpectedly called on the kernel stack, the RSP-addressed "NMI
      executing" variable is obviously also on the kernel stack and is
      "uninitialized" and can cause the NMI entry code to run in the wrong way.
      
      Provide a non-IST entry point for VMX which shares the C function with
      the regular NMI entry, and invoke the new asm entry point instead.
      
      On 32-bit this just maps to the regular NMI entry point as 32-bit has no
      ISTs and is not affected.
      
      [ tglx: Made it independent for backporting, massaged changelog ]
      
      Fixes: 1a5488ef ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn")
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/87r1imi8i1.ffs@nanos.tec.linutronix.de
    • x86/cpu: Remove write_tsc() and write_rdtscp_aux() wrappers · fc48a6d1
      Sean Christopherson authored
      
      Drop write_tsc() and write_rdtscp_aux(); the former has no users, and the
      latter has only a single user and is slightly misleading since the only
      in-kernel consumer of MSR_TSC_AUX is RDPID, not RDTSCP.
      
      No functional change intended.
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20210504225632.1532621-3-seanjc@google.com
    • x86/cpu: Initialize MSR_TSC_AUX if RDTSCP *or* RDPID is supported · b6b4fbd9
      Sean Christopherson authored
      
      Initialize MSR_TSC_AUX with CPU node information if RDTSCP or RDPID is
      supported.  This fixes a bug where vdso_read_cpunode() will read garbage
      via RDPID if RDPID is supported but RDTSCP is not.  While no known CPU
      supports RDPID but not RDTSCP, both Intel's SDM and AMD's APM allow for
      RDPID to exist without RDTSCP, e.g. it's technically a legal CPU model
      for a virtual machine.
      
      Note, technically MSR_TSC_AUX could be initialized if and only if RDPID
      is supported since RDTSCP is currently not used to retrieve the CPU node.
      But the cost of the superfluous WRMSR is negligible, whereas leaving
      MSR_TSC_AUX uninitialized is just asking for future breakage if someone
      decides to utilize RDTSCP.
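
      The fixed condition and the value being written can be sketched as follows. The helper names are hypothetical, the WRMSR is simulated with a plain store, and the encoding (CPU number in the low 12 bits, node number above them) is an assumption modeled on the x86 vDSO getcpu code rather than something stated in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: low 12 bits = CPU number, higher bits = node. */
#define TOY_CPUNODE_BITS 12
#define TOY_CPUNODE_MASK ((1u << TOY_CPUNODE_BITS) - 1)

static inline uint32_t toy_encode_cpunode(uint32_t cpu, uint32_t node)
{
	return (node << TOY_CPUNODE_BITS) | (cpu & TOY_CPUNODE_MASK);
}

/* What a reader such as RDPID would hand back to the vDSO side. */
static inline void toy_decode_cpunode(uint32_t v, uint32_t *cpu,
				      uint32_t *node)
{
	*cpu = v & TOY_CPUNODE_MASK;
	*node = v >> TOY_CPUNODE_BITS;
}

/*
 * Program the (simulated) MSR if either instruction can read it back.
 * Before the fix only RDTSCP was checked. Returns true if written.
 */
bool toy_init_tsc_aux(bool has_rdtscp, bool has_rdpid, uint32_t cpu,
		      uint32_t node, uint32_t *tsc_aux)
{
	if (!(has_rdtscp || has_rdpid))
		return false;
	*tsc_aux = toy_encode_cpunode(cpu, node);
	return true;
}
```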
      
      Fixes: a582c540 ("x86/vdso: Use RDPID in preference to LSL when available")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210504225632.1532621-2-seanjc@google.com
    • x86/resctrl: Fix init const confusion · 4029b970
      Andi Kleen authored
      
      A const variable must be marked __initconst, not __initdata.
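
      The distinction can be sketched in userspace with section attributes. TOY_INITDATA and TOY_INITCONST are hypothetical stand-ins for the kernel's __initdata and __initconst, which keep writable and read-only init-time objects in separate sections; the sketch assumes an ELF toolchain (GCC or Clang on Linux). Placing a const object in the writable section is the confusion being fixed, and GCC can reject mixing const and non-const objects in one named section with a "section type conflict" error.

```c
/*
 * Hypothetical stand-ins for __initdata and __initconst: writable and
 * read-only init-time objects go into separate named sections.
 */
#define TOY_INITDATA  __attribute__((section(".toy.init.data")))
#define TOY_INITCONST __attribute__((section(".toy.init.rodata")))

static int toy_boot_calls TOY_INITDATA = 0;                    /* writable */
static const int toy_defaults[3] TOY_INITCONST = { 1, 2, 4 };  /* const */

/* Sum the read-only defaults; also bumps the writable counter. */
int toy_sum_defaults(void)
{
	int sum = 0;

	for (int i = 0; i < 3; i++)
		sum += toy_defaults[i];
	toy_boot_calls++;
	return sum;
}
```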
      
      Signed-off-by: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20210425211229.3157674-1-ak@linux.intel.com
    • x86: Delete UD0, UD1 traces · 790d1ce7
      Alexey Dobriyan authored
      
      Neither instruction is used by the kernel.
      
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/YIHHYNKbiSf5N7+o@localhost.localdomain
    • Wan Jiabing
    • arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE · ca6e51d5
      Oscar Salvador authored
      Enable the arm64 platform to use the MHP_MEMMAP_ON_MEMORY feature.
      
      Link: https://lkml.kernel.org/r/20210421102701.25051-9-osalvador@suse.de
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE · f91ef222
      Oscar Salvador authored
      Enable the x86_64 platform to use the MHP_MEMMAP_ON_MEMORY feature.
      
      Link: https://lkml.kernel.org/r/20210421102701.25051-8-osalvador@suse.de
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: drop redundant HAVE_ARCH_TRANSPARENT_HUGEPAGE · e8003bf6
      Anshuman Khandual authored
      HAVE_ARCH_TRANSPARENT_HUGEPAGE has duplicate definitions on platforms
      that subscribe to it.  Drop these redundant definitions and instead just
      select it on applicable platforms.
      
      Link: https://lkml.kernel.org/r/1617259448-22529-7-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>		[arc]
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmerdabbelt@google.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: drop redundant ARCH_ENABLE_SPLIT_PMD_PTLOCK · 66f24fa7
      Anshuman Khandual authored
      ARCH_ENABLE_SPLIT_PMD_PTLOCKS has duplicate definitions on platforms
      that subscribe to it.  Drop these redundant definitions and instead just
      select it on applicable platforms.
      
      Link: https://lkml.kernel.org/r/1617259448-22529-6-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Heiko Carstens <hca@linux.ibm.com>		[s390]
      Cc: Will Deacon <will@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Palmer Dabbelt <palmerdabbelt@google.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Anshuman Khandual
      mm: drop redundant ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION · 1e866974
      Anshuman Khandual authored
      ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION configs have duplicate definitions on
      platforms that subscribe to them.  Drop these redundant definitions and
      instead just select them appropriately.
      
      [akpm@linux-foundation.org: s/x86_64/X86_64/, per Oscar]
      
      Link: https://lkml.kernel.org/r/1617259448-22529-5-git-send-email-anshuman.khandual@arm.com
      
      
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Cc: Will Deacon <will@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Palmer Dabbelt <palmerdabbelt@google.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Anshuman Khandual
      mm: generalize ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE] · 91024b3c
      Anshuman Khandual authored
      ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE] configs have duplicate
      definitions on platforms that subscribe to them.  Instead, just make them
      generic options which can be selected on applicable platforms.
      
      Link: https://lkml.kernel.org/r/1617259448-22529-4-git-send-email-anshuman.khandual@arm.com
      
      
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Heiko Carstens <hca@linux.ibm.com>		[s390]
      Cc: Will Deacon <will@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Palmer Dabbelt <palmerdabbelt@google.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Anshuman Khandual
      mm: generalize SYS_SUPPORTS_HUGETLBFS (rename as ARCH_SUPPORTS_HUGETLBFS) · 855f9a8e
      Anshuman Khandual authored
      SYS_SUPPORTS_HUGETLBFS config has duplicate definitions on platforms
      that subscribe to it.  Instead, just make it a generic option which can be
      selected on applicable platforms.
      
      Also rename it to ARCH_SUPPORTS_HUGETLBFS.  This reduces code
      duplication and makes it cleaner.
      
      Link: https://lkml.kernel.org/r/1617259448-22529-3-git-send-email-anshuman.khandual@arm.com
      
      
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>	[riscv]
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Anshuman Khandual
      mm: generalize ARCH_HAS_CACHE_LINE_SIZE · c2280be8
      Anshuman Khandual authored
      Patch series "mm: some config cleanups", v2.
      
      This series contains config cleanup patches which reduce code
      duplication across platforms and also improve maintainability.  There
      is no functional change intended with this series.
      
      This patch (of 6):
      
      ARCH_HAS_CACHE_LINE_SIZE config has duplicate definitions on platforms
      that subscribe to it.  Instead, just make it a generic option which can be
      selected on applicable platforms.  This change reduces code duplication
      and makes it cleaner.
      
      Link: https://lkml.kernel.org/r/1617259448-22529-1-git-send-email-anshuman.khandual@arm.com
      Link: https://lkml.kernel.org/r/1617259448-22529-2-git-send-email-anshuman.khandual@arm.com
      
      
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Vineet Gupta <vgupta@synopsys.com>		[arc]
      Cc: Will Deacon <will@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmerdabbelt@google.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Saravanan D
      x86/mm: track linear mapping split events · 575299ea
      Saravanan D authored
      To help with debugging the sluggishness caused by TLB misses and
      reloads, we introduce monotonic counters of hugepage [direct mapped]
      split events since the system reached SYSTEM_RUNNING, displayed as part
      of /proc/vmstat on x86 servers.
      
      The lifetime split event counts are displayed at the bottom of
      /proc/vmstat:
        ....
        swap_ra 0
        swap_ra_hit 0
        direct_map_level2_splits 94
        direct_map_level3_splits 4
        nr_unstable 0
        ....
      
      One of the many lasting sources of direct hugepage splits is kernel
      tracing (kprobes, tracepoints).
      
      Note that the kernel's code segment [512 MB] points to the same physical
      addresses that have been already mapped in the kernel's direct mapping
      range.
      
      Source: Documentation/x86/x86_64/mm.rst
      
      When we enable kernel tracing, the kernel has to modify
      attributes/permissions of the text segment hugepages that are direct
      mapped, causing them to split.
      
      The kernel's direct-mapped hugepages do not coalesce back after a
      split; they remain split for the rest of the system's lifetime.
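      
      To illustrate why a partial permission change splits a large page, here
      is a simplified model, not the kernel's change_page_attr code: a region
      mapped with 2 MiB pages, where changing attributes on only part of a
      2 MiB page forces a split and the split is never undone. The struct,
      change_attr() and the 16 MiB region size are hypothetical, and 1 GiB
      (level 3) pages are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SZ_4K 0x1000ul
#define SZ_2M 0x200000ul
#define NHUGE 8			/* model a 16 MiB region of 2 MiB pages */

struct direct_map_model {
	unsigned long base;		/* 2 MiB-aligned start of the region */
	bool split[NHUGE];		/* 2 MiB page already split to 4 KiB PTEs */
	unsigned long level2_splits;	/* monotonic, never decremented */
};

/*
 * Change attributes on [start, end).  A 2 MiB page that is only partly
 * covered must be split; a fully covered one keeps its large mapping.
 */
void change_attr(struct direct_map_model *m, unsigned long start,
		 unsigned long end)
{
	for (int i = 0; i < NHUGE; i++) {
		unsigned long hs = m->base + i * SZ_2M;
		unsigned long he = hs + SZ_2M;
		bool overlaps = start < he && end > hs;
		bool covers = start <= hs && end >= he;

		if (overlaps && !covers && !m->split[i]) {
			m->split[i] = true;	/* never coalesced back */
			m->level2_splits++;
		}
	}
}
```

      A range that straddles a 2 MiB boundary splits both neighbouring pages,
      while repeating a change on an already split page adds nothing.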
      
      An instance of direct page splits when we turn on dynamic kernel tracing:
        ....
        cat /proc/vmstat | grep -i direct_map_level
        direct_map_level2_splits 784
        direct_map_level3_splits 12
        bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[pid, comm] = count(); }'
        cat /proc/vmstat | grep -i direct_map_level
        direct_map_level2_splits 789
        direct_map_level3_splits 12
        ....
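      
      A minimal userspace sketch for reading these counters. It assumes only
      the counter names shown above and the usual "name value" line format of
      /proc/vmstat; vmstat_lookup() and read_vmstat() are hypothetical
      helpers, and parsing is kept separate from file I/O so it can be
      checked against a captured snapshot.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: scan a /proc/vmstat-style buffer ("name value" per
 * line) and return the value for `name`, or -1 if the counter is absent.
 */
long long vmstat_lookup(const char *buf, const char *name)
{
	const char *line = buf;
	size_t len;

	assert(buf != NULL && name != NULL);
	len = strlen(name);
	while (*line) {
		/* Match the whole counter name, not just a prefix of it. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ')
			return strtoll(line + len + 1, NULL, 10);
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;		/* start of the next line */
	}
	return -1;
}

/* Read all of /proc/vmstat into a NUL-terminated heap buffer. */
char *read_vmstat(void)
{
	FILE *f = fopen("/proc/vmstat", "r");
	size_t cap = 4096, len = 0;
	char *buf = NULL;

	if (!f)
		return NULL;
	for (;;) {
		char *tmp = realloc(buf, cap);

		if (!tmp) {
			free(buf);
			buf = NULL;
			break;
		}
		buf = tmp;
		len += fread(buf + len, 1, cap - len - 1, f);
		if (len < cap - 1)	/* short read: EOF (or error) */
			break;
		cap *= 2;		/* buffer full: grow and keep reading */
	}
	fclose(f);
	if (buf)
		buf[len] = '\0';
	return buf;
}
```

      On a kernel without this patch, vmstat_lookup() returns -1 for both
      direct_map_level2_splits and direct_map_level3_splits.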
      
      Link: https://lkml.kernel.org/r/20210218235744.1040634-1-saravanand@fb.com
      
      
      Signed-off-by: Saravanan D <saravanand@fb.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Axel Rasmussen
      userfaultfd: add minor fault registration mode · 7677f7fd
      Axel Rasmussen authored
      Patch series "userfaultfd: add minor fault handling", v9.
      
      Overview
      ========
      
      This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS.
      When enabled (via the UFFDIO_API ioctl), this feature means that any
      hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also*
      get events for "minor" faults.  By "minor" fault, I mean the following
      situation:
      
      Let there exist two mappings (i.e., VMAs) to the same page(s) (shared
      memory).  One of the mappings is registered with userfaultfd (in minor
      mode), and the other is not.  Via the non-UFFD mapping, the underlying
      pages have already been allocated & filled with some contents.  The UFFD
      mapping has not yet been faulted in; when it is touched for the first
      time, this results in what I'm calling a "minor" fault.  As a concrete
      example, when working with hugetlbfs, we have huge_pte_none(), but
      find_lock_page() finds an existing page.
      
      We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE.  The idea
      is, userspace resolves the fault by either a) doing nothing if the
      contents are already correct, or b) updating the underlying contents using
      the second, non-UFFD mapping (via memcpy/memset or similar, or something
      fancier like RDMA).  In either case, userspace issues
      UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are
      correct, carry on setting up the mapping".
      
      Use Case
      ========
      
      Consider the use case of VM live migration (e.g. under QEMU/KVM):
      
      1. While a VM is still running, we copy the contents of its memory to a
         target machine. The pages are populated on the target by writing to the
         non-UFFD mapping, using the setup described above. The VM is still running
         (and therefore its memory is likely changing), so this may be repeated
         several times, until we decide the target is "up to date enough".
      
      2. We pause the VM on the source, and start executing on the target machine.
         During this gap, the VM's user(s) will *see* a pause, so it is desirable to
         minimize this window.
      
      3. Between the last time any page was copied from the source to the target, and
         when the VM was paused, the contents of that page may have changed - and
         therefore the copy we have on the target machine is out of date. Although we
         can keep track of which pages are out of date, for VMs with large amounts of
         memory, it is "slow" to transfer this information to the target machine. We
         want to resume execution before such a transfer would complete.
      
      4. So, the guest begins executing on the target machine. The first time it
         touches its memory (via the UFFD-registered mapping), userspace wants to
         intercept this fault. Userspace checks whether or not the page is up to date,
         and if not, copies the updated page from the source machine, via the non-UFFD
         mapping. Finally, whether a copy was performed or not, userspace issues a
         UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents
         are correct, carry on setting up the mapping".
      
      We don't have to do all of the final updates on-demand. The userfaultfd manager
      can, in the background, also copy over updated pages once it receives the map of
      which pages are up-to-date or not.
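      
      The per-fault decision in step 4, plus the background copying just
      described, can be modelled in a few lines of C. This is a toy
      single-process simulation rather than userfaultfd code: arrays stand in
      for the source memory, the target's non-UFFD mapping and the set of
      out-of-date pages, and handle_minor_fault() and background_sync() are
      hypothetical names. In a real manager the copy would come from the
      source machine and the final step would be a UFFDIO_CONTINUE ioctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES  4
#define PAGE_SZ 16		/* tiny "pages", just for the model */

struct migration {
	char source[NPAGES][PAGE_SZ];	/* guest memory on the source machine */
	char target[NPAGES][PAGE_SZ];	/* same pages via the non-UFFD mapping */
	bool stale[NPAGES];		/* changed on the source after last copy */
	bool mapped[NPAGES];		/* "UFFDIO_CONTINUE" already issued */
	int copies;			/* pages re-copied after the pause */
};

/* Re-copy one page through the non-UFFD mapping if it is out of date. */
static void sync_page(struct migration *m, int pg)
{
	if (m->stale[pg]) {
		memcpy(m->target[pg], m->source[pg], PAGE_SZ);
		m->stale[pg] = false;
		m->copies++;
	}
}

/* Step 4: the guest touched page `pg` through the UFFD-registered mapping. */
void handle_minor_fault(struct migration *m, int pg)
{
	assert(pg >= 0 && pg < NPAGES);
	sync_page(m, pg);
	/* Stand-in for UFFDIO_CONTINUE: contents are correct, map the page. */
	m->mapped[pg] = true;
}

/* Background copying, so that later faults find up-to-date pages. */
void background_sync(struct migration *m)
{
	for (int pg = 0; pg < NPAGES; pg++)
		sync_page(m, pg);
}
```

      Note that a fault on an up-to-date page costs no copy at all; only the
      "continue" step is needed.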
      
      Interaction with Existing APIs
      ==============================
      
      Because this is a feature, a registered VMA could potentially receive both
      missing and minor faults.  I spent some time thinking through how the
      existing API interacts with the new feature:
      
      UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not
      allocate a new page.  If UFFDIO_CONTINUE is used on a non-minor fault:
      
      - For non-shared memory or shmem, -EINVAL is returned.
      - For hugetlb, -EFAULT is returned.
      
      UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults.
      Without modifications, the existing codepath assumes a new page needs to
      be allocated.  This is okay, since userspace must have a second
      non-UFFD-registered mapping anyway, thus there isn't much reason to want
      to use these in any case (just memcpy or memset or similar).
      
      - If UFFDIO_COPY is used on a minor fault, -EEXIST is returned.
      - If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL
        in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case).
      - UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns
        -ENOENT in that case (regardless of the kind of fault).
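      
      The return values listed above can be encoded as a lookup function, for
      example as a checklist when writing tests. It records only the
      combinations stated in this message (UFFDIO_WRITEPROTECT is left out),
      and the enum names and expected_uffd_result() are hypothetical; a
      result of 0 just means no error is listed for that combination.

```c
#include <assert.h>
#include <errno.h>

enum uffd_op  { OP_CONTINUE, OP_COPY, OP_ZEROPAGE };
enum fault_ty { FAULT_MISSING, FAULT_MINOR };
enum mem_ty   { MEM_PRIVATE, MEM_SHMEM, MEM_HUGETLB };

int expected_uffd_result(enum uffd_op op, enum fault_ty fault,
			 enum mem_ty mem)
{
	switch (op) {
	case OP_CONTINUE:
		/* CONTINUE never allocates, so it cannot fix a missing fault. */
		if (fault == FAULT_MISSING)
			return mem == MEM_HUGETLB ? -EFAULT : -EINVAL;
		return 0;
	case OP_COPY:
		/* COPY assumes a new page must be allocated. */
		return fault == FAULT_MINOR ? -EEXIST : 0;
	case OP_ZEROPAGE:
		/* ZEROPAGE is unsupported on hugetlb in any case. */
		if (mem == MEM_HUGETLB)
			return -EINVAL;
		return fault == FAULT_MINOR ? -EEXIST : 0;
	}
	return -EINVAL;		/* unknown operation */
}
```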
      
      Future Work
      ===========
      
      This series only supports hugetlbfs.  I have a second series in flight to
      support shmem as well, extending the functionality.  This series is more
      mature than the shmem support at this point, and the functionality works
      fully on hugetlbfs, so this series can be merged first and then shmem
      support will follow.
      
      This patch (of 6):
      
      This feature allows userspace to intercept "minor" faults.  By "minor"
      faults, I mean the following situation:
      
      Let there exist two mappings (i.e., VMAs) to the same page(s).  One of the
      mappings is registered with userfaultfd (in minor mode), and the other is
      not.  Via the non-UFFD mapping, the underlying pages have already been
      allocated & filled with some contents.  The UFFD mapping has not yet been
      faulted in; when it is touched for the first time, this results in what
      I'm calling a "minor" fault.  As a concrete example, when working with
      hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing
      page.
      
      This commit adds the new registration mode, and sets the relevant flag on
      the VMAs being registered.  In the hugetlb fault path, if we find that we
      have huge_pte_none(), but find_lock_page() does indeed find an existing
      page, then we have a "minor" fault, and if the VMA has the userfaultfd
      registration flag, we call into userfaultfd to handle it.
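      
      The check described in the previous paragraph amounts to a small
      decision. The sketch below is a userspace model, not the actual hugetlb
      fault handler: the two booleans stand in for huge_pte_none() and for
      find_lock_page() finding a page, and the flag values,
      classify_hugetlb_fault() and the enum are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the VMA's userfaultfd registration flags. */
#define MODEL_VM_UFFD_MISSING 0x1u
#define MODEL_VM_UFFD_MINOR   0x2u

enum fault_action {
	FAULT_ALREADY_MAPPED,	/* PTE present: nothing to report */
	FAULT_HANDLE_IN_KERNEL,	/* resolved by the kernel as before */
	FAULT_UFFD_MISSING,	/* no page anywhere: report a missing fault */
	FAULT_UFFD_MINOR,	/* page cached but not mapped: minor fault */
};

enum fault_action classify_hugetlb_fault(bool pte_none, bool page_in_cache,
					 unsigned int vm_flags)
{
	if (!pte_none)
		return FAULT_ALREADY_MAPPED;
	if (!page_in_cache)
		return (vm_flags & MODEL_VM_UFFD_MISSING) ?
			FAULT_UFFD_MISSING : FAULT_HANDLE_IN_KERNEL;
	/* huge_pte_none(), but find_lock_page() found a page. */
	return (vm_flags & MODEL_VM_UFFD_MINOR) ?
		FAULT_UFFD_MINOR : FAULT_HANDLE_IN_KERNEL;
}
```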
      
      This is implemented as a new registration mode, instead of an API feature.
      This is because the alternative implementation has significant drawbacks
      [1].
      
      However, doing it this way requires that we allocate a VM_* flag for the new
      registration mode.  On 32-bit systems, there are no unused bits, so this
      feature is only supported on architectures with
      CONFIG_ARCH_USES_HIGH_VMA_FLAGS.  When attempting to register a VMA in
      MINOR mode on 32-bit architectures, we return -EINVAL.
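      
      A sketch of the resulting registration-time check, under stated
      assumptions: the mode bits are local copies of what we understand the
      UFFDIO_REGISTER_MODE_{MISSING,WP,MINOR} values in the uapi header to
      be, and check_register_mode() with its have_high_vma_flags parameter
      (standing in for CONFIG_ARCH_USES_HIGH_VMA_FLAGS) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the registration mode bits (assumed values). */
#define MODE_MISSING ((uint64_t)1 << 0)
#define MODE_WP      ((uint64_t)1 << 1)
#define MODE_MINOR   ((uint64_t)1 << 2)

int check_register_mode(uint64_t mode, bool have_high_vma_flags)
{
	/* At least one known mode, and no unknown bits. */
	if (!mode || (mode & ~(MODE_MISSING | MODE_WP | MODE_MINOR)))
		return -EINVAL;
	/* MINOR needs a spare VM_* bit, available only with high VMA flags. */
	if ((mode & MODE_MINOR) && !have_high_vma_flags)
		return -EINVAL;
	return 0;
}
```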
      
      [1] https://lore.kernel.org/patchwork/patch/1380226/
      
      [peterx@redhat.com: fix minor fault page leak]
        Link: https://lkml.kernel.org/r/20210322175132.36659-1-peterx@redhat.com
      
      Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com
      Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com
      
      
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chinwen Chang <chinwen.chang@mediatek.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michal Koutný" <mkoutny@suse.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Shawn Anastasio <shawn@anastas.io>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Adam Ruprecht <ruprecht@google.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Cannon Matthews <cannonmatthews@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Anshuman Khandual
      mm: generalize HUGETLB_PAGE_SIZE_VARIABLE · 4bfb68a0
      Anshuman Khandual authored
      HUGETLB_PAGE_SIZE_VARIABLE need not be defined for each individual
      platform subscribing it.  Instead just make it generic.
      
      Link: https://lkml.kernel.org/r/1614914928-22039-1-git-send-email-anshuman.khandual@arm.com
      
      
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Peter Xu
      hugetlb/userfaultfd: forbid huge pmd sharing when uffd enabled · c1991e07
      Peter Xu authored
      Huge pmd sharing can cause problems for userfaultfd.  The thing is that
      userfaultfd runs its logic based on special bits in page table entries,
      but huge pmd sharing could potentially share page table entries across
      different address ranges.  That could cause issues in either of these
      cases:
      
       - When sharing huge pmd page tables for an uffd write protected range,
         the newly mapped huge pmd range will also be write protected
         unexpectedly, or,
      
       - When we try to write protect a huge pmd shared range, we'll first do
         huge_pmd_unshare() in hugetlb_change_protection(); however, that
         also means the UFFDIO_WRITEPROTECT could be silently skipped for the
         shared region, which could lead to data loss.
      
      While at it, a few other changes are made as well:
      
       - Move want_pmd_share() from mm/hugetlb.c into linux/hugetlb.h, because
         that's definitely something that arch code would like to use too.
      
       - ARM64 currently checks directly against
         CONFIG_ARCH_WANT_HUGE_PMD_SHARE when trying to share huge pmd. Switch
         to the want_pmd_share() helper.
      
       - Move vma_shareable() from huge_pmd_share() into want_pmd_share().
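      
      A simplified userspace model of the check after this change:
      userfaultfd registration vetoes sharing, and otherwise vma_shareable()
      requires a VM_MAYSHARE mapping that covers the whole PUD-aligned range
      around the address. The struct, flag values and 1 GiB PUD size (as on
      x86-64 with 4 KiB pages) are assumptions, and which uffd modes exactly
      disable sharing is reduced here to a single hypothetical flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits and sizes for this model only. */
#define MODEL_VM_MAYSHARE 0x1ul
#define MODEL_VM_UFFD     0x2ul		/* "registered with userfaultfd" */
#define MODEL_PUD_SIZE    (1ul << 30)	/* 1 GiB, as on x86-64 */
#define MODEL_PUD_MASK    (~(MODEL_PUD_SIZE - 1))

struct model_vma {
	unsigned long vm_start, vm_end, vm_flags;
};

/* Shareable only if shared and the PUD-aligned range lies in the VMA. */
static bool vma_shareable(const struct model_vma *vma, unsigned long addr)
{
	unsigned long base = addr & MODEL_PUD_MASK;
	unsigned long end = base + MODEL_PUD_SIZE;

	return (vma->vm_flags & MODEL_VM_MAYSHARE) &&
	       vma->vm_start <= base && end <= vma->vm_end;
}

bool want_pmd_share(const struct model_vma *vma, unsigned long addr)
{
	/* uffd keeps state in page table entries: never share them. */
	if (vma->vm_flags & MODEL_VM_UFFD)
		return false;
	return vma_shareable(vma, addr);
}
```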
      
      [peterx@redhat.com: fix build with !ARCH_WANT_HUGE_PMD_SHARE]
        Link: https://lkml.kernel.org/r/20210310185359.88297-1-peterx@redhat.com
      
      Link: https://lkml.kernel.org/r/20210218231202.15426-1-peterx@redhat.com
      
      
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Cc: Adam Ruprecht <ruprecht@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Cannon Matthews <cannonmatthews@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chinwen Chang <chinwen.chang@mediatek.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michal Koutný" <mkoutny@suse.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Shawn Anastasio <shawn@anastas.io>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Peter Xu
      hugetlb: pass vma into huge_pte_alloc() and huge_pmd_share() · aec44e0f
      Peter Xu authored
      Patch series "hugetlb: Disable huge pmd unshare for uffd-wp", v4.
      
      This series tries to disable huge pmd unshare of hugetlbfs backed memory
      for uffd-wp.  Although uffd-wp of hugetlbfs is still in the RFC stage,
      the idea of this series may be needed for multiple tasks (Axel's uffd
      minor fault series, and Mike's soft dirty series), so I picked it out
      from the larger series.
      
      This patch (of 4):
      
      This is preparation work to allow the per-architecture huge_pte_alloc()
      to behave differently according to different VMA attributes.
      
      Pass it deeper into huge_pmd_share() so that we can avoid the find_vma() call.
      
      [peterx@redhat.com: build fix]
        Link: https://lkml.kernel.org/r/20210304164653.GB397383@xz-x1
      
      Link: https://lkml.kernel.org/r/20210218230633.15028-1-peterx@redhat.com
      Link: https://lkml.kernel.org/r/20210218230633.15028-2-peterx@redhat.com
      
      
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Adam Ruprecht <ruprecht@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Cannon Matthews <cannonmatthews@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chinwen Chang <chinwen.chang@mediatek.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michal Koutný" <mkoutny@suse.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Shawn Anastasio <shawn@anastas.io>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Mark Rutland
      arm64: entry: always set GIC_PRIO_PSR_I_SET during entry · 4d6a38da
      Mark Rutland authored
      Zenghui reports that booting a kernel with "irqchip.gicv3_pseudo_nmi=1"
      on the command line hits a warning during kernel entry, due to the way
      we manipulate the PMR.
      
      Early in the entry sequence, we call lockdep_hardirqs_off() to inform
      lockdep that interrupts have been masked (as the HW sets DAIF when
      entering an exception). Architecturally PMR_EL1 is not affected by
      exception entry, and we don't set GIC_PRIO_PSR_I_SET in the PMR early in
      the exception entry sequence, so early in exception entry the PMR can
      indicate that interrupts are unmasked even though they are masked by
      DAIF.
      
      If DEBUG_LOCKDEP is selected, lockdep_hardirqs_off() will check that
      interrupts are masked, before we set GIC_PRIO_PSR_I_SET in any of the
      exception entry paths, and hence lockdep_hardirqs_off() will WARN() that
      something is amiss.
      
      We can avoid this by consistently setting GIC_PRIO_PSR_I_SET during
      exception entry so that kernel code sees a consistent environment. We
      must also update local_daif_inherit() to undo this, as it currently only
      touches DAIF. For other paths, local_daif_restore() will update both
      DAIF and the PMR. With this done, we can remove the existing special
      cases which set this later in the entry code.
      
      We always use (GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET) for consistency with
      local_daif_save(), as this will warn if it ever encounters
      (GIC_PRIO_IRQOFF | GIC_PRIO_PSR_I_SET), and never sets this itself. This
      matches the gic_prio_kentry_setup that we have to retain for
      ret_to_user.
      
      The original splat from Zenghui's report was:
      
      | DEBUG_LOCKS_WARN_ON(!irqs_disabled())
      | WARNING: CPU: 3 PID: 125 at kernel/locking/lockdep.c:4258 lockdep_hardirqs_off+0xd4/0xe8
      | Modules linked in:
      | CPU: 3 PID: 125 Comm: modprobe Tainted: G        W         5.12.0-rc8+ #463
      | Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
      | pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO BTYPE=--)
      | pc : lockdep_hardirqs_off+0xd4/0xe8
      | lr : lockdep_hardirqs_off+0xd4/0xe8
      | sp : ffff80002a39bad0
      | pmr_save: 000000e0
      | x29: ffff80002a39bad0 x28: ffff0000de214bc0
      | x27: ffff0000de1c0400 x26: 000000000049b328
      | x25: 0000000000406f30 x24: ffff0000de1c00a0
      | x23: 0000000020400005 x22: ffff8000105f747c
      | x21: 0000000096000044 x20: 0000000000498ef9
      | x19: ffff80002a39bc88 x18: ffffffffffffffff
      | x17: 0000000000000000 x16: ffff800011c61eb0
      | x15: ffff800011700a88 x14: 0720072007200720
      | x13: 0720072007200720 x12: 0720072007200720
      | x11: 0720072007200720 x10: 0720072007200720
      | x9 : ffff80002a39bad0 x8 : ffff80002a39bad0
      | x7 : ffff8000119f0800 x6 : c0000000ffff7fff
      | x5 : ffff8000119f07a8 x4 : 0000000000000001
      | x3 : 9bcdab23f2432800 x2 : ffff800011730538
      | x1 : 9bcdab23f2432800 x0 : 0000000000000000
      | Call trace:
      |  lockdep_hardirqs_off+0xd4/0xe8
      |  enter_from_kernel_mode.isra.5+0x7c/0xa8
      |  el1_abort+0x24/0x100
      |  el1_sync_handler+0x80/0xd0
      |  el1_sync+0x6c/0x100
      |  __arch_clear_user+0xc/0x90
      |  load_elf_binary+0x9fc/0x1450
      |  bprm_execve+0x404/0x880
      |  kernel_execve+0x180/0x188
      |  call_usermodehelper_exec_async+0xdc/0x158
      |  ret_from_fork+0x10/0x18
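      
      A userspace model of why lockdep warns, under these assumptions: the
      arm64 values GIC_PRIO_IRQON = 0xe0 and GIC_PRIO_PSR_I_SET = (1 << 4),
      and, with pseudo-NMI enabled, irqs_disabled() being derived from the
      PMR alone, so interrupts count as disabled whenever the PMR differs
      from GIC_PRIO_IRQON. The function names are hypothetical. The
      "pmr_save: 000000e0" line in the splat above is consistent with this
      model.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed arm64 priority-mask values. */
#define GIC_PRIO_IRQON     0xe0u
#define GIC_PRIO_PSR_I_SET (1u << 4)

/* With pseudo-NMI, "IRQs disabled" means PMR != GIC_PRIO_IRQON. */
static bool model_irqs_disabled(unsigned int pmr)
{
	return pmr != GIC_PRIO_IRQON;
}

/* Before the fix: exception entry changes DAIF but leaves the PMR alone. */
unsigned int entry_pmr_before_fix(unsigned int pmr_at_exception)
{
	return pmr_at_exception;
}

/* After the fix: always IRQON | PSR_I_SET, consistent with local_daif_save(). */
unsigned int entry_pmr_after_fix(unsigned int pmr_at_exception)
{
	(void)pmr_at_exception;
	return GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET;
}

/* Would lockdep_hardirqs_off() (with DEBUG_LOCKDEP) warn for this PMR? */
bool lockdep_would_warn(unsigned int pmr)
{
	return !model_irqs_disabled(pmr);
}
```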
      
      Fixes: 23529049 ("arm64: entry: fix non-NMI user<->kernel transitions")
      Fixes: 7cd1ea10 ("arm64: entry: fix non-NMI kernel<->kernel transitions")
      Fixes: f0cd5ac1 ("arm64: entry: fix NMI {user, kernel}->kernel transitions")
      Fixes: 2a9b3e6a ("arm64: entry: fix EL1 debug transitions")
      Link: https://lore.kernel.org/r/f4012761-026f-4e51-3a0c-7524e434e8b3@huawei.com
      
      
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reported-by: Zenghui Yu <yuzenghui@huawei.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Acked-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20210428111555.50880-1-mark.rutland@arm.com
      
      
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>