- Mar 12, 2019
-
-
Mike Rapoport authored
As all the memblock allocation functions return NULL in case of error rather than panic(), the duplicates with _nopanic suffix can be removed. Link: http://lkml.kernel.org/r/1548057848-15136-22-git-send-email-rppt@linux.ibm.com Signed-off-by:
Mike Rapoport <rppt@linux.ibm.com> Acked-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Petr Mladek <pmladek@suse.com> [printk] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> [c-sky] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Juergen Gross <jgross@suse.com> [Xen] Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Burton <paul.burton@mips.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Mike Rapoport authored
Add check for the return value of memblock_alloc*() functions and call panic() in case of error. The panic message repeats the one used by panicing memblock allocators with adjustment of parameters to include only relevant ones. The replacement was mostly automated with semantic patches like the one below with manual massaging of format strings. @@ expression ptr, size, align; @@ ptr = memblock_alloc(size, align); + if (!ptr) + panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align); [anders.roxell@linaro.org: use '%pa' with 'phys_addr_t' type] Link: http://lkml.kernel.org/r/20190131161046.21886-1-anders.roxell@linaro.org [rppt@linux.ibm.com: fix format strings for panics after memblock_alloc] Link: http://lkml.kernel.org/r/1548950940-15145-1-git-send-email-rppt@linux.ibm.com [rppt@linux.ibm.com: don't panic if the allocation in sparse_buffer_init fails] Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx [akpm@linux-foundation.org: fix xtensa printk warning] Link: http://lkml.kernel.org/r/1548057848-15136-20-git-send-email-rppt@linux.ibm.com Signed-off-by:
Mike Rapoport <rppt@linux.ibm.com> Signed-off-by:
Anders Roxell <anders.roxell@linaro.org> Reviewed-by: Guo Ren <ren_guo@c-sky.com> [c-sky] Acked-by: Paul Burton <paul.burton@mips.com> [MIPS] Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390] Reviewed-by: Juergen Gross <jgross@suse.com> [Xen] Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Mike Rapoport authored
Add panic() calls if memblock_alloc() returns NULL. The panic() format duplicates the one used by memblock itself and in order to avoid explosion with long parameters list replace open coded allocation size calculations with a local variable. Link: http://lkml.kernel.org/r/1548057848-15136-19-git-send-email-rppt@linux.ibm.com Signed-off-by:
Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> [c-sky] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Juergen Gross <jgross@suse.com> [Xen] Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Burton <paul.burton@mips.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Zev Weiss authored
do_proc_do[u]intvec_minmax_conv() had included open-coded versions of do_proc_do[u]intvec_conv(); the duplication led to buggy inconsistencies (missing range checks). To reduce the likelihood of such problems in the future, we can instead refactor both to be defined in terms of their non-bounded counterparts (plus the added check). Link: http://lkml.kernel.org/r/20190207165138.5oud57vq4ozwb4kh@hatter.bewilderbeest.net Signed-off-by:
Zev Weiss <zev@bewilderbeest.net> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Zev Weiss authored
This bug has apparently existed since the introduction of this function in the pre-git era (4500e91754d3 in Thomas Gleixner's history.git, "[NET]: Add proc_dointvec_userhz_jiffies, use it for proper handling of neighbour sysctls."). As a minimal fix we can simply duplicate the corresponding check in do_proc_dointvec_conv(). Link: http://lkml.kernel.org/r/20190207123426.9202-3-zev@bewilderbeest.net Signed-off-by:
Zev Weiss <zev@bewilderbeest.net> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: <stable@vger.kernel.org> [2.6.2+] Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Mar 09, 2019
-
-
Qian Cai authored
The following commit: 669de8bd ("kernel/workqueue: Use dynamic lockdep keys for workqueues") introduced a memory leak as wq_free_lockdep() calls kfree(wq->lock_name), but wq_init_lockdep() does not point wq->lock_name to the newly allocated slab object. This can be reproduced by running LTP fallocate04 followed by oom01 tests: unreferenced object 0xc0000005876384d8 (size 64): comm "fallocate04", pid 26972, jiffies 4297139141 (age 40370.480s) hex dump (first 32 bytes): 28 77 71 5f 63 6f 6d 70 6c 65 74 69 6f 6e 29 65 (wq_completion)e 78 74 34 2d 72 73 76 2d 63 6f 6e 76 65 72 73 69 xt4-rsv-conversi backtrace: [<00000000cb452883>] kvasprintf+0x6c/0xe0 [<000000004654ddac>] kasprintf+0x34/0x60 [<000000001c68f311>] alloc_workqueue+0x1f8/0x6ac [<0000000003c2ad83>] ext4_fill_super+0x23d4/0x3c80 [ext4] [<0000000006610538>] mount_bdev+0x25c/0x290 [<00000000bcf955ec>] ext4_mount+0x28/0x50 [ext4] [<0000000016e08fd3>] legacy_get_tree+0x4c/0xb0 [<0000000042b6a5fc>] vfs_get_tree+0x6c/0x190 [<00000000268ab022>] do_mount+0xb9c/0x1100 [<00000000698e6898>] ksys_mount+0x158/0x180 [<0000000064e391fd>] sys_mount+0x20/0x30 [<00000000ba378f12>] system_call+0x5c/0x70 Signed-off-by:
Qian Cai <cai@lca.pw> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: catalin.marinas@arm.com Cc: jiangshanlai@gmail.com Cc: tj@kernel.org Fixes: 669de8bd ("kernel/workqueue: Use dynamic lockdep keys for workqueues") Link: https://lkml.kernel.org/r/20190307002731.47371-1-cai@lca.pw Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Bart Van Assche authored
This patch fixes a use-after-free and a memory leak in an alloc_workqueue() error path. Repoted by syzkaller and KASAN: BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:197 [inline] BUG: KASAN: use-after-free in lockdep_register_key+0x3b9/0x490 kernel/locking/lockdep.c:1023 Read of size 8 at addr ffff888090fc2698 by task syz-executor134/7858 CPU: 1 PID: 7858 Comm: syz-executor134 Not tainted 5.0.0-rc8-next-20190301 #1 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132 __read_once_size include/linux/compiler.h:197 [inline] lockdep_register_key+0x3b9/0x490 kernel/locking/lockdep.c:1023 wq_init_lockdep kernel/workqueue.c:3444 [inline] alloc_workqueue+0x427/0xe70 kernel/workqueue.c:4263 ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732 misc_open+0x398/0x4c0 drivers/char/misc.c:141 chrdev_open+0x247/0x6b0 fs/char_dev.c:417 do_dentry_open+0x488/0x1160 fs/open.c:771 vfs_open+0xa0/0xd0 fs/open.c:880 do_last fs/namei.c:3416 [inline] path_openat+0x10e9/0x46e0 fs/namei.c:3533 do_filp_open+0x1a1/0x280 fs/namei.c:3563 do_sys_open+0x3fe/0x5d0 fs/open.c:1063 __do_sys_openat fs/open.c:1090 [inline] __se_sys_openat fs/open.c:1084 [inline] __x64_sys_openat+0x9d/0x100 fs/open.c:1084 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Allocated by task 7789: save_stack+0x45/0xd0 mm/kasan/common.c:75 set_track mm/kasan/common.c:87 [inline] __kasan_kmalloc mm/kasan/common.c:497 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:511 __do_kmalloc mm/slab.c:3726 [inline] __kmalloc+0x15c/0x740 mm/slab.c:3735 kmalloc include/linux/slab.h:553 [inline] kzalloc include/linux/slab.h:743 [inline] alloc_workqueue+0x13c/0xe70 kernel/workqueue.c:4236 ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732 misc_open+0x398/0x4c0 drivers/char/misc.c:141 chrdev_open+0x247/0x6b0 fs/char_dev.c:417 do_dentry_open+0x488/0x1160 fs/open.c:771 vfs_open+0xa0/0xd0 fs/open.c:880 do_last fs/namei.c:3416 [inline] path_openat+0x10e9/0x46e0 fs/namei.c:3533 do_filp_open+0x1a1/0x280 fs/namei.c:3563 do_sys_open+0x3fe/0x5d0 fs/open.c:1063 __do_sys_openat fs/open.c:1090 [inline] __se_sys_openat fs/open.c:1084 [inline] __x64_sys_openat+0x9d/0x100 fs/open.c:1084 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 7789: save_stack+0x45/0xd0 mm/kasan/common.c:75 set_track mm/kasan/common.c:87 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:459 kasan_slab_free+0xe/0x10 mm/kasan/common.c:467 __cache_free mm/slab.c:3498 [inline] kfree+0xcf/0x230 mm/slab.c:3821 alloc_workqueue+0xc3e/0xe70 kernel/workqueue.c:4295 ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732 misc_open+0x398/0x4c0 drivers/char/misc.c:141 chrdev_open+0x247/0x6b0 fs/char_dev.c:417 do_dentry_open+0x488/0x1160 fs/open.c:771 vfs_open+0xa0/0xd0 fs/open.c:880 do_last fs/namei.c:3416 [inline] path_openat+0x10e9/0x46e0 fs/namei.c:3533 do_filp_open+0x1a1/0x280 fs/namei.c:3563 do_sys_open+0x3fe/0x5d0 fs/open.c:1063 __do_sys_openat fs/open.c:1090 [inline] __se_sys_openat fs/open.c:1084 [inline] __x64_sys_openat+0x9d/0x100 fs/open.c:1084 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff888090fc2580 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 280 bytes inside of 512-byte region [ffff888090fc2580, ffff888090fc2780) Reported-by:
<syzbot+17335689e239ce135d8b@syzkaller.appspotmail.com> Signed-off-by:
Bart Van Assche <bvanassche@acm.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Fixes: 669de8bd ("kernel/workqueue: Use dynamic lockdep keys for workqueues") Link: https://lkml.kernel.org/r/20190303220046.29448-1-bvanassche@acm.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Bart Van Assche authored
init_data_structures_once() is called for the first time before RCU has been initialized. Make sure that init_rcu_head() is called before the RCU head is used and after RCU has been initialized. Signed-off-by:
Bart Van Assche <bvanassche@acm.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: longman@redhat.com Link: https://lkml.kernel.org/r/c20aa0f0-42ab-a884-d931-7d4ec2bf0cdc@acm.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Arnd Bergmann authored
Clang warns about a tentative array definition without a length: kernel/locking/lockdep.c:845:12: error: tentative array definition assumed to have one element [-Werror] There is no real reason to do this here, so just set the same length as in the real definition later in the same file. It has to be hidden in an #ifdef or annotated __maybe_unused though, to avoid the unused-variable warning if CONFIG_PROVE_LOCKING is disabled. Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Link: https://lkml.kernel.org/r/20190307075222.3424524-1-arnd@arndb.de Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Gustavo A. R. Silva authored
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warning: kernel/events/core.c: In function ‘perf_event_parse_addr_filter’: kernel/events/core.c:9154:11: warning: this statement may fall through [-Wimplicit-fallthrough=] kernel = 1; ~~~~~~~^~~ kernel/events/core.c:9156:3: note: here case IF_SRC_FILEADDR: ^~~~ Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough. Signed-off-by:
Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/20190212205430.GA8446@embeddedor Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Alexander Shishkin authored
Currently, the AUX buffer allocator will use high-order allocations for PMUs that don't support hardware scatter-gather chaining to ensure large contiguous blocks of pages, and always use an array of single pages otherwise. There is, however, a tangible performance benefit in using larger chunks of contiguous memory even in the latter case, that comes from not having to fetch the next page's address at every page boundary. In particular, a task running under Intel PT on an Atom CPU shows 1.5%-2% less runtime penalty with a single multi-page output region in snapshot mode (no PMI) than with multiple single-page output regions, from ~6% down to ~4%. For the snapshot mode it does make a difference as it is intended to run over long periods of time. For this reason, change the allocation policy to always optimistically start with the highest possible order when allocating pages for the AUX buffer, desceding until the allocation succeeds or order zero allocation fails. Signed-off-by:
Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/20190215114727.62648-2-alexander.shishkin@linux.intel.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Mar 08, 2019
-
-
YueHaibing authored
Remove duplicated include. Link: http://lkml.kernel.org/r/20181209062952.17736-1-yuehaibing@huawei.com Signed-off-by:
YueHaibing <yuehaibing@huawei.com> Reviewed-by:
Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Elena Reshetova authored
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable kcov.refcount is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the kcov.refcount it might make a difference in following places: - kcov_put(): decrement in refcount_dec_and_test() only provides RELEASE ordering and control dependency on success vs. fully ordered atomic counterpart Link: http://lkml.kernel.org/r/1547634429-772-1-git-send-email-elena.reshetova@intel.com Signed-off-by:
Elena Reshetova <elena.reshetova@intel.com> Suggested-by:
Kees Cook <keescook@chromium.org> Reviewed-by:
David Windsor <dwindsor@gmail.com> Reviewed-by:
Hans Liljestrand <ishkamiel@gmail.com> Reviewed-by:
Dmitry Vyukov <dvyukov@google.com> Reviewed-by:
Andrea Parri <andrea.parri@amarulasolutions.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Greg Kroah-Hartman authored
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Link: http://lkml.kernel.org/r/20190122152151.16139-46-gregkh@linuxfoundation.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Anders Roxell <anders.roxell@linaro.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Masahiro Yamada authored
This slightly optimizes the kernel/configs.c build. bin2c is not very efficient because it converts a data file into a huge array to embed it into a *.c file. Instead, we can use the .incbin directive. Also, this simplifies the code; Makefile is cleaner, and the way to get the offset/size of the config_data.gz is more straightforward. I used the "asm" statement in *.c instead of splitting it into *.S because MODULE_* tags are not supported in *.S files. I also cleaned up kernel/.gitignore; "config_data.gz" is unneeded because the top-level .gitignore takes care of the "*.gz" pattern. [yamada.masahiro@socionext.com: v2] Link: http://lkml.kernel.org/r/1550108893-21226-1-git-send-email-yamada.masahiro@socionext.com Link: http://lkml.kernel.org/r/1549941160-8084-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by:
Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Alexander Popov <alex.popov@linux.com> Cc: Kees Cook <keescook@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Richard Guy Briggs <rgb@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Gustavo A. R. Silva authored
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void *entry[]; }; instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Link: http://lkml.kernel.org/r/20190109172445.GA15908@embeddedor Signed-off-by:
Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by:
Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Christian Brauner authored
Currently, when writing echo 18446744073709551616 > /proc/sys/fs/file-max /proc/sys/fs/file-max will overflow and be set to 0. That quickly crashes the system. This commit sets the max and min value for file-max. The max value is set to long int. Any higher value cannot currently be used as the percpu counters are long ints and not unsigned integers. Note that the file-max value is ultimately parsed via __do_proc_doulongvec_minmax(). This function does not report error when min or max are exceeded. Which means if a value largen that long int is written userspace will not receive an error instead the old value will be kept. There is an argument to be made that this should be changed and __do_proc_doulongvec_minmax() should return an error when a dedicated min or max value are exceeded. However this has the potential to break userspace so let's defer this to an RFC patch. Link: http://lkml.kernel.org/r/20190107222700.15954-3-christian@brauner.io Signed-off-by:
Christian Brauner <christian@brauner.io> Acked-by:
Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Waiman Long <longman@redhat.com> [christian@brauner.io: v4] Link: http://lkml.kernel.org/r/20190210203943.8227-3-christian@brauner.io Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Christian Brauner authored
proc_get_long() is a funny function. It uses simple_strtoul() and for a good reason. proc_get_long() wants to always succeed the parse and return the maybe incorrect value and the trailing characters to check against a pre-defined list of acceptable trailing values. However, simple_strtoul() explicitly ignores overflows which can cause funny things like the following to happen: echo 18446744073709551616 > /proc/sys/fs/file-max cat /proc/sys/fs/file-max 0 (Which will cause your system to silently die behind your back.) On the other hand kstrtoul() does do overflow detection but does not return the trailing characters, and also fails the parse when anything other than '\n' is a trailing character whereas proc_get_long() wants to be more lenient. Now, before adding another kstrtoul() function let's simply add a static parse strtoul_lenient() which: - fails on overflow with -ERANGE - returns the trailing characters to the caller The reason why we should fail on ERANGE is that we already do a partial fail on overflow right now. Namely, when the TMPBUFLEN is exceeded. So we already reject values such as 184467440737095516160 (21 chars) but accept values such as 18446744073709551616 (20 chars) but both are overflows. So we should just always reject 64bit overflows and not special-case this based on the number of chars. Link: http://lkml.kernel.org/r/20190107222700.15954-2-christian@brauner.io Signed-off-by:
Christian Brauner <christian@brauner.io> Acked-by:
Kees Cook <keescook@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Waiman Long <longman@redhat.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Johannes Weiner authored
This function can only be called safely from very specific scheduler contexts. Document those. Link: http://lkml.kernel.org/r/20190206150528.31198-1-hannes@cmpxchg.org Signed-off-by:
Johannes Weiner <hannes@cmpxchg.org> Suggested-by:
Andrew Morton <akpm@linux-foundation.org> Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Rasmus Villemoes authored
For symmetry with ddebug_remove_module, and to avoid a bit of ifdeffery in module.c, move the declaration of ddebug_add_module inside #if defined(CONFIG_DYNAMIC_DEBUG) and add a corresponding no-op stub in the #else branch. Link: http://lkml.kernel.org/r/20190212214150.4807-10-linux@rasmusvillemoes.dk Signed-off-by:
Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by:
Jason Baron <jbaron@akamai.com> Cc: David Sterba <dsterba@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Rasmus Villemoes authored
This serves two purposes: First, we get a diagnostic if (though extremely unlikely), any of the calls of ddebug_add_module for built-in code fails, effectively disabling dynamic_debug. Second, I want to make struct _ddebug opaque, and avoid accessing any of its members outside dynamic_debug.[ch]. Link: http://lkml.kernel.org/r/20190212214150.4807-9-linux@rasmusvillemoes.dk Signed-off-by:
Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by:
Jason Baron <jbaron@akamai.com> Cc: David Sterba <dsterba@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Mathieu Malaterre authored
There is a plan to build the kernel with -Wimplicit-fallthrough and this place in the code produced a warning (W=1). This commit remove the following warning: kernel/sys.c:1748:6: warning: this statement may fall through [-Wimplicit-fallthrough=] Link: http://lkml.kernel.org/r/20190114203347.17530-1-malat@debian.org Signed-off-by:
Mathieu Malaterre <malat@debian.org> Reviewed-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Tetsuo Handa authored
Since commit a2e51445 ("kernel/hung_task.c: allow to set checking interval separately from timeout") added hung_task_check_interval_secs, setting a value different from hung_task_timeout_secs echo 0 > /proc/sys/kernel/hung_task_panic echo 120 > /proc/sys/kernel/hung_task_timeout_secs echo 5 > /proc/sys/kernel/hung_task_check_interval_secs causes confusing output as if the task was blocked for hung_task_timeout_secs seconds from the previous report. [ 399.395930] INFO: task kswapd0:75 blocked for more than 120 seconds. [ 405.027637] INFO: task kswapd0:75 blocked for more than 120 seconds. [ 410.659725] INFO: task kswapd0:75 blocked for more than 120 seconds. [ 416.292860] INFO: task kswapd0:75 blocked for more than 120 seconds. [ 421.932305] INFO: task kswapd0:75 blocked for more than 120 seconds. Although we could update t->last_switch_time after sched_show_task(t) if we want to report only every 120 seconds, reporting every 5 seconds might not be very bad for monitoring after a problematic situation has started. Thus, let's use continuously blocked time instead of updating previously reported time. [ 677.985011] INFO: task kswapd0:80 blocked for more than 122 seconds. [ 693.856126] INFO: task kswapd0:80 blocked for more than 138 seconds. [ 709.728075] INFO: task kswapd0:80 blocked for more than 154 seconds. [ 725.600018] INFO: task kswapd0:80 blocked for more than 170 seconds. [ 741.473133] INFO: task kswapd0:80 blocked for more than 186 seconds. Link: http://lkml.kernel.org/r/1551175083-10669-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by:
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by:
Dmitry Vyukov <dvyukov@google.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Valdis Kletnieks authored
sparse complains: CHECK kernel/hung_task.c kernel/hung_task.c:28:19: warning: symbol 'sysctl_hung_task_check_count' was not declared. Should it be static? kernel/hung_task.c:42:29: warning: symbol 'sysctl_hung_task_timeout_secs' was not declared. Should it be static? kernel/hung_task.c:47:29: warning: symbol 'sysctl_hung_task_check_interval_secs' was not declared. Should it be static? kernel/hung_task.c:49:19: warning: symbol 'sysctl_hung_task_warnings' was not declared. Should it be static? kernel/hung_task.c:61:28: warning: symbol 'sysctl_hung_task_panic' was not declared. Should it be static? kernel/hung_task.c:219:5: warning: symbol 'proc_dohung_task_timeout_secs' was not declared. Should it be static? Add the appropriate header file to provide declarations. Link: http://lkml.kernel.org/r/467.1548649525@turing-police.cc.vt.edu Signed-off-by:
Valdis Kletnieks <valdis.kletnieks@vt.edu> Cc: "Paul E. McKenney" <paulmck@linux.ibm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
YueHaibing authored
Use DEFINE_DEBUGFS_ATTRIBUTE rather than DEFINE_SIMPLE_ATTRIBUTE for debugfs files. Semantic patch information: Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file() imposes some significant overhead as compared to DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe(). Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci The _unsafe() part suggests that some of them "safeness responsibilities" are now panic.c responsibilities. The patch is OK since panic's clear_warn_once_fops struct file_operations is safe against removal, so we don't have to use otherwise necessary debugfs_file_get()/debugfs_file_put(). [sergey.senozhatsky.work@gmail.com: changelog addition] Link: http://lkml.kernel.org/r/1545990861-158097-1-git-send-email-yuehaibing@huawei.com Signed-off-by:
YueHaibing <yuehaibing@huawei.com> Reviewed-by:
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Petr Mladek <pmladek@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Mar 07, 2019
-
-
Daniel Borkmann authored
Non-zero imm value in the second part of the ldimm64 instruction for BPF_PSEUDO_MAP_FD is invalid, and thus must be rejected. The map fd only ever sits in the first instructions' imm field. None of the BPF loaders known to us are using it, so risk of regression is minimal. For clarity and consistency, the few insn->{src_reg,imm} occurrences are rewritten into insn[0].{src_reg,imm}. Add a test case to the BPF selftest suite as well. Fixes: 0246e64d ("bpf: handle pseudo BPF_LD_IMM64 insn") Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Acked-by:
Song Liu <songliubraving@fb.com> Signed-off-by:
Alexei Starovoitov <ast@kernel.org>
-
Arnd Bergmann authored
When CONFIG_BPF_SYSCALL or CONFIG_SYSCTL is disabled, we get a warning about an unused function: kernel/sysctl.c:3331:12: error: 'proc_dointvec_minmax_bpf_stats' defined but not used [-Werror=unused-function] static int proc_dointvec_minmax_bpf_stats(struct ctl_table *table, int write, The CONFIG_BPF_SYSCALL check was already handled, but the SYSCTL check is needed on top. Fixes: 492ecee8 ("bpf: enable program stats") Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Reviewed-by:
Kees Cook <keescook@chromium.org> Reviewed-by:
Christian Brauner <christian@brauner.io> Acked-by:
Song Liu <songliubraving@fb.com> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net>
-
- Mar 06, 2019
-
-
Joerg Roedel authored
The function returns the maximum size that can be mapped using DMA-API functions. The patch also adds the implementation for direct DMA and a new dma_map_ops pointer so that other implementations can expose their limit. Cc: stable@vger.kernel.org Reviewed-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Joerg Roedel <jroedel@suse.de> Signed-off-by:
Michael S. Tsirkin <mst@redhat.com>
-
Joerg Roedel authored
This function will be used from dma_direct code to determine the maximum segment size of a dma mapping. Cc: stable@vger.kernel.org Reviewed-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Joerg Roedel <jroedel@suse.de> Signed-off-by:
Michael S. Tsirkin <mst@redhat.com>
-
Joerg Roedel authored
The function returns the maximum size that can be remapped by the SWIOTLB implementation. This function will be later exposed to users through the DMA-API. Cc: stable@vger.kernel.org Reviewed-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Joerg Roedel <jroedel@suse.de> Signed-off-by:
Michael S. Tsirkin <mst@redhat.com>
-
Arnd Bergmann authored
As John Stultz noticed, my y2038 syscall series caused a link failure when CONFIG_SYSVIPC is disabled but CONFIG_COMPAT is enabled: arch/arm64/kernel/sys32.o:(.rodata+0x960): undefined reference to `__arm64_compat_sys_old_semctl' arch/arm64/kernel/sys32.o:(.rodata+0x980): undefined reference to `__arm64_compat_sys_old_msgctl' arch/arm64/kernel/sys32.o:(.rodata+0x9a0): undefined reference to `__arm64_compat_sys_old_shmctl' Add the missing entries in kernel/sys_ni.c for the new system calls. Cc: Laura Abbott <labbott@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Arnd Bergmann <arnd@arndb.de>
-
Johannes Weiner authored
Cgroup has a standardized poll/notification mechanism for waking all pollers on all fds when a filesystem node changes. To allow polling for custom events, add a .poll callback that can override the default. This is in preparation for pollable cgroup pressure files which have per-fd trigger configurations. Link: http://lkml.kernel.org/r/20190124211518.244221-3-surenb@google.com Signed-off-by:
Johannes Weiner <hannes@cmpxchg.org> Signed-off-by:
Suren Baghdasaryan <surenb@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Mel Gorman authored
Compaction is inherently race-prone as a suitable page freed during compaction can be allocated by any parallel task. This patch uses a capture_control structure to isolate a page immediately when it is freed by a direct compactor in the slow path of the page allocator. The intent is to avoid redundant scanning. 5.0.0-rc1 5.0.0-rc1 selective-v3r17 capture-v3r19 Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%* Amean fault-both-3 2582.11 ( 0.00%) 2563.68 ( 0.71%) Amean fault-both-5 4500.26 ( 0.00%) 4233.52 ( 5.93%) Amean fault-both-7 5819.53 ( 0.00%) 6333.65 ( -8.83%) Amean fault-both-12 9321.18 ( 0.00%) 9759.38 ( -4.70%) Amean fault-both-18 9782.76 ( 0.00%) 10338.76 ( -5.68%) Amean fault-both-24 15272.81 ( 0.00%) 13379.55 * 12.40%* Amean fault-both-30 15121.34 ( 0.00%) 16158.25 ( -6.86%) Amean fault-both-32 18466.67 ( 0.00%) 18971.21 ( -2.73%) Latency is only moderately affected but the devil is in the details. A closer examination indicates that base page fault latency is reduced but latency of huge pages is increased as it takes creater care to succeed. Part of the "problem" is that allocation success rates are close to 100% even when under pressure and compaction gets harder 5.0.0-rc1 5.0.0-rc1 selective-v3r17 capture-v3r19 Percentage huge-3 96.70 ( 0.00%) 98.23 ( 1.58%) Percentage huge-5 96.99 ( 0.00%) 95.30 ( -1.75%) Percentage huge-7 94.19 ( 0.00%) 97.24 ( 3.24%) Percentage huge-12 94.95 ( 0.00%) 97.35 ( 2.53%) Percentage huge-18 96.74 ( 0.00%) 97.30 ( 0.58%) Percentage huge-24 97.07 ( 0.00%) 97.55 ( 0.50%) Percentage huge-30 95.69 ( 0.00%) 98.50 ( 2.95%) Percentage huge-32 96.70 ( 0.00%) 99.27 ( 2.65%) And scan rates are reduced as expected by 6% for the migration scanner and 29% for the free scanner indicating that there is less redundant work. Compaction migrate scanned 20815362 19573286 Compaction free scanned 16352612 11510663 [mgorman@techsingularity.net: remove redundant check] Link: http://lkml.kernel.org/r/20190201143853.GH9565@techsingularity.net Link: http://lkml.kernel.org/r/20190118175136.31341-23-mgorman@techsingularity.net Signed-off-by:
Mel Gorman <mgorman@techsingularity.net> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
sysctl_extfrag_handler() neglects to propagate the return value from proc_dointvec_minmax() to its caller. It's a wrapper that doesn't need to exist, so just use proc_dointvec_minmax() directly. Link: http://lkml.kernel.org/r/20190104032557.3056-1-willy@infradead.org Signed-off-by:
Matthew Wilcox <willy@infradead.org> Reported-by:
Aditya Pakki <pakki001@umn.edu> Acked-by:
Mel Gorman <mgorman@techsingularity.net> Acked-by:
Randy Dunlap <rdunlap@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Anshuman Khandual authored
Patch series "Replace all open encodings for NUMA_NO_NODE", v3. All these places for replacement were found by running the following grep patterns on the entire kernel code. Please let me know if this might have missed some instances. This might also have replaced some false positives. I will appreciate suggestions, inputs and review. 1. git grep "nid == -1" 2. git grep "node == -1" 3. git grep "nid = -1" 4. git grep "node = -1" This patch (of 2): At present there are multiple places where invalid node number is encoded as -1. Even though implicitly understood it is always better to have macros in there. Replace these open encodings for an invalid node number with the global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like 'invalid node' from various places redirecting them to a common definition. Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com Signed-off-by:
Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe] Acked-by: Jens Axboe <axboe@kernel.dk> [mtip32xx] Acked-by: Vinod Koul <vkoul@kernel.org> [dmaengine.c] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Doug Ledford <dledford@redhat.com> [drivers/infiniband] Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Hans Verkuil <hverkuil@xs4all.nl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
David Hildenbrand authored
The content of pages that are marked PG_offline is not of interest (e.g. inflated by a balloon driver), let's skip these pages. In saveable_highmem_page(), move the PageReserved() check to a new check along with the PageOffline() check to separate it from the swsusp checks. [david@redhat.com: v2] Link: http://lkml.kernel.org/r/20181122100627.5189-9-david@redhat.com Link: http://lkml.kernel.org/r/20181119101616.8901-9-david@redhat.com Signed-off-by:
David Hildenbrand <david@redhat.com> Acked-by:
Pavel Machek <pavel@ucw.cz> Acked-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <len.brown@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christian Hansen <chansen3@cisco.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juergen Gross <jgross@suse.com> Cc: Julien Freche <jfreche@vmware.com> Cc: Kairui Song <kasong@redhat.com> Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Lianbo Jiang <lijiang@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Miles Chen <miles.chen@mediatek.com> Cc: Nadav Amit <namit@vmware.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Pankaj gupta <pagupta@redhat.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xavier Deguillard <xdeguillard@vmware.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
David Hildenbrand authored
Let's use pfn_to_online_page() instead of pfn_to_page() when checking for saveable pages to not save/restore offline memory sections. Link: http://lkml.kernel.org/r/20181119101616.8901-8-david@redhat.com Signed-off-by:
David Hildenbrand <david@redhat.com> Suggested-by:
Michal Hocko <mhocko@kernel.org> Acked-by:
Michal Hocko <mhocko@suse.com> Acked-by:
Pavel Machek <pavel@ucw.cz> Acked-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Len Brown <len.brown@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christian Hansen <chansen3@cisco.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juergen Gross <jgross@suse.com> Cc: Julien Freche <jfreche@vmware.com> Cc: Kairui Song <kasong@redhat.com> Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Lianbo Jiang <lijiang@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Miles Chen <miles.chen@mediatek.com> Cc: Nadav Amit <namit@vmware.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Pankaj gupta <pagupta@redhat.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xavier Deguillard <xdeguillard@vmware.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
David Hildenbrand authored
Right now, pages inflated as part of a balloon driver will be dumped by dump tools like makedumpfile. While XEN is able to check in the crash kernel whether a certain pfn is actuall backed by memory in the hypervisor (see xen_oldmem_pfn_is_ram) and optimize this case, dumps of other balloon inflated memory will essentially result in zero pages getting allocated by the hypervisor and the dump getting filled with this data. The allocation and reading of zero pages can directly be avoided if a dumping tool could know which pages only contain stale information not to be dumped. We now have PG_offline which can be (and already is by virtio-balloon) used for marking pages as logically offline. Follow up patches will make use of this flag also in other balloon implementations. Let's export PG_offline via PAGE_OFFLINE_MAPCOUNT_VALUE, so makedumpfile can directly skip pages that are logically offline and the content therefore stale. Please note that this is also helpful for a problem we were seeing under Hyper-V: Dumping logically offline memory (pages kept fake offline while onlining a section via online_page_callback) would under some condicions result in a kernel panic when dumping them. Link: http://lkml.kernel.org/r/20181119101616.8901-4-david@redhat.com Signed-off-by:
David Hildenbrand <david@redhat.com> Acked-by:
Michael S. Tsirkin <mst@redhat.com> Acked-by:
Dave Young <dyoung@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Baoquan He <bhe@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Lianbo Jiang <lijiang@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christian Hansen <chansen3@cisco.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juergen Gross <jgross@suse.com> Cc: Julien Freche <jfreche@vmware.com> Cc: Kairui Song <kasong@redhat.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Len Brown <len.brown@intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Miles Chen <miles.chen@mediatek.com> Cc: Nadav Amit <namit@vmware.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Pankaj gupta <pagupta@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xavier Deguillard <xdeguillard@vmware.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Mar 05, 2019
-
-
Tom Zanussi authored
Because there may be random garbage beyond a string's null terminator, code that might use the entire comm array e.g. histogram keys, can give unexpected results if that garbage is copied in too, so avoid that possibility by using strncpy instead of memcpy. Link: http://lkml.kernel.org/r/1d6ebac26570c2a29ce9fb575379f17ef5c8b81b.1551802084.git.tom.zanussi@linux.intel.com Signed-off-by:
Tom Zanussi <tom.zanussi@linux.intel.com> Suggested-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Tom Zanussi authored
Because there may be random garbage beyond a string's null terminator, code that might use the entire comm array e.g. histogram keys, can give unexpected results if that garbage is copied in too, so avoid that possibility by using strncpy instead of memcpy. Link: http://lkml.kernel.org/r/1eb9f096a8086c3c82c7fc087c900005143cec54.1551802084.git.tom.zanussi@linux.intel.com Signed-off-by:
Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
-