1. 19 May, 2017 3 commits
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 2fe296a6
      Linus Torvalds authored
      Pull arm64 fixes/cleanups from Catalin Marinas:
       - Avoid taking a mutex in the secondary CPU bring-up path when
         interrupts are disabled
       - Ignore perf exclude_hv when the kernel is running in Hyp mode
       - Remove redundant instruction in cmpxchg
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64/cpufeature: don't use mutex in bringup path
        arm64: perf: Ignore exclude_hv when kernel is running in HYP
        arm64: Remove redundant mov from LL/SC cmpxchg
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · e5a489ab
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "The headliner is a fix for FP/VMX register corruption when using
        transactional memory, and a new selftest to go with it.
        Then there's the virt_addr_valid() fix, currently HARDENDED_USERCOPY
        is tripping on that causing some machines to crash.
        A few other fairly minor fixes for long tail things, and a couple of
        fixes for code we just merged.
        Thanks to: Breno Leitao, Gautham Shenoy, Michael Neuling, Naveen Rao.
        Nicholas Piggin, Paul Mackerras"
      * tag 'powerpc-4.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/mm: Fix virt_addr_valid() etc. on 64-bit hash
        powerpc/mm: Fix crash in page table dump with huge pages
        powerpc/kprobes: Fix handling of instruction emulation on probe re-entry
        powerpc/powernv: Set NAPSTATELOST after recovering paca on P9 DD1
        selftests/powerpc: Test TM and VMX register state
        powerpc/tm: Fix FP and VMX register corruption
        powerpc/modules: If mprofile-kernel is enabled add it to vermagic
    • Michael Ellerman's avatar
      powerpc/mm: Fix virt_addr_valid() etc. on 64-bit hash · e41e53cd
      Michael Ellerman authored
      virt_addr_valid() is supposed to tell you if it's OK to call virt_to_page() on
      an address. What this means in practice is that it should only return true for
      addresses in the linear mapping which are backed by a valid PFN.
      We are failing to properly check that the address is in the linear mapping,
      because virt_to_pfn() will return a valid looking PFN for more or less any
      address. That bug is actually caused by __pa(), used in virt_to_pfn().
      eg: __pa(0xc000000000010000) = 0x10000  # Good
          __pa(0xd000000000010000) = 0x10000  # Bad!
          __pa(0x0000000000010000) = 0x10000  # Bad!
      This started happening after commit bdbc29c1 ("powerpc: Work around gcc
      miscompilation of __pa() on 64-bit") (Aug 2013), where we changed the definition
      of __pa() to work around a GCC bug. Prior to that we subtracted PAGE_OFFSET from
      the value passed to __pa(), meaning __pa() of a 0xd or 0x0 address would give
      you something bogus back.
      Until we can verify if that GCC bug is no longer an issue, or come up with
      another solution, this commit does the minimal fix to make virt_addr_valid()
      work, by explicitly checking that the address is in the linear mapping region.
      Fixes: bdbc29c1
       ("powerpc: Work around gcc miscompilation of __pa() on 64-bit")
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: default avatarBalbir Singh <bsingharora@gmail.com>
      Tested-by: default avatarBreno Leitao <breno.leitao@gmail.com>
  2. 18 May, 2017 21 commits
  3. 17 May, 2017 16 commits
    • Yonghong Song's avatar
      selftests/bpf: fix broken build due to types.h · 579f1d92
      Yonghong Song authored
      Commit 0a5539f6 ("bpf: Provide a linux/types.h override
      for bpf selftests.") caused a build failure for tools/testing/selftest/bpf
      because of some missing types:
          $ make -C tools/testing/selftests/bpf/
          In file included from /home/yhs/work/net-next/tools/testing/selftests/bpf/test_pkt_access.c:8:
          ../../../include/uapi/linux/bpf.h:170:3: error: unknown type name '__aligned_u64'
                          __aligned_u64   key;
          /usr/include/linux/swab.h:160:8: error: unknown type name '__always_inline'
          static __always_inline __u16 __swab16p(const __u16 *p)
      The type __aligned_u64 is defined in linux:include/uapi/linux/types.h.
      The fix is to copy missing type definition into
      Adding additional include "string.h" resolves __always_inline issue.
      Fixes: 0a5539f6
       ("bpf: Provide a linux/types.h override for bpf selftests.")
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Linus Torvalds's avatar
      Merge tag 'for-4.12/dm-fixes-2' of... · dac94e29
      Linus Torvalds authored
      Merge tag 'for-4.12/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      Pull device mapper fixes from Mike Snitzer:
       - a couple DM thin provisioning fixes
       - a few request-based DM and DM multipath fixes for issues that were
         made when merging Christoph's changes with Bart's changes for 4.12
       - a DM bufio unsigned overflow fix
       - a couple pure fixes for the DM cache target.
       - various very small tweaks to the DM cache target that enable
         considerable speed improvements in the face of continuous IO. Given
         that the cache target was significantly reworked for 4.12 I see no
         reason to sit on these advances until 4.13 considering the favorable
         results associated with such minimalist tweaks.
      * tag 'for-4.12/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm cache: handle kmalloc failure allocating background_tracker struct
        dm bufio: make the parameter "retain_bytes" unsigned long
        dm mpath: multipath_clone_and_map must not return -EIO
        dm mpath: don't return -EIO from dm_report_EIO
        dm rq: add a missing break to map_request
        dm space map disk: fix some book keeping in the disk space map
        dm thin metadata: call precommit before saving the roots
        dm cache policy smq: don't do any writebacks unless IDLE
        dm cache: simplify the IDLE vs BUSY state calculation
        dm cache: track all IO to the cache rather than just the origin device's IO
        dm cache policy smq: stop preemptively demoting blocks
        dm cache policy smq: put newly promoted entries at the top of the multiqueue
        dm cache policy smq: be more aggressive about triggering a writeback
        dm cache policy smq: only demote entries in bottom half of the clean multiqueue
        dm cache: fix incorrect 'idle_time' reset in IO tracker
    • Linus Torvalds's avatar
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 243bfd2c
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Here are some bugfixes from I2C, especially removing a wrongly
        displayed error message for all i2c muxes"
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: xgene: Set ACPI_COMPANION_I2C
        i2c: mv64xxx: don't override deferred probing when getting irq
        i2c: mux: only print failure message on error
        i2c: mux: reg: rename label to indicate what it does
        i2c: mux: reg: put away the parent i2c adapter on probe failure
    • David S. Miller's avatar
      Merge branch 'bnxt_en-DCBX-fixes' · f917174c
      David S. Miller authored
      Michael Chan says:
      bnxt_en: DCBX fixes.
      2 bug fixes for the case where the NIC's firmware DCBX agent is enabled.
      With these fixes, we will return the proper information to lldpad.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Michael Chan's avatar
      bnxt_en: Check status of firmware DCBX agent before setting DCB_CAP_DCBX_HOST. · f667724b
      Michael Chan authored
      Otherwise, all the host based DCBX settings from lldpad will fail if the
      firmware DCBX agent is running.
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Michael Chan's avatar
      bnxt_en: Call bnxt_dcb_init() after getting firmware DCBX configuration. · 87fe6032
      Michael Chan authored
      In the current code, bnxt_dcb_init() is called too early before we
      determine if the firmware DCBX agent is running or not.  As a result,
      we are not setting the DCB_CAP_DCBX_HOST and DCB_CAP_DCBX_LLD_MANAGED
      flags properly to report to DCBNL.
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Eric Dumazet's avatar
      net: fix compile error in skb_orphan_partial() · 9142e900
      Eric Dumazet authored
      If CONFIG_INET is not set, net/core/sock.c can not compile :
      net/core/sock.c: In function ‘skb_orphan_partial’:
      net/core/sock.c:1810:2: error: implicit declaration of function
      ‘skb_is_tcp_pure_ack’ [-Werror=implicit-function-declaration]
        if (skb_is_tcp_pure_ack(skb))
      Fix this by always including <net/tcp.h>
      Fixes: f6ba8d33
       ("netem: fix skb_orphan_partial()")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Reported-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Reported-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Liam R. Howlett's avatar
      sparc/ftrace: Fix ftrace graph time measurement · 48078d2d
      Liam R. Howlett authored
      The ftrace function_graph time measurements of a given function is not
      accurate according to those recorded by ftrace using the function
      filters.  This change pulls the x86_64 fix from 'commit 722b3c74
      ("ftrace/graph: Trace function entry before updating index")' into the
      sparc specific prepare_ftrace_return which stops ftrace from
      counting interrupted tasks in the time measurement.
      Example measurements for select_task_rq_fair running "hackbench 100
      process 1000":
                    |  tracing/trace_stat/function0  |  function_graph
       Before patch |  2.802 us                      |  4.255 us
       After patch  |  2.749 us                      |  3.094 us
      Signed-off-by: default avatarLiam R. Howlett <Liam.Howlett@Oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Orlando Arias's avatar
      sparc: Fix -Wstringop-overflow warning · deba804c
      Orlando Arias authored
      GCC 7 introduced the -Wstringop-overflow flag to detect buffer overflows
      in calls to string handling functions [1][2]. Due to the way
      ``empty_zero_page'' is declared in arch/sparc/include/setup.h, this
      causes a warning to trigger at compile time in the function mem_init(),
      which is subsequently converted to an error. The ensuing patch fixes
      this issue and aligns the declaration of empty_zero_page to that of
      other architectures. Thank you.
      [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02308.html
      [2] https://gcc.gnu.org/gcc-7/changes.html
      Signed-off-by: default avatarOrlando Arias <oarias@knights.ucf.edu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Nitin Gupta's avatar
      sparc64: Fix mapping of 64k pages with MAP_FIXED · b6c41cb0
      Nitin Gupta authored
      An incorrect huge page alignment check caused
      mmap failure for 64K pages when MAP_FIXED is used
      with address not aligned to HPAGE_SIZE.
      Orabug: 25885991
      Fixes: dcd1912d
       ("sparc64: Add 64K page size support")
      Signed-off-by: default avatarNitin Gupta <nitin.m.gupta@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Craig Gallek's avatar
      ipv6: Prevent overrun when parsing v6 header options · 2423496a
      Craig Gallek authored
      The KASAN warning repoted below was discovered with a syzkaller
      program.  The reproducer is basically:
        int s = socket(AF_INET6, SOCK_RAW, NEXTHDR_HOP);
        send(s, &one_byte_of_data, 1, MSG_MORE);
        send(s, &more_than_mtu_bytes_data, 2000, 0);
      The socket() call sets the nexthdr field of the v6 header to
      NEXTHDR_HOP, the first send call primes the payload with a non zero
      byte of data, and the second send call triggers the fragmentation path.
      The fragmentation code tries to parse the header options in order
      to figure out where to insert the fragment option.  Since nexthdr points
      to an invalid option, the calculation of the size of the network header
      can made to be much larger than the linear section of the skb and data
      is read outside of it.
      This fix makes ip6_find_1stfrag return an error if it detects
      running out-of-bounds.
      [   42.361487] ==================================================================
      [   42.364412] BUG: KASAN: slab-out-of-bounds in ip6_fragment+0x11c8/0x3730
      [   42.365471] Read of size 840 at addr ffff88000969e798 by task ip6_fragment-oo/3789
      [   42.366469]
      [   42.366696] CPU: 1 PID: 3789 Comm: ip6_fragment-oo Not tainted 4.11.0+ #41
      [   42.367628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
      [   42.368824] Call Trace:
      [   42.369183]  dump_stack+0xb3/0x10b
      [   42.369664]  print_address_description+0x73/0x290
      [   42.370325]  kasan_report+0x252/0x370
      [   42.370839]  ? ip6_fragment+0x11c8/0x3730
      [   42.371396]  check_memory_region+0x13c/0x1a0
      [   42.371978]  memcpy+0x23/0x50
      [   42.372395]  ip6_fragment+0x11c8/0x3730
      [   42.372920]  ? nf_ct_expect_unregister_notifier+0x110/0x110
      [   42.373681]  ? ip6_copy_metadata+0x7f0/0x7f0
      [   42.374263]  ? ip6_forward+0x2e30/0x2e30
      [   42.374803]  ip6_finish_output+0x584/0x990
      [   42.375350]  ip6_output+0x1b7/0x690
      [   42.375836]  ? ip6_finish_output+0x990/0x990
      [   42.376411]  ? ip6_fragment+0x3730/0x3730
      [   42.376968]  ip6_local_out+0x95/0x160
      [   42.377471]  ip6_send_skb+0xa1/0x330
      [   42.377969]  ip6_push_pending_frames+0xb3/0xe0
      [   42.378589]  rawv6_sendmsg+0x2051/0x2db0
      [   42.379129]  ? rawv6_bind+0x8b0/0x8b0
      [   42.379633]  ? _copy_from_user+0x84/0xe0
      [   42.380193]  ? debug_check_no_locks_freed+0x290/0x290
      [   42.380878]  ? ___sys_sendmsg+0x162/0x930
      [   42.381427]  ? rcu_read_lock_sched_held+0xa3/0x120
      [   42.382074]  ? sock_has_perm+0x1f6/0x290
      [   42.382614]  ? ___sys_sendmsg+0x167/0x930
      [   42.383173]  ? lock_downgrade+0x660/0x660
      [   42.383727]  inet_sendmsg+0x123/0x500
      [   42.384226]  ? inet_sendmsg+0x123/0x500
      [   42.384748]  ? inet_recvmsg+0x540/0x540
      [   42.385263]  sock_sendmsg+0xca/0x110
      [   42.385758]  SYSC_sendto+0x217/0x380
      [   42.386249]  ? SYSC_connect+0x310/0x310
      [   42.386783]  ? __might_fault+0x110/0x1d0
      [   42.387324]  ? lock_downgrade+0x660/0x660
      [   42.387880]  ? __fget_light+0xa1/0x1f0
      [   42.388403]  ? __fdget+0x18/0x20
      [   42.388851]  ? sock_common_setsockopt+0x95/0xd0
      [   42.389472]  ? SyS_setsockopt+0x17f/0x260
      [   42.390021]  ? entry_SYSCALL_64_fastpath+0x5/0xbe
      [   42.390650]  SyS_sendto+0x40/0x50
      [   42.391103]  entry_SYSCALL_64_fastpath+0x1f/0xbe
      [   42.391731] RIP: 0033:0x7fbbb711e383
      [   42.392217] RSP: 002b:00007ffff4d34f28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [   42.393235] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbbb711e383
      [   42.394195] RDX: 0000000000001000 RSI: 00007ffff4d34f60 RDI: 0000000000000003
      [   42.395145] RBP: 0000000000000046 R08: 00007ffff4d34f40 R09: 0000000000000018
      [   42.396056] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400aad
      [   42.396598] R13: 0000000000000066 R14: 00007ffff4d34ee0 R15: 00007fbbb717af00
      [   42.397257]
      [   42.397411] Allocated by task 3789:
      [   42.397702]  save_stack_trace+0x16/0x20
      [   42.398005]  save_stack+0x46/0xd0
      [   42.398267]  kasan_kmalloc+0xad/0xe0
      [   42.398548]  kasan_slab_alloc+0x12/0x20
      [   42.398848]  __kmalloc_node_track_caller+0xcb/0x380
      [   42.399224]  __kmalloc_reserve.isra.32+0x41/0xe0
      [   42.399654]  __alloc_skb+0xf8/0x580
      [   42.400003]  sock_wmalloc+0xab/0xf0
      [   42.400346]  __ip6_append_data.isra.41+0x2472/0x33d0
      [   42.400813]  ip6_append_data+0x1a8/0x2f0
      [   42.401122]  rawv6_sendmsg+0x11ee/0x2db0
      [   42.401505]  inet_sendmsg+0x123/0x500
      [   42.401860]  sock_sendmsg+0xca/0x110
      [   42.402209]  ___sys_sendmsg+0x7cb/0x930
      [   42.402582]  __sys_sendmsg+0xd9/0x190
      [   42.402941]  SyS_sendmsg+0x2d/0x50
      [   42.403273]  entry_SYSCALL_64_fastpath+0x1f/0xbe
      [   42.403718]
      [   42.403871] Freed by task 1794:
      [   42.404146]  save_stack_trace+0x16/0x20
      [   42.404515]  save_stack+0x46/0xd0
      [   42.404827]  kasan_slab_free+0x72/0xc0
      [   42.405167]  kfree+0xe8/0x2b0
      [   42.405462]  skb_free_head+0x74/0xb0
      [   42.405806]  skb_release_data+0x30e/0x3a0
      [   42.406198]  skb_release_all+0x4a/0x60
      [   42.406563]  consume_skb+0x113/0x2e0
      [   42.406910]  skb_free_datagram+0x1a/0xe0
      [   42.407288]  netlink_recvmsg+0x60d/0xe40
      [   42.407667]  sock_recvmsg+0xd7/0x110
      [   42.408022]  ___sys_recvmsg+0x25c/0x580
      [   42.408395]  __sys_recvmsg+0xd6/0x190
      [   42.408753]  SyS_recvmsg+0x2d/0x50
      [   42.409086]  entry_SYSCALL_64_fastpath+0x1f/0xbe
      [   42.409513]
      [   42.409665] The buggy address belongs to the object at ffff88000969e780
      [   42.409665]  which belongs to the cache kmalloc-512 of size 512
      [   42.410846] The buggy address is located 24 bytes inside of
      [   42.410846]  512-byte region [ffff88000969e780, ffff88000969e980)
      [   42.411941] The buggy address belongs to the page:
      [   42.412405] page:ffffea000025a780 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
      [   42.413298] flags: 0x100000000008100(slab|head)
      [   42.413729] raw: 0100000000008100 0000000000000000 0000000000000000 00000001800c000c
      [   42.414387] raw: ffffea00002a9500 0000000900000007 ffff88000c401280 0000000000000000
      [   42.415074] page dumped because: kasan: bad access detected
      [   42.415604]
      [   42.415757] Memory state around the buggy address:
      [   42.416222]  ffff88000969e880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   42.416904]  ffff88000969e900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   42.417591] >ffff88000969e980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   42.418273]                    ^
      [   42.418588]  ffff88000969ea00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   42.419273]  ffff88000969ea80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   42.419882] ==================================================================
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarCraig Gallek <kraig@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Masahiro Yamada's avatar
      kbuild: skip install/check of headers right under uapi directories · 05d8cba4
      Masahiro Yamada authored
      Since commit 61562f98 ("uapi: export all arch specifics
      directories"), "make INSTALL_HDR_PATH=$root/usr headers_install"
      deletes standard glibc headers and others in $(root)/usr/include.
      The cause of the issue is that headers_install now starts descending
      from arch/$(hdr-arch)/include/uapi with $(root)/usr/include for its
      destination when installing asm headers.  So, headers already there
      are assumed to be unwanted.
      When headers_install starts descending from include/uapi with
      $(root)/usr/include for its destination, it works around the problem
      by creating an dummy destination $(root)/usr/include/uapi, but this
      is tricky.
      To fix the problem in a clean way is to skip headers install/check
      in include/uapi and arch/$(hdr-arch)/include/uapi because we know
      there are only sub-directories in uapi directories.  A good side
      effect is the empty destination $(root)/usr/include/uapi will go
      I am also removing the trailing slash in the headers_check target to
      skip checking in arch/$(hdr-arch)/include/uapi.
      Fixes: 61562f98
       ("uapi: export all arch specifics directories")
      Reported-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Tested-by: default avatarDan Williams <dan.j.williams@intel.com>
      Acked-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
    • Mark Rutland's avatar
      arm64/cpufeature: don't use mutex in bringup path · 63a1e1c9
      Mark Rutland authored
      Currently, cpus_set_cap() calls static_branch_enable_cpuslocked(), which
      must take the jump_label mutex.
      We call cpus_set_cap() in the secondary bringup path, from the idle
      thread where interrupts are disabled. Taking a mutex in this path "is a
      NONO" regardless of whether it's contended, and something we must avoid.
      We didn't spot this until recently, as ___might_sleep() won't warn for
      this case until all CPUs have been brought up.
      This patch avoids taking the mutex in the secondary bringup path. The
      poking of static keys is deferred until enable_cpu_capabilities(), which
      runs in a suitable context on the boot CPU. To account for the static
      keys being set later, cpus_have_const_cap() is updated to use another
      static key to check whether the const cap keys have been initialised,
      falling back to the caps bitmap until this is the case.
      This means that users of cpus_have_const_cap() gain should only gain a
      single additional NOP in the fast path once the const caps are
      initialised, but should always see the current cap value.
      The hyp code should never dereference the caps array, since the caps are
      initialized before we run the module initcall to initialise hyp. A check
      is added to the hyp init code to document this requirement.
      This change will sidestep a number of issues when the upcoming hotplug
      locking rework is merged.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Reviewed-by: default avatarMarc Zyniger <marc.zyngier@arm.com>
      Reviewed-by: default avatarSuzuki Poulose <suzuki.poulose@arm.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Sewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
    • Ihar Hrachyshka's avatar
      neighbour: update neigh timestamps iff update is effective · 77d71233
      Ihar Hrachyshka authored
      It's a common practice to send gratuitous ARPs after moving an
      IP address to another device to speed up healing of a service. To
      fulfill service availability constraints, the timing of network peers
      updating their caches to point to a new location of an IP address can be
      particularly important.
      Sometimes neigh_update calls won't touch neither lladdr nor state, for
      example if an update arrives in locktime interval. The neigh->updated
      value is tested by the protocol specific neigh code, which in turn
      will influence whether NEIGH_UPDATE_F_OVERRIDE gets set in the
      call to neigh_update() or not. As a result, we may effectively ignore
      the update request, bailing out of touching the neigh entry, except that
      we still bump its timestamps inside neigh_update.
      This may be a problem for updates arriving in quick succession. For
      example, consider the following scenario:
      A service is moved to another device with its IP address. The new device
      sends three gratuitous ARP requests into the network with ~1 seconds
      interval between them. Just before the first request arrives to one of
      network peer nodes, its neigh entry for the IP address transitions from
      STALE to DELAY.  This transition, among other things, updates
      neigh->updated. Once the kernel receives the first gratuitous ARP, it
      ignores it because its arrival time is inside the locktime interval. The
      kernel still bumps neigh->updated. Then the second gratuitous ARP
      request arrives, and it's also ignored because it's still in the (new)
      locktime interval. Same happens for the third request. The node
      eventually heals itself (after delay_first_probe_time seconds since the
      initial transition to DELAY state), but it just wasted some time and
      require a new ARP request/reply round trip. This unfortunate behaviour
      both puts more load on the network, as well as reduces service
      This patch changes neigh_update so that it bumps neigh->updated (as well
      as neigh->confirmed) only once we are sure that either lladdr or entry
      state will change). In the scenario described above, it means that the
      second gratuitous ARP request will actually update the entry lladdr.
      Ideally, we would update the neigh entry on the very first gratuitous
      ARP request. The locktime mechanism is designed to ignore ARP updates in
      a short timeframe after a previous ARP update was honoured by the kernel
      layer. This would require tracking timestamps for state transitions
      separately from timestamps when actual updates are received. This would
      probably involve changes in neighbour struct. Therefore, the patch
      doesn't tackle the issue of the first gratuitous APR ignored, leaving
      it for a follow-up.
      Signed-off-by: default avatarIhar Hrachyshka <ihrachys@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Ihar Hrachyshka's avatar
      arp: honour gratuitous ARP _replies_ · 23d268eb
      Ihar Hrachyshka authored
      When arp_accept is 1, gratuitous ARPs are supposed to override matching
      entries irrespective of whether they arrive during locktime. This was
      implemented in commit 56022a8f
       ("ipv4: arp: update neighbour address
      when a gratuitous arp is received and arp_accept is set")
      There is a glitch in the patch though. RFC 2002, section 4.6, "ARP,
      Proxy ARP, and Gratuitous ARP", defines gratuitous ARPs so that they can
      be either of Request or Reply type. Those Reply gratuitous ARPs can be
      triggered with standard tooling, for example, arping -A option does just
      This patch fixes the glitch, making both Request and Reply flavours of
      gratuitous ARPs to behave identically.
      As per RFC, if gratuitous ARPs are of Reply type, their Target Hardware
      Address field should also be set to the link-layer address to which this
      cache entry should be updated. The field is present in ARP over Ethernet
      but not in IEEE 1394. In this patch, I don't consider any broadcasted
      ARP replies as gratuitous if the field is not present, to conform the
      standard. It's not clear whether there is such a thing for IEEE 1394 as
      a gratuitous ARP reply; until it's cleared up, we will ignore such
      broadcasts. Note that they will still update existing ARP cache entries,
      assuming they arrive out of locktime time interval.
      Signed-off-by: default avatarIhar Hrachyshka <ihrachys@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Colin Ian King's avatar
      dm cache: handle kmalloc failure allocating background_tracker struct · 7e1b9521
      Colin Ian King authored
      Currently there is no kmalloc failure check on the allocation of
      the background_tracker struct in btracker_create(), and so a NULL return
      will lead to a NULL pointer dereference.  Add a NULL check.
      Detected by CoverityScan, CID#1416587 ("Dereference null return value")
      Fixes: b29d4986
       ("dm cache: significant rework to leverage dm-bio-prison-v2")
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>