1. 09 Feb, 2018 14 commits
    • ibmvnic: Remove skb->protocol checks in ibmvnic_xmit · 2fa56a49
      John Allen authored
      
      
      Having these checks in ibmvnic_xmit causes problems with VLAN
      tagging and balance-alb/tlb bonding modes. The restriction they
      imposed can be removed.
      Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2fa56a49
    • bpf: fix rlimit in reuseport net selftest · 941ff6f1
      Daniel Borkmann authored
      
      
      Fix two issues in the reuseport_bpf selftests that were
      reported by Linaro CI:
      
        [...]
        + ./reuseport_bpf
        ---- IPv4 UDP ----
        Testing EBPF mod 10...
        Reprograming, testing mod 5...
        ./reuseport_bpf: ebpf error. log:
        0: (bf) r6 = r1
        1: (20) r0 = *(u32 *)skb[0]
        2: (97) r0 %= 10
        3: (95) exit
        processed 4 insns
        : Operation not permitted
        + echo FAIL
        [...]
        ---- IPv4 TCP ----
        Testing EBPF mod 10...
        ./reuseport_bpf: failed to bind send socket: Address already in use
        + echo FAIL
        [...]
      
      For the former adjust rlimit since this was the cause of
      failure for loading the BPF prog, and for the latter add
      SO_REUSEADDR.
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Link: https://bugs.linaro.org/show_bug.cgi?id=3502
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      941ff6f1
    • sctp: verify size of a new chunk in _sctp_make_chunk() · 07f2c7ab
      Alexey Kodanev authored
      
      
      When SCTP makes INIT or INIT_ACK packet the total chunk length
      can exceed SCTP_MAX_CHUNK_LEN which leads to kernel panic when
      transmitting these packets, e.g. the crash on sending INIT_ACK:
      
      [  597.804948] skbuff: skb_over_panic: text:00000000ffae06e4 len:120168
                     put:120156 head:000000007aa47635 data:00000000d991c2de
                     tail:0x1d640 end:0xfec0 dev:<NULL>
      ...
      [  597.976970] ------------[ cut here ]------------
      [  598.033408] kernel BUG at net/core/skbuff.c:104!
      [  600.314841] Call Trace:
      [  600.345829]  <IRQ>
      [  600.371639]  ? sctp_packet_transmit+0x2095/0x26d0 [sctp]
      [  600.436934]  skb_put+0x16c/0x200
      [  600.477295]  sctp_packet_transmit+0x2095/0x26d0 [sctp]
      [  600.540630]  ? sctp_packet_config+0x890/0x890 [sctp]
      [  600.601781]  ? __sctp_packet_append_chunk+0x3b4/0xd00 [sctp]
      [  600.671356]  ? sctp_cmp_addr_exact+0x3f/0x90 [sctp]
      [  600.731482]  sctp_outq_flush+0x663/0x30d0 [sctp]
      [  600.788565]  ? sctp_make_init+0xbf0/0xbf0 [sctp]
      [  600.845555]  ? sctp_check_transmitted+0x18f0/0x18f0 [sctp]
      [  600.912945]  ? sctp_outq_tail+0x631/0x9d0 [sctp]
      [  600.969936]  sctp_cmd_interpreter.isra.22+0x3be1/0x5cb0 [sctp]
      [  601.041593]  ? sctp_sf_do_5_1B_init+0x85f/0xc30 [sctp]
      [  601.104837]  ? sctp_generate_t1_cookie_event+0x20/0x20 [sctp]
      [  601.175436]  ? sctp_eat_data+0x1710/0x1710 [sctp]
      [  601.233575]  sctp_do_sm+0x182/0x560 [sctp]
      [  601.284328]  ? sctp_has_association+0x70/0x70 [sctp]
      [  601.345586]  ? sctp_rcv+0xef4/0x32f0 [sctp]
      [  601.397478]  ? sctp6_rcv+0xa/0x20 [sctp]
      ...
      
      Here the chunk size for INIT_ACK packet becomes too big, mostly
      because of the state cookie (INIT packet has large size with
      many address parameters), plus additional server parameters.
      
      Later this chunk causes the panic in skb_put_data():
      
        sctp_packet_transmit()
            sctp_packet_pack()
                skb_put_data(nskb, chunk->skb->data, chunk->skb->len);
      
      'nskb' (head skb) was previously allocated with packet->size
      from u16 'chunk->chunk_hdr->length'.
      
      As suggested by Marcelo we should check the chunk's length in
      _sctp_make_chunk() before trying to allocate skb for it and
      discard a chunk if its size is bigger than SCTP_MAX_CHUNK_LEN.
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leinter@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      07f2c7ab
    • Merge branch 's390-qeth-fixes' · 7b30d51a
      David S. Miller authored
      
      
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2018-02-09
      
      please apply the following two qeth patches for 4.16 and stable.
      
      One restricts a command quirk to the intended command type,
      while the other fixes an off-by-one during data transmission
      that can cause qeth to build malformed buffer descriptors.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7b30d51a
    • s390/qeth: fix SETIP command handling · 1c5b2216
      Julian Wiedmann authored
      send_control_data() applies some special handling to SETIP v4 IPA
      commands. But current code parses *all* command types for the SETIP
      command code. Limit the command code check to IPA commands.
      
      Fixes: 5b54e16f ("qeth: do not spin for SETIP ip assist command")
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1c5b2216
    • s390/qeth: fix underestimated count of buffer elements · 89271c65
      Ursula Braun authored
      For a memory range/skb where the last byte falls onto a page boundary
      (i.e. 'end' is of the form xxx...xxx001), the PFN_UP() part of the
      calculation currently doesn't round up to the next PFN due to an
      off-by-one error.
      Thus qeth believes that the skb occupies one page less than it
      actually does, and may select an IO buffer that doesn't have enough spare
      buffer elements to fit all of the skb's data.
      HW detects this as a malformed buffer descriptor, and raises an
      exception which then triggers device recovery.
      
      Fixes: 2863c613 ("qeth: refactor calculation of SBALE count")
      Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      89271c65
    • ptr_ring: try vmalloc() when kmalloc() fails · 0bf7800f
      Jason Wang authored
      This patch switches to kvmalloc_array() so that a vmalloc()
      fallback can help in case kmalloc() fails.
      
      Reported-by: syzbot+e4d4f9ddd4295539735d@syzkaller.appspotmail.com
      Fixes: 2e0ab8ca ("ptr_ring: array based FIFO for pointers")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0bf7800f
    • ptr_ring: fail early if queue occupies more than KMALLOC_MAX_SIZE · 6e6e41c3
      Jason Wang authored
      To keep slab from warning about an exceeded size, fail early if the
      queue occupies more than KMALLOC_MAX_SIZE.
      
      Reported-by: syzbot+e4d4f9ddd4295539735d@syzkaller.appspotmail.com
      Fixes: 2e0ab8ca ("ptr_ring: array based FIFO for pointers")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6e6e41c3
    • Merge branch 'stmmac-irq-fixes-cleanups' · 909ebd58
      David S. Miller authored
      
      
      Niklas Cassel says:
      
      ====================
      stmmac irq fixes/cleanups
      
      A couple of small stmmac irq fixes/cleanups.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      909ebd58
    • net: stmmac: remove redundant enable of PMT irq · 10291171
      Niklas Cassel authored
      
      
      For dwmac4, GMAC_INT_DEFAULT_ENABLE already includes
      GMAC_INT_PMT_EN, so it is redundant to check if hw->pmt
      is set, and if so, setting the bit again.
      
      For dwmac1000, GMAC_INT_DEFAULT_MASK does not include
      GMAC_INT_DISABLE_PMT, so it is redundant to check if
      hw->pmt is set, and if so, clearing an already cleared bit.
      
      Improve code readability by removing this redundant code.
      Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      10291171
    • net: stmmac: rename GMAC_INT_DEFAULT_MASK for dwmac4 · e879b7ab
      Niklas Cassel authored
      
      
      GMAC_INT_DEFAULT_MASK is written to the interrupt enable register.
      In previous versions of the IP (e.g. dwmac1000), this register was
      instead an interrupt mask register.
      To improve clarity and reflect reality, rename GMAC_INT_DEFAULT_MASK
      to GMAC_INT_DEFAULT_ENABLE.
      Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e879b7ab
    • net: stmmac: discard disabled flags in interrupt status register · 1b84ca18
      Niklas Cassel authored
      The interrupt status register in both dwmac1000 and dwmac4 ignores
      interrupt enable (for dwmac4) / interrupt mask (for dwmac1000).
      Therefore, if we want to check only the bits that can actually trigger
      an irq, we have to filter the interrupt status register manually.
      
      Commit 0a764db1 ("stmmac: Discard masked flags in interrupt status
      register") fixed this for dwmac1000. Fix the same issue for dwmac4.
      
      Just like commit 0a764db1 ("stmmac: Discard masked flags in
      interrupt status register"), this makes sure that we do not get
      spurious link up/link down prints.
      Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1b84ca18
    • ibmvnic: Reset long term map ID counter · faefaa97
      Thomas Falcon authored
      
      
      When allocating RX or TX buffer pools, the driver needs to provide a
      unique mapping ID to firmware for each pool. This value is assigned
      using a counter which is incremented after a new pool is created. The
      ID can be an integer ranging from 1-255. When migrating to a device
      that requests a different number of queues, this value was not being
      reset properly. As a result, after enough migrations, the counter
      exceeded the upper bound and pool creation failed. This is fixed by
      resetting the counter to one in this case.
      Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      faefaa97
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 437a4db6
      David S. Miller authored
      
      
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-02-09
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Two fixes for BPF sockmap in order to break up circular map references
         from programs attached to sockmap, and detaching related sockets in
         case of socket close() event. For the latter we get rid of the
         smap_state_change() and plug into ULP infrastructure, which will later
         also be used for additional features anyway such as TX hooks. For the
         second issue, dependency chain is broken up via map release callback
         to free parse/verdict programs, all from John.
      
      2) Fix a libbpf relocation issue that was found while implementing XDP
         support for Suricata project. Issue was that when clang was invoked
         with default target instead of bpf target, then various other e.g.
         debugging relevant sections are added to the ELF file that contained
         relocation entries pointing to non-BPF related sections which libbpf
         trips over instead of skipping them. Test cases for libbpf are added
         as well, from Jesper.
      
      3) Various misc fixes for bpftool and one for libbpf: a small addition
         to libbpf to make sure it recognizes all standard section prefixes.
         Then, the Makefile in bpftool/Documentation is improved to explicitly
         check for rst2man being installed on the system as we otherwise risk
         installing empty man pages; the man page for bpftool-map is corrected
         and a set of missing bash completions added in order to avoid shipping
         bpftool where the completions are only partially working, from Quentin.
      
      4) Fix applying the relocation to immediate load instructions in the
         nfp JIT which were missing a shift, from Jakub.
      
      5) Two fixes for the BPF kernel selftests: handle CONFIG_BPF_JIT_ALWAYS_ON=y
         gracefully in test_bpf.ko module and mark them as FLAG_EXPECTED_FAIL
         in this case; and explicitly delete the veth devices in the two tests
         test_xdp_{meta,redirect}.sh before dismantling the netnses as when
         selftests are run in batch mode, then workqueue to handle destruction
         might not have finished yet and thus veth creation in next test under
         same dev name would fail, from Yonghong.
      
      6) Fix test_kmod.sh to check the test_bpf.ko module path before performing
         an insmod, and fallback to modprobe. Especially the latter is useful
         when having a device under test that has the modules installed instead,
         from Naresh.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      437a4db6
  2. 08 Feb, 2018 26 commits
    • Merge branch 'bpf-libbpf-relo-fix-and-tests' · d977ae59
      Daniel Borkmann authored
      
      
      Jesper Dangaard Brouer says:
      
      ====================
      While playing with using libbpf for the Suricata project, we had
      issues with LLVM >= 4.0.1 generating ELF files that could not be
      loaded with libbpf (tools/lib/bpf/).
      
      During the troubleshooting phase, I wrote a test program and improved
      the debugging output in libbpf.  I turned this into a selftests
      program, and it also serves as a code example for libbpf in itself.
      
      I discovered that there are at least three ELF load issues with
      libbpf.  I left them as TODO comments in (tools/testing/selftests/bpf)
      test_libbpf.sh. I've only fixed the load issue with eh_frames, and
      other types of relo-section that do not have exec flags.  We can
      work on the other issues later.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      d977ae59
    • tools/libbpf: handle issues with bpf ELF objects containing .eh_frames · e3d91b0c
      Jesper Dangaard Brouer authored
      
      
      V3: More generic skipping of relo-section (suggested by Daniel)
      
      If clang >= 4.0.1 is missing the option '-target bpf', it will cause
      llc/llvm to create two ELF sections for "Exception Frames", with
      section names '.eh_frame' and '.rel.eh_frame'.
      
      The BPF ELF loader library libbpf fails when loading files with these
      sections.  The other in-kernel BPF ELF loader in samples/bpf/bpf_load.c
      handles this gracefully. And the iproute2 loader also seems to work with
      these "eh" sections.
      
      The issue in libbpf is caused by bpf_object__elf_collect() skipping
      some sections, and later when performing relocation it will be
      pointing to a skipped section, as these sections cannot be found by
      bpf_object__find_prog_by_idx() in bpf_object__collect_reloc().
      
      This is a general issue that also occurs for other sections, like
      debug sections, which are also skipped and can have relo sections.
      
      As suggested by Daniel, to avoid keeping state about all skipped
      sections, instead perform a direct lookup in the ELF object.  Look up
      the section that the relo-section points to and check if it contains
      executable machine instructions (denoted by the sh_flags
      SHF_EXECINSTR).  Use this check to also skip irrelevant relo-sections.
      
      Note, for samples/bpf/ the '-target bpf' parameter to clang cannot be used
      due to incompatibility with asm embedded headers that some of the samples
      include. This is explained in more detail by Yonghong Song in bpf_devel_QA.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      e3d91b0c
    • selftests/bpf: add selftest that use test_libbpf_open · f09b2e38
      Jesper Dangaard Brouer authored
      
      
      This script test_libbpf.sh will be part of the 'make run_tests'
      invocation, but can also be invoked manually in this directory,
      and a verbose mode can be enabled via setting the environment
      variable $VERBOSE like:
      
       $ VERBOSE=yes ./test_libbpf.sh
      
      The script contains some tests that are commented out, as they
      currently fail.  They are reminders about what we need to improve
      for the libbpf loader library.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      f09b2e38
    • selftests/bpf: add test program for loading BPF ELF files · 864db336
      Jesper Dangaard Brouer authored
      
      
      V2: Moved program into selftests/bpf from tools/libbpf
      
      This program can be used on its own for testing/debugging if a
      BPF ELF-object file can be loaded with libbpf (from tools/lib/bpf).
      
      If something is wrong with the ELF object, the program has
      a --debug mode that will display the ELF sections and especially
      the skipped sections.  This allows for quickly identifying the
      problematic ELF section number, which can be correlated with the
      readelf tool.

      The program signals errors via return codes, and also has
      a --quiet mode, which is practical for use in scripts like
      selftests/bpf.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      864db336
    • tools/libbpf: improve the pr_debug statements to contain section numbers · 077c066a
      Jesper Dangaard Brouer authored
      
      
      While debugging a bpf ELF loading issue, I needed to correlate the
      ELF section number with the failed relocation section reference.
      Thus, add section numbers/index to the pr_debug.
      
      In debug mode, also print sections that were skipped.  This helped
      me identify that a section (.eh_frame) was skipped, and this was
      the reason the relocation section (.rel.eh_frame) could not find
      that section number.

      The section numbers correspond to the readelf tool's Section Headers [Nr].
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      077c066a
    • bpf: Sync kernel ABI header with tooling header for bpf_common.h · 8c88181e
      Jesper Dangaard Brouer authored
      I recently fixed up a lot of commits that forgot to keep the tooling
      headers in sync.  And then I forgot to do the same thing in commit
      cb5f7334 ("bpf: add comments to BPF ld/ldx sizes"). Let's correct
      that before people notice ;-).
      
      Lawrence did partly fix/sync this for bpf.h in commit d6d4f60c
      ("bpf: add selftest for tcpbpf").
      
      Fixes: cb5f7334 ("bpf: add comments to BPF ld/ldx sizes")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      8c88181e
    • net: phy: fix phy_start to consider PHY_IGNORE_INTERRUPT · 08f51385
      Heiner Kallweit authored
      This condition wasn't adjusted when PHY_IGNORE_INTERRUPT (-2) was added
      long ago. In case of PHY_IGNORE_INTERRUPT the MAC interrupt also
      indicates PHY state changes and we should do what the symbol says.

      Fixes: 84a527a4 ("net: phylib: fix interrupts re-enablement in phy_start")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      08f51385
    • net: thunder: change q_len's type to handle max ring size · 88c991a9
      Dean Nelson authored
      
      
      The Cavium thunder nicvf driver supports rx/tx rings of up to 65536 entries
      each. The number of entries is stored in the q_len member of struct
      q_desc_mem. The problem is that q_len, being a u16, turns 65536 into 0.
      
      In getting pointers to descriptors in the rings, the driver uses q_len minus 1
      as a mask after incrementing the pointer, in order to go back to the beginning
      and not go past the end of the ring.
      
      With the q_len set to 0 the mask is no longer correct and the driver does go
      beyond the end of the ring, causing various ills. Usually the first thing that
      shows up is a "NETDEV WATCHDOG: enP2p1s0f1 (nicvf): transmit queue 7 timed out"
      warning.
      
      This patch remedies the problem by changing q_len to a u32.
      Signed-off-by: Dean Nelson <dnelson@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      88c991a9
    • Merge tag 'wireless-drivers-next-for-davem-2018-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next · e0c42c8e
      David S. Miller authored
      
      
      
      Kalle Valo says:
      
      ====================
      wireless-drivers-next patches for 4.16
      
      The most important here is the ssb fix, it has been reported by the
      users frequently and the fix just missed the final v4.15. Also
      numerous other fixes, mt76 had multiple problems with aggregation and
      a long standing unaligned access bug in rtlwifi is finally fixed.
      
      Major changes:
      
      ath10k
      
      * correct firmware RAM dump length for QCA6174/QCA9377
      
      * add new QCA988X device id
      
      * fix a kernel panic during pci probe
      
      * revert a recent commit which broke ath10k firmware metadata parsing
      
      ath9k
      
      * fix a noise floor regression introduced during the merge window
      
      * add new device id
      
      rtlwifi
      
      * fix unaligned access seen on ARM architecture
      
      mt76
      
      * various aggregation fixes which fix connection stalls
      
      ssb
      
      * fix b43 and b44 on non-MIPS which broke in v4.15-rc9
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e0c42c8e
    • tipc: fix skb truesize/datasize ratio control · 55b3280d
      Hoang Le authored
      In commit d618d09a ("tipc: enforce valid ratio between skb truesize
      and contents") we introduced a test for ensuring that the condition
      truesize/datasize <= 4 is true for a received buffer. Unfortunately this
      test has two problems.

      - Because of the integer arithmetic, the test
        if (skb->truesize / buf_roundup_len(skb) > 4) will miss all
        ratios [4 < ratio < 5], which was not the intention.
      - The buffer returned by skb_copy() inherits skb->truesize of the
        original buffer, which doesn't help the situation at all.
      
      In this commit, we change the ratio condition and replace skb_copy()
      with a call to skb_copy_expand() to finally get this right.
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      55b3280d
    • net/sched: cls_u32: fix cls_u32 on filter replace · eb53f7af
      Ivan Vecera authored
      The following sequence is currently broken:
      
       # tc qdisc add dev foo ingress
       # tc filter replace dev foo protocol all ingress \
         u32 match u8 0 0 action mirred egress mirror dev bar1
       # tc filter replace dev foo protocol all ingress \
         handle 800::800 pref 49152 \
         u32 match u8 0 0 action mirred egress mirror dev bar2
       Error: cls_u32: Key node flags do not match passed flags.
       We have an error talking to the kernel, -1
      
      The error comes from u32_change() when comparing new and
      existing flags. The existing ones always contain one of the
      TCA_CLS_FLAGS_{,NOT}_IN_HW flags, depending on offloading state.
      These flags cannot be passed from userspace, so the condition
      (n->flags != flags) in u32_change() is always true and the
      replace is rejected.
      
      Fix the condition so the flags TCA_CLS_FLAGS_NOT_IN_HW and
      TCA_CLS_FLAGS_IN_HW are not taken into account.
      
      Fixes: 24d3dc6d ("net/sched: cls_u32: Reflect HW offload status")
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eb53f7af
    • mpls, nospec: Sanitize array index in mpls_label_ok() · 3968523f
      Dan Williams authored
      
      
      mpls_label_ok() validates that the 'platform_label' array index from a
      userspace netlink message payload is valid. Under speculation the
      mpls_label_ok() result may not resolve in the CPU pipeline until after
      the index is used to access an array element. Sanitize the index to zero
      to prevent userspace-controlled arbitrary out-of-bounds speculation, a
      precursor for a speculative execution side channel vulnerability.
      
      Cc: <stable@vger.kernel.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3968523f
    • rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management · ebeeb1ad
      Sowmini Varadhan authored
      
      An rds_connection can get added during netns deletion between lines 528
      and 529 of
      
        506 static void rds_tcp_kill_sock(struct net *net)
        :
        /* code to pull out all the rds_connections that should be destroyed */
        :
        528         spin_unlock_irq(&rds_tcp_conn_lock);
        529         list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
        530                 rds_conn_destroy(tc->t_cpath->cp_conn);
      
      Such an rds_connection would miss out the rds_conn_destroy()
      loop (that cancels all pending work) and (if it was scheduled
      after netns deletion) could trigger the use-after-free.
      
      A similar race-window exists for the module unload path
      in rds_tcp_exit -> rds_tcp_destroy_conns
      
      Concurrency with netns deletion (rds_tcp_kill_sock()) must be handled
      by checking check_net() before enqueuing new work or adding new
      connections.
      
      Concurrency with module-unload is handled by maintaining a module
      specific flag that is set at the start of the module exit function,
      and must be checked before enqueuing new work or adding new connections.
      
      This commit refactors existing RDS_DESTROY_PENDING checks added by
      commit 3db6e0d1 ("rds: use RCU to synchronize work-enqueue with
      connection teardown") and consolidates all the concurrency checks
      listed above into the function rds_destroy_pending().
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ebeeb1ad
    • net: Whitelist the skbuff_head_cache "cb" field · 79a8a642
      Kees Cook authored
      Most callers of put_cmsg() use a "sizeof(foo)" for the length argument.
      Within put_cmsg(), a copy_to_user() call is made with a dynamic size, as a
      result of the cmsg header calculations. This means that hardened usercopy
      will examine the copy, even though it was technically a fixed size and
      should be implicitly whitelisted. All the put_cmsg() calls being built
      from values in skbuff_head_cache are coming out of the protocol-defined
      "cb" field, so whitelist this field entirely instead of creating per-use
      bounce buffers, for which there are concerns about performance.
      
      Original report was:
      
      Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLAB object 'skbuff_head_cache' (offset 64, size 16)!
      WARNING: CPU: 0 PID: 3663 at mm/usercopy.c:81 usercopy_warn+0xdb/0x100 mm/usercopy.c:76
      ...
       __check_heap_object+0x89/0xc0 mm/slab.c:4426
       check_heap_object mm/usercopy.c:236 [inline]
       __check_object_size+0x272/0x530 mm/usercopy.c:259
       check_object_size include/linux/thread_info.h:112 [inline]
       check_copy_size include/linux/thread_info.h:143 [inline]
       copy_to_user include/linux/uaccess.h:154 [inline]
       put_cmsg+0x233/0x3f0 net/core/scm.c:242
       sock_recv_errqueue+0x200/0x3e0 net/core/sock.c:2913
       packet_recvmsg+0xb2e/0x17a0 net/packet/af_packet.c:3296
       sock_recvmsg_nosec net/socket.c:803 [inline]
       sock_recvmsg+0xc9/0x110 net/socket.c:810
       ___sys_recvmsg+0x2a4/0x640 net/socket.c:2179
       __sys_recvmmsg+0x2a9/0xaf0 net/socket.c:2287
       SYSC_recvmmsg net/socket.c:2368 [inline]
       SyS_recvmmsg+0xc4/0x160 net/socket.c:2352
       entry_SYSCALL_64_fastpath+0x29/0xa0
      
      Reported-by: syzbot+e2d6cfb305e9f3911dea@syzkaller.appspotmail.com
      Fixes: 6d07d1cd ("usercopy: Restrict non-usercopy caches to size 0")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      79a8a642
    • Mathieu Malaterre's avatar
      net: Extra '_get' in declaration of arch_get_platform_mac_address · e728789c
      Mathieu Malaterre authored
      In commit c7f5d105 ("net: Add eth_platform_get_mac_address() helper."),
      two declarations were added:
      
        int eth_platform_get_mac_address(struct device *dev, u8 *mac_addr);
        unsigned char *arch_get_platform_get_mac_address(void);
      
      An extra '_get' was introduced in arch_get_platform_get_mac_address; remove
      it. This fixes a compile warning seen with W=1:
      
        CC      net/ethernet/eth.o
      net/ethernet/eth.c:523:24: warning: no previous prototype for ‘arch_get_platform_mac_address’ [-Wmissing-prototypes]
       unsigned char * __weak arch_get_platform_mac_address(void)
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        AR      net/ethernet/built-in.o
      Signed-off-by: Mathieu Malaterre <malat@debian.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e728789c
    • Nathan Fontenot's avatar
      ibmvnic: queue reset when CRQ gets closed during reset · ec95dffa
      Nathan Fontenot authored
      
      
      While handling a driver reset we can get an H_CLOSED return when trying
      to send a CRQ event. When this occurs we need to queue up another
      reset attempt. Without doing this we see instances where the driver
      is left in a closed state because the reset failed and there are no
      further attempts to reset the driver.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ec95dffa
    • Gustavo A. R. Silva's avatar
      atm: he: use 64-bit arithmetic instead of 32-bit · 583133b3
      Gustavo A. R. Silva authored
      
      
      Add suffix ULL to constants 272, 204, 136 and 68 in order to give the
      compiler complete information about the proper arithmetic to use.
      Notice that these constants are used in contexts that expect
      expressions of type unsigned long long (64 bits, unsigned).
      
      The following expressions are currently being evaluated using 32-bit
      arithmetic:
      
      272 * mult
      204 * mult
      136 * mult
      68 * mult
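The difference between the two forms can be sketched in plain C. This is a minimal illustration, assuming `mult` is a 32-bit unsigned value; the helper names are hypothetical and do not come from the driver.

```c
#include <stdint.h>

/* 32-bit arithmetic: the constant 272 is an int and mult is a 32-bit
 * unsigned value, so the product is computed modulo 2^32 and only then
 * widened to 64 bits. High bits are already lost at that point. */
static uint64_t scale_32bit(uint32_t mult)
{
    return 272 * mult;
}

/* 64-bit arithmetic: the ULL suffix makes the constant unsigned long long,
 * so mult is promoted before the multiply and no bits are lost. */
static uint64_t scale_64bit(uint32_t mult)
{
    return 272ULL * mult;
}
```

For small multipliers both helpers agree; they diverge once the true product no longer fits in 32 bits, for example with `mult = 0x01000000`.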
      
      Addresses-Coverity-ID: 201058
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      583133b3
    • Christian Brauner's avatar
      rtnetlink: require unique netns identifier · 4ff66cae
      Christian Brauner authored
      
      
      Since we've added support for IFLA_IF_NETNSID for RTM_{DEL,GET,SET,NEW}LINK
      it is possible for userspace to send us requests with three different
      properties to identify a target network namespace. This affects at least
      RTM_{NEW,SET}LINK. Each of them could potentially refer to a different
      network namespace which is confusing. For legacy reasons the kernel will
      pick the IFLA_NET_NS_PID property first and then look for the
      IFLA_NET_NS_FD property but there is no reason to extend this type of
      behavior to network namespace ids. The regression potential is quite
      minimal since the rtnetlink requests in question either won't allow
      IFLA_IF_NETNSID requests before 4.16 is out (RTM_{NEW,SET}LINK) or don't
      support IFLA_NET_NS_{PID,FD} (RTM_{DEL,GET}LINK) in the first place.
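The uniqueness requirement can be sketched as a simple validity check. This is a simplified userspace illustration, not the kernel's implementation; the struct, its fields, and the function name are hypothetical stand-ins for the three netlink attributes.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of which namespace identifiers a request carries.
 * The fields stand in for IFLA_NET_NS_PID, IFLA_NET_NS_FD and
 * IFLA_IF_NETNSID respectively. */
struct netns_request {
    bool has_pid;
    bool has_fd;
    bool has_netnsid;
};

/* Accept a request that names the target namespace at most once;
 * reject any combination of two or more identifiers. */
static int ensure_unique_netns(const struct netns_request *req)
{
    int set = req->has_pid + req->has_fd + req->has_netnsid;

    return set > 1 ? -EINVAL : 0;
}
```

Rejecting ambiguous requests outright avoids having to define a precedence order among the identifiers.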
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Acked-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4ff66cae
    • Jason Wang's avatar
      tuntap: add missing xdp flush · 762c330d
      Jason Wang authored
      When using devmap to redirect packets between interfaces,
      xdp_do_flush() is usually required to flush any batched
      packets. Unfortunately, this is missing from the current tuntap
      implementation.
      
      Most hardware drivers run XDP inside the NAPI loop and call
      xdp_do_flush() at the end of each round of polling. TAP instead runs
      it in process context, e.g. tun_get_user(). So fix this by counting the
      pending redirected packets and flushing when the count exceeds
      NAPI_POLL_WEIGHT or MSG_MORE was cleared by the sendmsg() caller.
      
      With this fix, xdp_redirect_map works again between two TAPs.
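The counting scheme described above can be sketched in userspace C. This is only an illustration of the batching logic: the struct, the helper names, and `BATCH_LIMIT` (assumed here to mirror NAPI_POLL_WEIGHT) are hypothetical, and the flush is simulated by a counter.

```c
#include <stdbool.h>

#define BATCH_LIMIT 64 /* assumption: stands in for NAPI_POLL_WEIGHT */

/* Hypothetical per-queue state for batched redirects. */
struct redirect_batch {
    unsigned int pending; /* redirected packets not yet flushed */
    unsigned int flushes; /* how many times a flush was simulated */
};

/* Stand-in for the real flush call: drains the batch. */
static void fake_flush(struct redirect_batch *b)
{
    b->pending = 0;
    b->flushes++;
}

/* Account for one redirected packet. `more` mirrors MSG_MORE: the sender
 * promises further packets. Flush when the batch exceeds the limit or
 * when no more packets are expected. Returns true if a flush happened. */
static bool account_redirect(struct redirect_batch *b, bool more)
{
    b->pending++;
    if (b->pending > BATCH_LIMIT || !more) {
        fake_flush(b);
        return true;
    }
    return false;
}
```

A single packet sent without `more` is flushed immediately, while a long burst is flushed once the pending count passes the limit, which bounds how long redirected packets can sit in the batch.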
      
      Fixes: 761876c8 ("tap: XDP support")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      762c330d
    • Nicolas Dichtel's avatar
      netlink: ensure to loop over all netns in genlmsg_multicast_allns() · cb9f7a9a
      Nicolas Dichtel authored
      Nowadays, nlmsg_multicast() returns only 0 or -ESRCH but this was not the
      case when commit 134e6375 was pushed.
      However, there was no reason to stop the loop if a netns does not have
      listeners.
      Return -ESRCH only if there were no listeners in any netns.
      
      To avoid having the same problem in the future, I did not
      assume that nlmsg_multicast() returns only 0 or -ESRCH.
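The loop behaviour described above can be sketched as follows. This is a userspace illustration under stated assumptions: `struct fake_netns` and `fake_multicast()` are hypothetical stand-ins for a network namespace and for nlmsg_multicast(), and the sketch is not the kernel's code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one network namespace. */
struct fake_netns {
    int send_result; /* what delivery to this netns returns: 0 or -errno */
};

/* Hypothetical stand-in for nlmsg_multicast() on a single netns. */
static int fake_multicast(const struct fake_netns *ns)
{
    return ns->send_result;
}

/* Visit every netns. A netns without listeners (-ESRCH) does not stop
 * the loop; any other error does. Report -ESRCH only when no netns at
 * all accepted the message. */
static int multicast_allns(const struct fake_netns *all, size_t count)
{
    int delivered = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        int err = fake_multicast(&all[i]);

        if (err == 0)
            delivered = 1;
        else if (err != -ESRCH)
            return err; /* unexpected error: propagate it */
    }
    return delivered ? 0 : -ESRCH;
}
```

Treating -ESRCH as "keep going" and every other non-zero value as a hard error keeps the loop correct even if the per-netns call later starts returning additional error codes.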
      
      Fixes: 134e6375 ("genetlink: make netns aware")
      CC: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cb9f7a9a
    • David Howells's avatar
      rxrpc: Don't put crypto buffers on the stack · 8c2f826d
      David Howells authored
      
      
      Don't put buffers of data to be handed to crypto on the stack as this may
      cause an assertion failure in the kernel (see below).  Fix this by using a
      kmalloc'd buffer instead.
      
      kernel BUG at ./include/linux/scatterlist.h:147!
      ...
      RIP: 0010:rxkad_encrypt_response.isra.6+0x191/0x1b0 [rxrpc]
      RSP: 0018:ffffbe2fc06cfca8 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff989277d59900 RCX: 0000000000000028
      RDX: 0000259dc06cfd88 RSI: 0000000000000025 RDI: ffffbe30406cfd88
      RBP: ffffbe2fc06cfd60 R08: ffffbe2fc06cfd08 R09: ffffbe2fc06cfd08
      R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff7c5f80d9f95
      R13: ffffbe2fc06cfd88 R14: ffff98927a3f7aa0 R15: ffffbe2fc06cfd08
      FS:  0000000000000000(0000) GS:ffff98927fc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055b1ff28f0f8 CR3: 000000001b412003 CR4: 00000000003606f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       rxkad_respond_to_challenge+0x297/0x330 [rxrpc]
       rxrpc_process_connection+0xd1/0x690 [rxrpc]
       ? process_one_work+0x1c3/0x680
       ? __lock_is_held+0x59/0xa0
       process_one_work+0x249/0x680
       worker_thread+0x3a/0x390
       ? process_one_work+0x680/0x680
       kthread+0x121/0x140
       ? kthread_create_worker_on_cpu+0x70/0x70
       ret_from_fork+0x3a/0x50
      Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8c2f826d
    • Kalle Valo's avatar
      Merge ath-current from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git · 99ffd198
      Kalle Valo authored
      ath.git fixes for 4.16. Major changes:
      
      ath10k
      
      * correct firmware RAM dump length for QCA6174/QCA9377
      
      * add new QCA988X device id
      
      * fix a kernel panic during pci probe
      
      * revert a recent commit which broke ath10k firmware metadata parsing
      
      ath9k
      
      * fix a noise floor regression introduced during the merge window
      
      * add new device id
      99ffd198
    • David S. Miller's avatar
      Merge branch 'nfp-fix-disabling-TC-offloads-in-flower-max-TSO-segs-and-module-version' · c7025586
      David S. Miller authored
      
      
      Jakub Kicinski says:
      
      ====================
      nfp: fix disabling TC offloads in flower, max TSO segs and module version
      
      This set corrects the way nfp deals with the NETIF_F_HW_TC flag.
      It slipped through review that flower offload does not currently
      refuse disabling this flag when filter offload is active.
      
      nfp's flower offload does not actually keep track of how many filters
      for each port are offloaded.  The accounting of the number of filters
      is added to the nfp core structures, and BPF moved to use these
      structures as well.
      
      If users are allowed to disable TC offloads while filters are active,
      not only is it incorrect behaviour, but actually the NFP will never
      be told to remove the flows, leading to use-after-free when stats
      arrive.
      
      Fourth patch makes sure we declare the max number of TSO segments.
      FW should drop longer packets cleanly (otherwise this would be a
      security problem for untrusted VFs) but dropping longer TSO frames
      is not nice and driver should prevent them from being generated.
      
      Last small addition populates MODULE_VERSION with kernel version.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c7025586
    • Jakub Kicinski's avatar
      nfp: populate MODULE_VERSION · 1a5e8e35
      Jakub Kicinski authored
      
      
      DKMS and similar out-of-tree module replacement services use
      module version to make sure the out-of-tree software is not
      older than the module shipped with the kernel.  We use the
      kernel version in ethtool -i output; put it into MODULE_VERSION
      as well.
      Reported-by: Jan Gutter <jan.gutter@netronome.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a5e8e35
    • Jakub Kicinski's avatar
      nfp: limit the number of TSO segments · 0d592e52
      Jakub Kicinski authored
      
      
      Most FWs limit the number of TSO segments a frame can produce
      to 64.  This is for fairness and efficiency (of FW datapath)
      reasons.  If a frame with a larger number of segments is submitted,
      the FW will drop it.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0d592e52
    • Jakub Kicinski's avatar
      nfp: forbid disabling hw-tc-offload on representors while offload active · d692403e
      Jakub Kicinski authored
      All netdevs which can accept TC offloads must implement
      .ndo_set_features().  nfp_reprs currently do not do that, which
      means hw-tc-offload can be turned on and off even when offloads
      are active.
      
      Whether the offloads are active is really a question to nfp_ports,
      so remove the per-app tc_busy callback indirection thing, and
      simply count the number of offloaded items in nfp_port structure.
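The counting approach can be sketched as follows. This is a userspace illustration only: `struct fake_port`, its fields, and `set_hw_tc()` are hypothetical names, and the choice of -EBUSY as the refusal code is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical port state: a count of offloaded items plus the current
 * state of the hw-tc-offload feature flag. */
struct fake_port {
    unsigned int tc_offload_cnt; /* offloaded filters/programs on this port */
    bool hw_tc_enabled;          /* is hw-tc-offload currently on? */
};

/* Sketch of a set-features hook: refuse to turn hw-tc-offload off while
 * anything is still offloaded; otherwise apply the requested state. */
static int set_hw_tc(struct fake_port *port, bool enable)
{
    if (!enable && port->hw_tc_enabled && port->tc_offload_cnt > 0)
        return -EBUSY; /* assumed error code for "offloads still active" */

    port->hw_tc_enabled = enable;
    return 0;
}
```

Keeping the count on the port itself means the check needs no per-application callback: any offload type only has to increment and decrement the same counter.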
      
      Fixes: 8a276873 ("nfp: provide infrastructure for offloading flower based TC filters")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Tested-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d692403e