1. 23 Jan, 2018 1 commit
  2. 14 Apr, 2017 2 commits
  3. 25 Mar, 2015 1 commit
  4. 02 Jun, 2014 1 commit
    • Eric Dumazet's avatar
      inetpeer: get rid of ip_id_count · 73f156a6
      Eric Dumazet authored
      Ideally, we would need to generate IP ID using a per destination IP
      generator.
      
      linux kernels used inet_peer cache for this purpose, but this had a huge
      cost on servers disabling MTU discovery.
      
      1) each inet_peer struct consumes 192 bytes
      
      2) inetpeer cache uses a binary tree of inet_peer structs,
         with a nominal size of ~66000 elements under load.
      
      3) lookups in this tree are hitting a lot of cache lines, as tree depth
         is about 20.
      
      4) If server deals with many tcp flows, we have a high probability of
         not finding the inet_peer, allocating a fresh one, inserting it in
         the tree with same initial ip_id_count, (cf secure_ip_id())
      
      5) We garbage collect inet_peer aggressively.
      
      IP ID generation do not have to be 'perfect'
      
      Goal is trying to avoid duplicates in a short period of time,
      so that reassembly units have a chance to complete reassembly of
      fragments belonging to one message before receiving other fragments
      with a recycled ID.
      
      We simply use an array of generators, and a Jenkin hash using the dst IP
      as a key.
      
      ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
      belongs (it is only used from this file)
      
      secure_ip_id() and secure_ipv6_id() no longer are needed.
      
      Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
      unnecessary decrement/increment of the number of segments.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      73f156a6
  5. 25 Feb, 2014 1 commit
  6. 19 Sep, 2013 1 commit
    • Ansis Atteka's avatar
      ip: generate unique IP identificator if local fragmentation is allowed · 703133de
      Ansis Atteka authored
      If local fragmentation is allowed, then ip_select_ident() and
      ip_select_ident_more() need to generate unique IDs to ensure
      correct defragmentation on the peer.
      
      For example, if IPsec (tunnel mode) has to encrypt large skbs
      that have local_df bit set, then all IP fragments that belonged
      to different ESP datagrams would have used the same identificator.
      If one of these IP fragments would get lost or reordered, then
      peer could possibly stitch together wrong IP fragments that did
      not belong to the same datagram. This would lead to a packet loss
      or data corruption.
      Signed-off-by: default avatarAnsis Atteka <aatteka@nicira.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      703133de
  7. 28 Aug, 2013 1 commit
    • Fan Du's avatar
      {ipv4,xfrm}: Introduce xfrm_tunnel_notifier for xfrm tunnel mode callback · aba82695
      Fan Du authored
      Some thoughts on IPv4 VTI implementation:
      
      The connection between VTI receiving part and xfrm tunnel mode input process
      is hardly a "xfrm_tunnel", xfrm_tunnel is used in places where, e.g ipip/sit
      and xfrm4_tunnel, acts like a true "tunnel" device.
      
      In addition, IMHO, VTI doesn't need vti_err to do something meaningful, as all
      VTI needs is just a notifier to be called whenever xfrm_input ingress a packet
      to update statistics.
      
      A IPsec protected packet is first handled by protocol handlers, e.g AH/ESP,
      to check packet authentication or encryption rightness. PMTU update is taken
      care of in this stage by protocol error handler.
      
      Then the packet is rearranged properly depending on whether it's transport
      mode or tunnel mode packed by mode "input" handler. The VTI handler code
      takes effects in this stage in tunnel mode only. So it neither need propagate
      PMTU, as it has already been done if necessary, nor the VTI handler is
      qualified as a xfrm_tunnel.
      
      So this patch introduces xfrm_tunnel_notifier and meanwhile wipe out vti_err
      code.
      Signed-off-by: default avatarFan Du <fan.du@windriver.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: David S. Miller <davem@davemloft.net>
      Reviewed-by: default avatarSaurabh Mohan <saurabh.mohan@vyatta.com>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      aba82695
  8. 06 Mar, 2013 1 commit
    • Nicolas Dichtel's avatar
      xfrm: allow to avoid copying DSCP during encapsulation · a947b0a9
      Nicolas Dichtel authored
      By default, DSCP is copying during encapsulation.
      Copying the DSCP in IPsec tunneling may be a bit dangerous because packets with
      different DSCP may get reordered relative to each other in the network and then
      dropped by the remote IPsec GW if the reordering becomes too big compared to the
      replay window.
      
      It is possible to avoid this copy with netfilter rules, but it's very convenient
      to be able to configure it for each SA directly.
      
      This patch adds a toogle for this purpose. By default, it's not set to maintain
      backward compatibility.
      
      Field flags in struct xfrm_usersa_info is full, hence I add a new attribute.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      a947b0a9
  9. 18 Feb, 2013 1 commit
  10. 15 Feb, 2013 1 commit
  11. 18 Jul, 2012 1 commit
  12. 23 Feb, 2012 1 commit
  13. 13 Dec, 2010 2 commits
    • David S. Miller's avatar
      ipv4: Don't pre-seed hoplimit metric. · 323e126f
      David S. Miller authored
      Always go through a new ip4_dst_hoplimit() helper, just like ipv6.
      
      This allowed several simplifications:
      
      1) The interim dst_metric_hoplimit() can go as it's no longer
         userd.
      
      2) The sysctl_ip_default_ttl entry no longer needs to use
         ipv4_doint_and_flush, since the sysctl is not cached in
         routing cache metrics any longer.
      
      3) ipv4_doint_and_flush no longer needs to be exported and
         therefore can be marked static.
      
      When ipv4_doint_and_flush_strategy was removed some time ago,
      the external declaration in ip.h was mistakenly left around
      so kill that off too.
      
      We have to move the sysctl_ip_default_ttl declaration into
      ipv4's route cache definition header net/route.h, because
      currently net/ip.h (where the declaration lives now) has
      a back dependency on net/route.h
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      323e126f
    • David S. Miller's avatar
  14. 30 Mar, 2010 1 commit
    • Tejun Heo's avatar
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Guess-its-ok-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  15. 03 Jun, 2009 1 commit
  16. 17 Jun, 2008 1 commit
  17. 24 Mar, 2008 1 commit
  18. 28 Jan, 2008 4 commits
    • Herbert Xu's avatar
      [IPSEC]: Rename tunnel-mode functions to avoid collisions with tunnels · 195ad6a3
      Herbert Xu authored
      It appears that I've managed to create two different functions both
      called xfrm6_tunnel_output.  This is because we have the plain tunnel
      encapsulation named xfrmX_tunnel as well as the tunnel-mode encapsulation
      which lives in the files xfrmX_mode_tunnel.c.
      
      This patch renames functions from the latter to use the xfrmX_mode_tunnel
      prefix to avoid name-space conflicts.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      195ad6a3
    • Herbert Xu's avatar
      [IPSEC]: Separate inner/outer mode processing on input · 227620e2
      Herbert Xu authored
      With inter-family transforms the inner mode differs from the outer
      mode.  Attempting to handle both sides from the same function means
      that it needs to handle both IPv4 and IPv6 which creates duplication
      and confusion.
      
      This patch separates the two parts on the input path so that each
      function deals with one family only.
      
      In particular, the functions xfrm4_extract_inut/xfrm6_extract_inut
      moves the pertinent fields from the IPv4/IPv6 IP headers into a
      neutral format stored in skb->cb.  This is then used by the inner mode
      input functions to modify the inner IP header.  In this way the input
      function no longer has to know about the outer address family.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      227620e2
    • Herbert Xu's avatar
      [IPSEC]: Separate inner/outer mode processing on output · 36cf9acf
      Herbert Xu authored
      With inter-family transforms the inner mode differs from the outer
      mode.  Attempting to handle both sides from the same function means
      that it needs to handle both IPv4 and IPv6 which creates duplication
      and confusion.
      
      This patch separates the two parts on the output path so that each
      function deals with one family only.
      
      In particular, the functions xfrm4_extract_output/xfrm6_extract_output
      moves the pertinent fields from the IPv4/IPv6 IP headers into a
      neutral format stored in skb->cb.  This is then used by the outer mode
      output functions to write the outer IP header.  In this way the output
      function no longer has to know about the inner address family.
      
      Since the extract functions are only called by tunnel modes (the only
      modes that can support inter-family transforms), I've also moved the
      xfrm*_tunnel_check_size calls into them.  This allows the correct ICMP
      message to be sent as opposed to now where you might call icmp_send
      with an IPv6 packet and vice versa.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      36cf9acf
    • Herbert Xu's avatar
      [INET]: Give outer DSCP directly to ip*_copy_dscp · 29bb43b4
      Herbert Xu authored
      This patch changes the prototype of ipv4_copy_dscp and ipv6_copy_dscp so
      that they directly take the outer DSCP rather than the outer IP header.
      This will help us to unify the code for inter-family tunnels.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      29bb43b4
  19. 18 Oct, 2007 1 commit
    • Herbert Xu's avatar
      [IPSEC]: Add missing BEET checks · 1bfcb10f
      Herbert Xu authored
      Currently BEET mode does not reinject the packet back into the stack
      like tunnel mode does.  Since BEET should behave just like tunnel mode
      this is incorrect.
      
      This patch fixes this by introducing a flags field to xfrm_mode that
      tells the IPsec code whether it should terminate and reinject the packet
      back into the stack.
      
      It then sets the flag for BEET and tunnel mode.
      
      I've also added a number of missing BEET checks elsewhere where we check
      whether a given mode is a tunnel or not.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1bfcb10f
  20. 10 Oct, 2007 3 commits
  21. 31 May, 2007 1 commit
  22. 26 Apr, 2007 9 commits
  23. 26 Feb, 2007 1 commit
  24. 13 Feb, 2007 1 commit
  25. 08 Feb, 2007 1 commit