  1. Jun 17, 2008
  2. Jun 16, 2008
  3. Jun 13, 2008
  4. Jun 12, 2008
    • tcp: Revert 'process defer accept as established' changes. · ec0a1966
      David S. Miller authored
      
      This reverts two changesets, ec3c0982
      ("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
      the follow-on bug fix 9ae27e0a
      ("tcp: Fix slab corruption with ipv6 and tcp6fuzz").
      
      This change causes several problems, first reported by Ingo Molnar
      as a distcc-over-loopback regression where connections were getting
      stuck.
      
      Ilpo Järvinen first spotted the locking problems.  The new function
      added by this code, tcp_defer_accept_check(), only has the
      child socket locked, yet it is modifying state of the parent
      listening socket.
      
      Fixing that is non-trivial at best: we can't simply grab the parent
      listening socket lock at this point, because that would create an
      ABBA deadlock.  The normal ordering is parent listening socket -->
      child socket, but this code path would require the reverse lock
      ordering.
      
      Next is a problem noticed by Vitaliy Gusev, who noted:
      
      ----------------------------------------
      >--- a/net/ipv4/tcp_timer.c
      >+++ b/net/ipv4/tcp_timer.c
      >@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
      > 		goto death;
      > 	}
      >
      >+	if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
      >+		tcp_send_active_reset(sk, GFP_ATOMIC);
      >+		goto death;
      
      Here socket sk is not attached to the listening socket's request queue. tcp_done()
      will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock(), which should
      release this sk) as the socket is not DEAD. Therefore socket sk will never be
      freed.
      ----------------------------------------
      
      Finally, Alexey Kuznetsov argues that there might not even be any
      real value or advantage to these new semantics even if we fix all
      of the bugs:
      
      ----------------------------------------
      Hiding sockets with only out-of-order data from accept() is the
      only thing which is impossible with the old approach. Is this really
      so valuable? My opinion: no, this is nothing but a new loophole
      to consume memory without control.
      ----------------------------------------
      
      So revert this thing for now.
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Fix duplicate initialization of rawv6_prot.destroy · f23d60de
      David S. Miller authored
      
      In changeset 22dd4850
      ("raw: Raw socket leak.") code was added so that we
      flush pending frames on raw sockets to avoid leaks.
      
      The ipv4 part was fine, but the ipv6 part was not
      done correctly.  Unlike the ipv4 side, the ipv6 code
      already has a .destroy method for rawv6_prot.
      
      So now there were two assignments to this member, and
      what the compiler does is use the last one, effectively
      making the ipv6 parts of that changeset a NOP.
      
      Fix this by removing the:
      
      	.destroy	   = inet6_destroy_sock,
      
      line, and adding an inet6_destroy_sock() call to the
      end of raw6_destroy().
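      The C rule behind this: when a designated initializer names the same
      member twice, the later one overrides the earlier (C99 6.7.8), and gcc
      only warns about it with -Woverride-init, which -Wextra enables. A
      pared-down sketch of the fixed shape follows; struct proto_like and the
      function names are hypothetical stand-ins for struct proto,
      raw6_destroy() and inet6_destroy_sock().

      ```c
      #include <assert.h>

      /* Hypothetical, pared-down stand-in for struct proto. */
      struct proto_like {
          void (*destroy)(int released[2]);
      };

      /* Plays the role of inet6_destroy_sock(): generic cleanup. */
      static void generic_destroy(int released[2])
      {
          released[1] = 1;
      }

      /* Plays the role of raw6_destroy() after the fix: raw-specific cleanup
       * first, then the generic cleanup that used to be a second .destroy. */
      static void raw_destroy(int released[2])
      {
          released[0] = 1;          /* e.g. flush pending frames */
          generic_destroy(released);
      }

      /* With both ".destroy = raw_destroy," and ".destroy = generic_destroy,"
       * present, the later initializer wins and raw_destroy never runs.
       * Keeping exactly one initializer for the member avoids that. */
      static const struct proto_like raw_proto_like = {
          .destroy = raw_destroy,
      };
      ```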
      
      Noticed by Al Viro.
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
    • netfilter: nf_conntrack: fix ctnetlink related crash in nf_nat_setup_info() · ceeff754
      Patrick McHardy authored
      When creation of a new conntrack entry in ctnetlink fails after having
      set up the NAT mappings, the conntrack has an extension area allocated
      that is not getting properly destroyed when freeing the conntrack again.
      This means the NAT extension is still in the bysource hash, causing a
      crash when walking over the hash chain the next time:
      
      BUG: unable to handle kernel paging request at 00120fbd
      IP: [<c03d394b>] nf_nat_setup_info+0x221/0x58a
      *pde = 00000000
      Oops: 0000 [#1] PREEMPT SMP
      
      Pid: 2795, comm: conntrackd Not tainted (2.6.26-rc5 #1)
      EIP: 0060:[<c03d394b>] EFLAGS: 00010206 CPU: 1
      EIP is at nf_nat_setup_info+0x221/0x58a
      EAX: 00120fbd EBX: 00120fbd ECX: 00000001 EDX: 00000000
      ESI: 0000019e EDI: e853bbb4 EBP: e853bbc8 ESP: e853bb78
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      Process conntrackd (pid: 2795, ti=e853a000 task=f7de10f0 task.ti=e853a000)
      Stack: 00000000 e853bc2c e85672ec 00000008 c0561084 63c1db4a 00000000 00000000
             00000000 0002e109 61d2b1c3 00000000 00000000 00000000 01114e22 61d2b1c3
             00000000 00000000 f7444674 e853bc04 00000008 c038e728 0000000a f7444674
      Call Trace:
       [<c038e728>] nla_parse+0x5c/0xb0
       [<c0397c1b>] ctnetlink_change_status+0x190/0x1c6
       [<c0397eec>] ctnetlink_new_conntrack+0x189/0x61f
       [<c0119aee>] update_curr+0x3d/0x52
       [<c03902d1>] nfnetlink_rcv_msg+0xc1/0xd8
       [<c0390228>] nfnetlink_rcv_msg+0x18/0xd8
       [<c0390210>] nfnetlink_rcv_msg+0x0/0xd8
       [<c038d2ce>] netlink_rcv_skb+0x2d/0x71
       [<c0390205>] nfnetlink_rcv+0x19/0x24
       [<c038d0f5>] netlink_unicast+0x1b3/0x216
       ...
      
      Move invocation of the extension destructors to nf_conntrack_free()
      to fix this problem.
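      The idea of the fix, reduced to a hypothetical user-space model (struct
      conn, struct nat_ext, conn_free() and the global counter are all
      illustrative, not netfilter's API): if the extension destructor runs
      inside the one function that frees the object, an error path that frees
      a half-built entry can no longer leave the extension linked in the hash.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      struct nat_ext { int in_bysource_hash; };
      struct conn    { struct nat_ext *nat; };

      /* Number of NAT extensions still linked into the (modelled) bysource hash. */
      static int bysource_entries;

      static int conn_setup_nat(struct conn *ct)
      {
          ct->nat = calloc(1, sizeof(*ct->nat));
          if (!ct->nat)
              return -1;
          ct->nat->in_bysource_hash = 1;
          bysource_entries++;
          return 0;
      }

      /* Extension destructor: unlink from the hash, then release the memory. */
      static void conn_destroy_extensions(struct conn *ct)
      {
          if (ct->nat && ct->nat->in_bysource_hash) {
              ct->nat->in_bysource_hash = 0;
              bysource_entries--;
          }
          free(ct->nat);
          ct->nat = NULL;
      }

      /* Every free goes through here, so no path can skip the destructors. */
      static void conn_free(struct conn *ct)
      {
          conn_destroy_extensions(ct);
          free(ct);
      }

      /* Models the ctnetlink failure case: NAT is set up, a later step fails. */
      static int conn_create_then_fail(void)
      {
          struct conn *ct = calloc(1, sizeof(*ct));

          if (!ct)
              return -1;
          (void)conn_setup_nat(ct);
          /* ... a later creation step fails, so the entry is thrown away ... */
          conn_free(ct);
          return -1;
      }
      ```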
      
      Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10875

      Reported-and-Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: Make nflog quiet when no one listen in userspace. · b66985b1
      Eric Leblond authored
      
      The message "nf_log_packet: can't log since no backend logging module loaded
      in! Please either load one, or disable logging explicitly" was displayed for
      each logged packet when no userspace application was listening to nflog events.
      The message suggests that a kernel module is missing, but as explained above,
      that is not the case. I thus propose to suppress the message (I don't see any
      reason to flood the log just because a user application has crashed).
      
      Signed-off-by: Eric Leblond <eric@inl.fr>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Fail with appropriate error code when setting not-applicable sockopt. · 1717699c
      YOSHIFUJI Hideaki authored
      
      IPV6_MULTICAST_HOPS, for example, is not valid for stream sockets.
      Since such options are effectively unavailable on stream sockets,
      we should return ENOPROTOOPT instead of EINVAL.
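      A minimal sketch of the check as a hypothetical helper
      (check_multicast_hops() is not a kernel function). The -1..255 range,
      with -1 selecting the default, is the IPV6_MULTICAST_HOPS range from
      RFC 3493; the stream-socket case returns ENOPROTOOPT as described above.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <sys/socket.h>

      /* Hypothetical helper mirroring the rule in the commit message. */
      static int check_multicast_hops(int sock_type, int val)
      {
          /* The option has no meaning on a stream socket: report that the
           * option is not available, rather than claiming the value is bad. */
          if (sock_type == SOCK_STREAM)
              return -ENOPROTOOPT;
          /* RFC 3493: -1 selects the default; otherwise 0..255. */
          if (val < -1 || val > 255)
              return -EINVAL;
          return 0;
      }
      ```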
      
      Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
    • ipv6: Check IPV6_MULTICAST_LOOP option value. · 28d44882
      YOSHIFUJI Hideaki authored
      
      Only 0 and 1 are valid for IPV6_MULTICAST_LOOP socket option,
      and we should return an error of EINVAL otherwise, per RFC3493.
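      From user space, RFC 3493 specifies the option value as an unsigned int.
      A small sketch that sets it on a throwaway IPv6 UDP socket;
      set_mcast_loop() is a hypothetical helper. On kernels with this change,
      a value of 2 should be rejected with EINVAL, while 0 and 1 are accepted.

      ```c
      #define _GNU_SOURCE
      #include <assert.h>
      #include <errno.h>
      #include <netinet/in.h>
      #include <sys/socket.h>
      #include <unistd.h>

      /* Returns 0 on success, the errno from setsockopt() on failure, or -1 if
       * no IPv6 UDP socket could be created (e.g. IPv6 is disabled). */
      static int set_mcast_loop(unsigned int val)
      {
          int err = 0;
          int fd = socket(AF_INET6, SOCK_DGRAM, 0);

          if (fd < 0)
              return -1;
          if (setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_LOOP,
                         &val, sizeof(val)) < 0)
              err = errno;
          close(fd);
          return err;
      }
      ```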
      
      Based on patch from Shan Wei <shanwei@cn.fujitsu.com>.
      
      Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
    • ipv6: Check the hop limit setting in ancillary data. · e8766fc8
      Shan Wei authored
      
      When the outgoing hop limit is specified as ancillary data for sendmsg(),
      the kernel does not check the integer hop limit value as required by
      [RFC-3542] section 6.3.
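      A user-space sketch of such a check, so it runs without sending
      anything: build_hoplimit_msg() fills one IPV6_HOPLIMIT control message
      the way a sendmsg() caller would, and check_hoplimit_cmsgs() applies the
      RFC 3542 rule (an int payload, -1 meaning the default, otherwise
      0..255). Both helper names are hypothetical.

      ```c
      #define _GNU_SOURCE
      #include <assert.h>
      #include <errno.h>
      #include <netinet/in.h>
      #include <string.h>
      #include <sys/socket.h>

      /* Point msg at buf and store one IPV6_HOPLIMIT item carrying hlim.
       * buf must be suitably aligned for struct cmsghdr. */
      static void build_hoplimit_msg(struct msghdr *msg, char *buf,
                                     size_t buflen, int hlim)
      {
          struct cmsghdr *cm;

          memset(msg, 0, sizeof(*msg));
          memset(buf, 0, buflen);
          msg->msg_control = buf;
          msg->msg_controllen = buflen;

          cm = CMSG_FIRSTHDR(msg);
          cm->cmsg_level = IPPROTO_IPV6;
          cm->cmsg_type = IPV6_HOPLIMIT;
          cm->cmsg_len = CMSG_LEN(sizeof(int));
          memcpy(CMSG_DATA(cm), &hlim, sizeof(int));
      }

      /* Walk the control messages; reject a malformed or out-of-range hop limit. */
      static int check_hoplimit_cmsgs(struct msghdr *msg)
      {
          struct cmsghdr *cm;

          for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
              int hlim;

              if (cm->cmsg_level != IPPROTO_IPV6 || cm->cmsg_type != IPV6_HOPLIMIT)
                  continue;
              if (cm->cmsg_len != CMSG_LEN(sizeof(int)))
                  return -EINVAL;
              memcpy(&hlim, CMSG_DATA(cm), sizeof(hlim));
              if (hlim < -1 || hlim > 255)
                  return -EINVAL;
          }
          return 0;
      }
      ```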
      
      Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
      Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
    • ipv6 route: Fix route lifetime in netlink message. · 36e3deae
      YOSHIFUJI Hideaki authored
      
      1) We may have a route lifetime larger than INT_MAX.
      In that case we reported a weird value as the lifetime.
      Use INT_MAX if the lifetime does not fit in s32.

      2) The lifetime is valid if and only if RTF_EXPIRES is set.
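      Both rules as a hypothetical helper. route_lifetime() and the
      EX_RTF_EXPIRES bit value are illustrative, not the kernel's definitions,
      and the time unit is left abstract.

      ```c
      #include <assert.h>
      #include <limits.h>
      #include <stdint.h>

      /* Illustrative flag bit for this sketch only. */
      #define EX_RTF_EXPIRES 0x00400000u

      /* Lifetime to report for a route: meaningful only when the expires flag
       * is set (0 otherwise), and clamped to INT_MAX when it does not fit in
       * a signed 32-bit field instead of wrapping to a bogus value. */
      static int32_t route_lifetime(uint32_t flags, unsigned long remaining)
      {
          if (!(flags & EX_RTF_EXPIRES))
              return 0;
          if (remaining > (unsigned long)INT_MAX)
              return INT_MAX;
          return (int32_t)remaining;
      }
      ```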
      
      Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
  5. Jun 11, 2008
    • dccp: Bug in initial acknowledgment number assignment · be4c798a
      Gerrit Renker authored
      
      Step 8.5 in RFC 4340 says for the newly cloned socket
      
                 Initialize S.GAR := S.ISS,
      
      but what in fact the code (minisocks.c) does is
      
                 Initialize S.GAR := S.ISR,
      
      which is wrong (typo?) -- fixed by the patch.
      
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp ccid-3: X truncated due to type conversion · 7deb0f85
      Gerrit Renker authored
      
      This fixes a bug in computing the inter-packet-interval t_ipi = s/X: 
      
       scaled_div32(a, b) uses u32 for b, but in "scaled_div32(s, X)" the type of the
       sending rate `X' is u64. Since X is scaled by 2^6, this truncates rates greater
       than 2^26 Bps (~537 Mbps).
      
      Using full 64-bit division now.
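      A user-space sketch of the arithmetic. The two helpers are hypothetical
      and only model the description above: t_ipi in microseconds is
      s * 2^6 * 10^6 / X, with X the rate in bytes/s scaled by 2^6. Passing X
      through a 32-bit divisor drops its high bits once X reaches 2^32, i.e.
      at rates of 2^26 bytes/s and above.

      ```c
      #include <assert.h>
      #include <stdint.h>

      #define USEC_PER_SEC 1000000ULL

      /* Buggy model: the divisor is narrowed to 32 bits, as with a u32
       * parameter. High bits of X are lost (and the division faults when X
       * is an exact multiple of 2^32). */
      static uint64_t t_ipi_u32_divisor(uint32_t s, uint64_t x_scaled)
      {
          uint32_t b = (uint32_t)x_scaled;

          return (((uint64_t)s << 6) * USEC_PER_SEC) / b;
      }

      /* Fixed model: full 64-bit division. */
      static uint64_t t_ipi_u64_divisor(uint32_t s, uint64_t x_scaled)
      {
          return (((uint64_t)s << 6) * USEC_PER_SEC) / x_scaled;
      }
      ```

      With s = 1460 and X = 2^32 + 1000, the narrowed divisor gives 93440000 us
      instead of 21 us. In kernel code a 64-by-64 division normally goes
      through a helper such as div64_u64() so that it also builds on 32-bit
      architectures.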
      
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp ccid-3: TFRC reverse-lookup Bug-Fix · 1e8a287c
      Gerrit Renker authored
      
      This fixes a bug in the reverse lookup of p: given a value f(p), instead of p,
      the function returned the smallest tabulated value f(p).
      
      The smallest tabulated value of 10^6 * f(p), where

         f(p) = sqrt(2*p/3) + 12 * sqrt(3*p/8) * (32 * p^3 + p),

      is the one for p=0.0001, namely 8172.
      
      Since this value is scaled by 10^6, the outcome of this bug is that a loss
      of 8172/10^6 = 0.8172% was reported whenever the input was below the table
      resolution of 0.01%.
      
      This means that the value was over 80 times too high, resulting in large spikes
      of the initial loss interval, thus unnecessarily reducing the throughput.
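      Evaluating the formula at p = 10^-4 reproduces both figures
      (intermediate values rounded):

      ```latex
      \begin{aligned}
      f(p) &= \sqrt{\tfrac{2p}{3}} + 12\sqrt{\tfrac{3p}{8}}\,\left(32p^{3} + p\right) \\
      f(10^{-4}) &\approx 8.1650\times10^{-3} + 12 \cdot 6.1237\times10^{-3} \cdot 1.0000\times10^{-4} \\
                 &\approx 8.1650\times10^{-3} + 7.348\times10^{-6} \approx 8.1723\times10^{-3} \\
      10^{6}\, f(10^{-4}) &\approx 8172 \\
      \frac{8172 / 10^{6}}{10^{-4}} &\approx 81.7
      \end{aligned}
      ```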
      
      Also corrected the printk format (%u for u32).
      
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp ccid-2: Bug-Fix - Ack Vectors need to be ignored on request sockets · 65907a43
      Gerrit Renker authored
      
      This fixes an oversight from an earlier patch, ensuring that Ack Vectors
      are not processed on request sockets.
      
      The issue is that Ack Vectors must not be parsed on request sockets, since
      the Ack Vector feature depends on the selection of the (TX) CCID. During the
      initial handshake the CCIDs are undefined, and so RFC 4340, 10.3 applies:
      
       "Using CCID-specific options and feature options during a negotiation
        for the corresponding CCID feature is NOT RECOMMENDED [...]"
      
      And it is not even possible: when the server receives the Request from the 
      client, the CCID and Ack vector features are undefined; when the Ack finalising
      the 3-way handshake arrives, the request socket has not been cloned yet into a
      full socket. (This order is necessary, since otherwise the newly created socket
      would have to be destroyed whenever an option error occurred - a malicious
      hacker could simply send garbage options and exploit this.)
      
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp: Fix sparse warnings · 1e2f0e5e
      Gerrit Renker authored
      
      This patch fixes the following sparse warnings:
       * nested min(max()) expression:
         net/dccp/ccids/ccid3.c:91:21: warning: symbol '__x' shadows an earlier one
         net/dccp/ccids/ccid3.c:91:21: warning: symbol '__y' shadows an earlier one
         
       * Declaration of function prototypes in .c instead of .h file, resulting in
         "should it be static?" warnings. 
      
       * Declared "struct dccpw" static (local to dccp_probe).
       
       * Disabled dccp_delayed_ack() - not fully removed due to RFC 4340, 11.3
         ("Receivers SHOULD implement delayed acknowledgement timers ...").
      
       * Used a different local variable name to avoid
         net/dccp/ackvec.c:293:13: warning: symbol 'state' shadows an earlier one
         net/dccp/ackvec.c:238:33: originally declared here
      
       * Removed unused functions `dccp_ackvector_print' and `dccp_ackvec_print'.
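      Why a nested min(max()) trips the shadowing check, using simplified,
      hypothetical copies of min_t()/max_t()-style macros (GNU C statement
      expressions, as in the kernel). This is one way to avoid the nesting,
      not necessarily the exact change made in this commit.

      ```c
      #include <assert.h>

      /* Nesting one macro inside the other's argument, e.g.
       * kmin_t(unsigned int, a, kmax_t(unsigned int, b, c)), declares a second
       * __x/__y while the outer ones are already in scope: sparse reports
       * that as shadowing. */
      #define kmin_t(type, x, y) ({ type __x = (x); type __y = (y); \
                                    __x < __y ? __x : __y; })
      #define kmax_t(type, x, y) ({ type __x = (x); type __y = (y); \
                                    __x > __y ? __x : __y; })

      /* Evaluate the inner bound into a named local first, so the two macro
       * expansions are never nested. */
      static inline unsigned int clamp_uint(unsigned int v, unsigned int lo,
                                            unsigned int hi)
      {
          unsigned int at_least_lo = kmax_t(unsigned int, v, lo);

          return kmin_t(unsigned int, at_least_lo, hi);
      }
      ```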
      
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp ccid-3: Bug-Fix - Zero RTT is possible · 3294f202
      Gerrit Renker authored
      
      In commit 825de27d (from 27th May, commit
      message `dccp ccid-3: Fix "t_ipi explosion" bug'), the CCID-3 window counter
      computation was fixed to cope with RTTs < 4 microseconds.
      
      Such RTTs can be found e.g. when running CCID-3 over loopback. The fix removed
      a check against RTT < 4, but introduced a divide-by-zero bug.
      
      All steady-state RTTs in DCCP are filtered using dccp_sample_rtt(), which
      ensures non-zero samples. However, a zero RTT is possible on initialisation,
      when there is no RTT sample from the Request/Response exchange.
      
      The fix is to use the fallback-RTT from RFC 4340, 3.4.
      
      This is also better than just fixing update_win_count() since it allows other
      parts of the code to always assume that the RTT is non-zero during the time
      that the CCID is used.
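      A sketch of the approach, assuming microsecond units: substitute the
      0.2 s fallback RTT from RFC 4340, 3.4 whenever no sample exists, so
      that later divisions by the RTT are safe. initial_rtt_us() and
      quarter_rtts() are hypothetical; the second only stands in for a
      window-counter computation that divides by the RTT.

      ```c
      #include <assert.h>
      #include <stdint.h>

      #define USEC_PER_SEC 1000000u
      /* RFC 4340, 3.4: assume an RTT of 0.2 s when no estimate is available. */
      #define FALLBACK_RTT_US (USEC_PER_SEC / 5)

      /* Never hand a zero RTT to the rest of the CCID. */
      static uint32_t initial_rtt_us(uint32_t sampled_rtt_us)
      {
          return sampled_rtt_us != 0 ? sampled_rtt_us : FALLBACK_RTT_US;
      }

      /* Hypothetical window-counter step: how many quarter-RTTs fit into
       * delta_us. Multiplying by 4 before dividing keeps RTTs below 4 us
       * usable, but the division still requires rtt_us != 0. */
      static uint32_t quarter_rtts(uint32_t delta_us, uint32_t rtt_us)
      {
          return (uint32_t)(((uint64_t)delta_us * 4) / rtt_us);
      }
      ```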
      
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
  6. Jun 10, 2008
  7. Jun 09, 2008
  8. Jun 05, 2008
  9. Jun 04, 2008