Skip to content
  • Tariq Toukan's avatar
    net/mlx5e: Support RX multi-packet WQE (Striding RQ) · 461017cb
    Tariq Toukan authored
    
    
    Introduce the feature of multi-packet WQE (RX Work Queue Element)
    referred to as (MPWQE or Striding RQ), in which WQEs are larger
    and serve multiple packets each.
    
    Every WQE consists of many strides of the same size, every received
    packet is aligned to a beginning of a stride and is written to
    consecutive strides within a WQE.
    
    In the regular approach, each regular WQE is big enough to be capable
    of serving one received packet of any size up to MTU or 64K in case of
    device LRO is enabled, making it very wasteful when dealing with
    small packets or device LRO is enabled.
    
    For its flexibility, MPWQE allows a better memory utilization
    (implying improvements in CPU utilization and packet rate) as packets
    consume strides according to their size, preserving the rest of
    the WQE to be available for other packets.
    
    MPWQE default configuration:
    	Num of WQEs	= 16
    	Strides Per WQE = 2048
    	Stride Size	= 64 byte
    
    The default WQEs memory footprint went from 1024*mtu (~1.5MB) to
    16 * 2048 * 64 = 2MB per ring.
    However, HW LRO can now be supported at no additional cost in memory
    footprint, and hence we turn it on by default and get an even better
    performance.
    
    Performance tested on ConnectX4-Lx 50G.
    To isolate the feature under test, the numbers below were measured with
    HW LRO turned off. We verified that the performance just improves when
    LRO is turned back on.
    
    * Netperf single TCP stream:
    - BW raised by 10-15% for representative packet sizes:
      default, 64B, 1024B, 1478B, 65536B.
    
    * Netperf multi TCP stream:
    - No degradation, line rate reached.
    
    * Pktgen: packet rate raised by 2-10% for traffic of different message
    sizes: 64B, 128B, 256B, 1024B, and 1500B.
    
    * Pktgen: packet loss in bursts of small messages (64byte),
    single stream:
    - | num packets | packets loss before | packets loss after
      |     2K      |       ~ 1K          |       0
      |     8K      |       ~ 6K          |       0
      |     16K     |       ~13K          |       0
      |     32K     |       ~28K          |       0
      |     64K     |       ~57K          |     ~24K
    
    As expected as the driver can receive as many small packets (<=64B) as
    the number of total strides in the ring (default = 2048 * 16) vs. 1024
    (default ring size regardless of packets size) before this feature.
    
    Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
    Signed-off-by: default avatarAchiad Shochat <achiad@mellanox.com>
    Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    461017cb