Segmentation Offloads in the Linux Networking Stack

Introduction
============

This document describes a set of techniques in the Linux networking stack
to take advantage of segmentation offload capabilities of various NICs.

The following technologies are described:
* TCP Segmentation Offload - TSO
* UDP Fragmentation Offload - UFO
* IPIP, SIT, GRE, and UDP Tunnel Offloads
* Generic Segmentation Offload - GSO
* Generic Receive Offload - GRO
* Partial Generic Segmentation Offload - GSO_PARTIAL
* SCTP acceleration with GSO - GSO_BY_FRAGS

TCP Segmentation Offload
========================

TCP segmentation allows a device to segment a single frame into multiple
frames with a data payload size specified in skb_shinfo()->gso_size.
When TCP segmentation is requested, the bit for either SKB_GSO_TCPV4 or
SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and
skb_shinfo()->gso_size should be set to a non-zero value.

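The request described above can be modelled in plain userspace C. This is
a minimal sketch, not kernel code: struct toy_shinfo, the TOY_GSO_* bit
values, and both helper functions are hypothetical stand-ins for the
skb_shinfo() fields and the SKB_GSO_* flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SKB_GSO_TCPV4 and SKB_GSO_TCPV6 bits;
 * the real values are defined by the kernel headers and may differ. */
#define TOY_GSO_TCPV4	(1u << 0)
#define TOY_GSO_TCPV6	(1u << 1)

/* Toy model of the two skb_shinfo() fields discussed above. */
struct toy_shinfo {
	uint16_t gso_size;	/* payload bytes per resulting segment */
	uint32_t gso_type;	/* bitmask of requested GSO types */
};

/* True when the frame asks for TCP segmentation: one of the TCP type
 * bits is set and gso_size is non-zero. */
static bool toy_tso_requested(const struct toy_shinfo *si)
{
	bool tcp = (si->gso_type & (TOY_GSO_TCPV4 | TOY_GSO_TCPV6)) != 0;

	return tcp && si->gso_size != 0;
}

/* Number of frames produced for payload_len bytes of TCP payload. */
static unsigned int toy_tso_segs(const struct toy_shinfo *si,
				 unsigned int payload_len)
{
	if (!toy_tso_requested(si))
		return 1;	/* sent as a single frame */
	return (payload_len + si->gso_size - 1) / si->gso_size;
}
```

With gso_size set to 1448 and a 4000 byte payload, this model yields three
frames: two carrying 1448 bytes and a final one carrying 1104 bytes.
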
TCP segmentation is dependent on support for the use of partial checksum
offload. For this reason, TSO is normally disabled if the Tx checksum
offload for a given device is disabled.

In order to support TCP segmentation offload, it is necessary to populate
the network and transport header offsets of the skbuff so that the device
drivers will be able to determine the offsets of the IP or IPv6 header and
the TCP header. In addition, as CHECKSUM_PARTIAL is required, csum_start
should also point to the TCP header of the packet.

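As an illustration of the offsets involved, the sketch below computes them
for an untagged Ethernet frame carrying IPv4 without options. It is a
userspace model under stated assumptions: struct toy_offsets and
toy_tcp4_offsets() are hypothetical, offsets are counted from the start of
the frame rather than from the skbuff head, and csum_offset (not discussed
above) is included only because a partial checksum also needs to know
where the result is stored.

```c
#include <stdint.h>

#define TOY_ETH_HLEN		14	/* Ethernet header, no VLAN tag */
#define TOY_IPV4_HLEN		20	/* IPv4 header without options */
#define TOY_TCP_CSUM_OFF	16	/* checksum field within a TCP header */

/* Toy model of the offsets a driver reads from the skbuff. */
struct toy_offsets {
	uint16_t network_header;	/* start of the IPv4 header */
	uint16_t transport_header;	/* start of the TCP header */
	uint16_t csum_start;		/* where checksumming begins */
	uint16_t csum_offset;		/* result slot, relative to csum_start */
};

/* Offsets for Ethernet + IPv4 + TCP, counted from the start of the frame. */
static struct toy_offsets toy_tcp4_offsets(void)
{
	struct toy_offsets o;

	o.network_header = TOY_ETH_HLEN;
	o.transport_header = o.network_header + TOY_IPV4_HLEN;
	o.csum_start = o.transport_header;	/* points at the TCP header */
	o.csum_offset = TOY_TCP_CSUM_OFF;
	return o;
}
```
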
For IPv4 segmentation, one of two behaviors is supported for the IP ID.
The default behavior is to increment the IP ID with every segment. If the
GSO type SKB_GSO_TCP_FIXEDID is specified, then the IP ID is not
incremented and all segments use the same IP ID. If a device has
NETIF_F_TSO_MANGLEID set, then the IP ID can be ignored when performing TSO
and the IP ID will either be incremented for all frames or left at a
static value, based on driver preference.

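The two IP ID behaviors can be sketched as follows. The function name and
the fixed_id parameter are hypothetical; fixed_id simply models whether
SKB_GSO_TCP_FIXEDID was requested.

```c
#include <stdbool.h>
#include <stdint.h>

/* IP ID for segment number seg_index (counting from 0) of a frame whose
 * first segment uses first_id. */
static uint16_t toy_segment_ip_id(uint16_t first_id, unsigned int seg_index,
				  bool fixed_id)
{
	if (fixed_id)
		return first_id;	/* every segment shares one ID */

	/* Default: increment per segment; the 16-bit field wraps. */
	return (uint16_t)(first_id + seg_index);
}
```

Because the IP ID is a 16-bit field, an incrementing sequence that starts
at 65535 continues with 0.
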
UDP Fragmentation Offload
=========================

UDP fragmentation offload allows a device to fragment an oversized UDP
datagram into multiple IPv4 fragments. Many of the requirements for UDP
fragmentation offload are the same as TSO. However, the IPv4 ID must not
increment across fragments, because they all belong to a single IPv4
datagram.

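A rough userspace sketch of the fragmentation arithmetic is shown below.
It assumes an IPv4 header without options and an MTU larger than that
header; struct toy_frag and the toy_* functions are hypothetical and only
model the per-fragment fields, not the kernel or device implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_IPV4_HLEN	20	/* IPv4 header without options */

struct toy_frag {
	uint16_t id;		/* IPv4 ID, identical for every fragment */
	uint16_t offset_units;	/* fragment offset in 8-byte units */
	uint16_t len;		/* IPv4 payload bytes carried */
	bool more_fragments;	/* MF flag */
};

/* IPv4 payload bytes per non-final fragment; must be a multiple of 8. */
static unsigned int toy_frag_payload(unsigned int mtu)
{
	return (mtu - TOY_IPV4_HLEN) & ~7u;
}

/* Fragments needed for dgram_len bytes of UDP header plus data. */
static unsigned int toy_frag_count(unsigned int dgram_len, unsigned int mtu)
{
	unsigned int per = toy_frag_payload(mtu);

	return (dgram_len + per - 1) / per;
}

/* Describe fragment number index; index must be below toy_frag_count(). */
static struct toy_frag toy_udp_frag(uint16_t id, unsigned int dgram_len,
				    unsigned int mtu, unsigned int index)
{
	unsigned int per = toy_frag_payload(mtu);
	unsigned int start = index * per;
	unsigned int left = dgram_len - start;
	struct toy_frag f;

	f.id = id;				/* never incremented */
	f.offset_units = (uint16_t)(start / 8);
	f.len = (uint16_t)(left < per ? left : per);
	f.more_fragments = left > per;
	return f;
}
```

For a 3008 byte datagram (UDP header plus data) and an MTU of 1500, this
gives three fragments carrying 1480, 1480, and 48 bytes, all with the same
IPv4 ID.
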
UFO is deprecated: modern kernels will no longer generate UFO skbs, but can
still receive them from tuntap and similar devices. Offload of UDP-based
tunnel protocols is still supported.

IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
========================================================

In addition to the offloads described above, it is possible for a frame to
contain additional headers such as an outer tunnel. In order to account
for such instances, an additional set of segmentation offload types was
introduced, including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and
SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify
cases where there is more than one set of headers. For example, in the
case of IPIP and SIT, the network and transport headers should be moved
from the standard list of headers to the "inner" header offsets.

Currently only two levels of headers are supported. The convention is to
refer to the tunnel headers as the outer headers, while the encapsulated
data is normally referred to as the inner headers. Below is the list of
calls to access the given headers:

IPIP/SIT Tunnel:
             Outer                  Inner
MAC          skb_mac_header
Network      skb_network_header     skb_inner_network_header
Transport    skb_transport_header

UDP/GRE Tunnel:
             Outer                  Inner
MAC          skb_mac_header         skb_inner_mac_header
Network      skb_network_header     skb_inner_network_header
Transport    skb_transport_header   skb_inner_transport_header

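To make the outer and inner offsets concrete, the sketch below lays out a
UDP tunnel frame. VXLAN over IPv4 is used purely as an assumed example
(it is not named in this document), with untagged Ethernet and IPv4
headers without options. struct toy_tunnel_offsets and
toy_vxlan_offsets() are hypothetical; each field corresponds to the
accessor of the same name in the UDP/GRE table above, with offsets counted
from the start of the frame.

```c
#include <stdint.h>

#define TOY_ETH_HLEN	14	/* Ethernet header, no VLAN tag */
#define TOY_IPV4_HLEN	20	/* IPv4 header without options */
#define TOY_UDP_HLEN	8	/* UDP header */
#define TOY_VXLAN_HLEN	8	/* VXLAN header (assumed tunnel type) */

struct toy_tunnel_offsets {
	uint16_t mac;			/* skb_mac_header */
	uint16_t network;		/* skb_network_header */
	uint16_t transport;		/* skb_transport_header */
	uint16_t inner_mac;		/* skb_inner_mac_header */
	uint16_t inner_network;		/* skb_inner_network_header */
	uint16_t inner_transport;	/* skb_inner_transport_header */
};

/* Offsets for Eth/IPv4/UDP/VXLAN carrying an inner Eth/IPv4/TCP frame. */
static struct toy_tunnel_offsets toy_vxlan_offsets(void)
{
	struct toy_tunnel_offsets o;

	o.mac = 0;
	o.network = o.mac + TOY_ETH_HLEN;
	o.transport = o.network + TOY_IPV4_HLEN;
	o.inner_mac = o.transport + TOY_UDP_HLEN + TOY_VXLAN_HLEN;
	o.inner_network = o.inner_mac + TOY_ETH_HLEN;
	o.inner_transport = o.inner_network + TOY_IPV4_HLEN;
	return o;
}
```
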
In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types indicate that
the outer header also requests a non-zero checksum.

Finally, there is SKB_GSO_TUNNEL_REMCSUM, which indicates that a given
tunnel header has requested a remote checksum offload. In this case the
inner headers will be left with a partial checksum and only the outer
header checksum will be computed.

Generic Segmentation Offload
============================

Generic segmentation offload is a pure software offload that is meant to
deal with cases where device drivers cannot perform the offloads described
above. In GSO, the data of a given skbuff is split across multiple skbuffs
that have been resized to match the MSS provided via
skb_shinfo()->gso_size.

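The splitting step can be sketched in userspace C as below. This only
models the payload sizes of the resulting segments; toy_gso_segment() is
a hypothetical helper and does not reflect how the kernel builds the
resulting skbuffs or their headers.

```c
#include <stddef.h>

/* Split payload_len bytes into segments of at most mss bytes each.
 * Writes each segment's payload size to seg_len[] and returns the number
 * of segments, or 0 if mss is zero or seg_len[] is too small. */
static size_t toy_gso_segment(size_t payload_len, size_t mss,
			      size_t *seg_len, size_t max_segs)
{
	size_t n = 0;

	if (mss == 0)
		return 0;

	while (payload_len > 0) {
		size_t len = payload_len < mss ? payload_len : mss;

		if (n == max_segs)
			return 0;	/* caller's array is too small */
		seg_len[n++] = len;
		payload_len -= len;
	}
	return n;
}
```

Every segment except possibly the last carries exactly mss bytes, which
matches the role gso_size plays in the description above.
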
Before enabling any hardware segmentation offload, a corresponding software
offload is required in GSO. Otherwise it becomes possible for a frame to
be re-routed between devices and end up being unable to be transmitted.

Generic Receive Offload
=======================

Generic receive offload is the complement to GSO. Ideally any frame
assembled by GRO should be segmented to create an identical sequence of
frames using GSO, and any sequence of frames segmented by GSO should be
able to be reassembled back to the original by GRO. The only exception to
this is the IPv4 ID in the case that the DF bit is set for a given IP
header. If the value of the IPv4 ID is not sequentially incrementing, it
will be altered so that it is when a frame assembled via GRO is segmented
via GSO.

Partial Generic Segmentation Offload
====================================

Partial generic segmentation offload is a hybrid between TSO and GSO. It
takes advantage of certain traits of TCP and tunnels so that, instead of
having to rewrite the packet headers for each segment, only the inner-most
transport header and possibly the outer-most network header need to be
updated. This allows devices that do not support tunnel offloads or tunnel
offloads with checksum to still make use of segmentation.

With the partial offload, all headers excluding the inner transport header
are updated such that they contain the values that are correct if the
header is simply duplicated. The one exception to this is the outer IPv4
ID field. It is up to the device drivers to guarantee that the IPv4 ID
field is incremented in the case that a given header does not have the DF
bit set.

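One way to picture "correct if the header is simply duplicated" is that
the length fields already describe a single segment rather than the whole
frame. The sketch below computes such per-segment lengths. It is an
illustration under stated assumptions, not the kernel implementation:
the tunnel is assumed to be VXLAN over IPv4 carrying Ethernet, IPv4, and
TCP, with no IP or TCP options, and struct toy_partial_lens and
toy_partial_lens() are hypothetical.

```c
#include <stdint.h>

#define TOY_ETH_HLEN	14	/* inner Ethernet header */
#define TOY_IPV4_HLEN	20	/* IPv4 header without options */
#define TOY_UDP_HLEN	8	/* outer UDP header */
#define TOY_VXLAN_HLEN	8	/* VXLAN header (assumed tunnel type) */
#define TOY_TCP_HLEN	20	/* TCP header without options */

struct toy_partial_lens {
	uint16_t inner_ip_tot_len;	/* inner IPv4 total length */
	uint16_t outer_udp_len;		/* outer UDP length */
	uint16_t outer_ip_tot_len;	/* outer IPv4 total length */
};

/* Length fields sized for one segment carrying mss payload bytes, so
 * that the headers can be copied unchanged onto each full segment. */
static struct toy_partial_lens toy_partial_lens(uint16_t mss)
{
	struct toy_partial_lens l;

	l.inner_ip_tot_len = TOY_IPV4_HLEN + TOY_TCP_HLEN + mss;
	l.outer_udp_len = TOY_UDP_HLEN + TOY_VXLAN_HLEN + TOY_ETH_HLEN +
			  l.inner_ip_tot_len;
	l.outer_ip_tot_len = TOY_IPV4_HLEN + l.outer_udp_len;
	return l;
}
```

With an MSS of 1400 the model gives an inner IPv4 length of 1440, an outer
UDP length of 1470, and an outer IPv4 length of 1490.
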
SCTP acceleration with GSO
==========================

SCTP - despite the lack of hardware support - can still take advantage of
GSO to pass one large packet through the network stack, rather than
multiple small packets.

This requires a different approach from other offloads, as SCTP packets
cannot be just segmented to (P)MTU. Rather, the chunks must be contained in
IP segments, padding respected. So unlike regular GSO, SCTP can't just
generate a big skb, set gso_size to the fragmentation point and deliver it
to the IP layer.

Instead, the SCTP protocol layer builds an skb with the segments correctly
padded and stored as chained skbs, and skb_segment() splits based on those.
To signal this, gso_size is set to the special value GSO_BY_FRAGS.

Therefore, any code in the core networking stack must be aware of the
possibility that gso_size will be GSO_BY_FRAGS and handle that case
appropriately.

There are some helpers to make this easier:

 - skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
   an skb is an SCTP GSO skb.

 - For size checks, the skb_gso_validate_*_len family of helpers correctly
   considers GSO_BY_FRAGS.

 - For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
   will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.

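The kind of special-casing these helpers perform can be sketched in
userspace C. This is a simplified model and not the kernel helpers
themselves: toy_gso_len_ok() is hypothetical, the sentinel value of
TOY_GSO_BY_FRAGS is an assumption about the kernel's GSO_BY_FRAGS, and the
chained skbs are modelled as a plain array of segment lengths.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for GSO_BY_FRAGS; the value used here is an assumption. */
#define TOY_GSO_BY_FRAGS	0xFFFF

/* Check that every resulting segment, headers included, fits max_len.
 * For ordinary GSO all segments share gso_size. When gso_size is the
 * BY_FRAGS sentinel it is not a size at all, so each pre-built segment
 * length in frag_len[] has to be checked individually. */
static bool toy_gso_len_ok(unsigned int gso_size, unsigned int hdr_len,
			   const unsigned int *frag_len, size_t nfrags,
			   unsigned int max_len)
{
	size_t i;

	if (gso_size != TOY_GSO_BY_FRAGS)
		return hdr_len + gso_size <= max_len;

	for (i = 0; i < nfrags; i++) {
		if (hdr_len + frag_len[i] > max_len)
			return false;
	}
	return true;
}
```

Code that treats gso_size as a byte count without this check would see
the sentinel as an enormous segment size, which is the mistake the
helpers listed above are meant to prevent.
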
This also affects drivers with the NETIF_F_FRAGLIST and NETIF_F_GSO_SCTP
bits set. Note also that NETIF_F_GSO_SCTP is included in
NETIF_F_GSO_SOFTWARE.