.. SPDX-License-Identifier: GPL-2.0

=====================
Segmentation Offloads
=====================


Introduction
============

This document describes a set of techniques in the Linux networking stack
to take advantage of segmentation offload capabilities of various NICs.

The following technologies are described:

* TCP Segmentation Offload - TSO
* UDP Fragmentation Offload - UFO
* IPIP, SIT, GRE, and UDP Tunnel Offloads
* Generic Segmentation Offload - GSO
* Generic Receive Offload - GRO
* Partial Generic Segmentation Offload - GSO_PARTIAL
* SCTP acceleration with GSO - GSO_BY_FRAGS


TCP Segmentation Offload
========================

TCP segmentation allows a device to segment a single frame into multiple
frames with a data payload size specified in skb_shinfo()->gso_size.
When TCP segmentation is requested, the bit for either SKB_GSO_TCPV4 or
SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and
skb_shinfo()->gso_size should be set to a non-zero value.

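The relationship between the payload length and gso_size can be
illustrated with a small user-space sketch. The struct tso_request and
both helper functions are hypothetical names introduced only for this
example; they are not kernel APIs, and the sketch only shows the
arithmetic, not how any particular device performs the split.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the values discussed above. */
struct tso_request {
	size_t payload_len;      /* TCP payload bytes in the large frame */
	unsigned short gso_size; /* payload bytes per resulting segment */
};

/* Number of segments expected on the wire (ceiling division). */
static size_t tso_segment_count(const struct tso_request *req)
{
	assert(req->gso_size != 0); /* gso_size must be non-zero */
	return (req->payload_len + req->gso_size - 1) / req->gso_size;
}

/* Payload length of segment idx (0-based); only the last may be short. */
static size_t tso_segment_len(const struct tso_request *req, size_t idx)
{
	size_t offset = idx * (size_t)req->gso_size;
	size_t remaining;

	assert(idx < tso_segment_count(req));
	remaining = req->payload_len - offset;
	return remaining < req->gso_size ? remaining : req->gso_size;
}
```

For example, a 4000 byte payload with gso_size set to 1448 yields three
segments carrying 1448, 1448, and 1104 bytes of payload.
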
TCP segmentation is dependent on support for the use of partial checksum
offload. For this reason TSO is normally disabled if the Tx checksum
offload for a given device is disabled.

In order to support TCP segmentation offload it is necessary to populate
the network and transport header offsets of the skbuff so that the device
drivers will be able to determine the offsets of the IP or IPv6 header and
the TCP header. In addition, as CHECKSUM_PARTIAL is required, csum_start
should also point to the TCP header of the packet.

For IPv4 segmentation we support one of two types in terms of the IP ID.
The default behavior is to increment the IP ID with every segment. If the
GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
ID and all segments will use the same IP ID. If a device has
NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
and we will either increment the IP ID for all frames, or leave it at a
static value based on driver preference.

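The IP ID rules above can be modelled in a few lines of user-space C. This
is a simplified sketch: enum ip_id_mode, choose_ip_id_mode() and
segment_ip_id() are hypothetical names for illustration, and the boolean
parameters merely stand in for SKB_GSO_TCP_FIXEDID and
NETIF_F_TSO_MANGLEID rather than testing the real flag bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum ip_id_mode { IP_ID_INCREMENT, IP_ID_FIXED };

/*
 * Pick the IP ID behavior for a frame. With the mangle-ID feature the
 * stack does not care, so the (hypothetical) driver preference wins.
 */
static enum ip_id_mode choose_ip_id_mode(bool gso_fixedid, bool dev_mangleid,
					 enum ip_id_mode driver_pref)
{
	if (dev_mangleid)
		return driver_pref;
	return gso_fixedid ? IP_ID_FIXED : IP_ID_INCREMENT;
}

/* IPv4 ID of segment idx (0-based), given the ID of the first segment. */
static uint16_t segment_ip_id(uint16_t first_id, uint32_t idx,
			      enum ip_id_mode mode)
{
	assert(mode == IP_ID_INCREMENT || mode == IP_ID_FIXED);
	if (mode == IP_ID_FIXED)
		return first_id;
	return (uint16_t)(first_id + idx); /* wraps modulo 65536 */
}
```

Note that the incrementing case wraps around: with a first ID of 0xFFFF the
second segment carries ID 0.
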

UDP Fragmentation Offload
=========================

UDP fragmentation offload allows a device to fragment an oversized UDP
datagram into multiple IPv4 fragments. Many of the requirements for UDP
fragmentation offload are the same as TSO. However, the IPv4 ID for
fragments should not increment, as a single IPv4 datagram is fragmented.

UFO is deprecated: modern kernels will no longer generate UFO skbs, but can
still receive them from tuntap and similar devices. Offload of UDP-based
tunnel protocols is still supported.


IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
========================================================

In addition to the offloads described above it is possible for a frame to
contain additional headers such as an outer tunnel. In order to account
for such instances an additional set of segmentation offload types was
introduced including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and
SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify
cases where there is more than one set of headers. For example, in the
case of IPIP and SIT we should have the network and transport headers moved
from the standard list of headers to "inner" header offsets.

Currently only two levels of headers are supported. The convention is to
refer to the tunnel headers as the outer headers, while the encapsulated
data is normally referred to as the inner headers. Below is the list of
calls to access the given headers:

IPIP/SIT Tunnel::

             Outer                  Inner
  MAC        skb_mac_header
  Network    skb_network_header     skb_inner_network_header
  Transport  skb_transport_header

UDP/GRE Tunnel::

             Outer                  Inner
  MAC        skb_mac_header         skb_inner_mac_header
  Network    skb_network_header     skb_inner_network_header
  Transport  skb_transport_header   skb_inner_transport_header

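To make the outer/inner split for the UDP tunnel case concrete, the sketch
below computes byte offsets for a frame laid out as Ethernet / IPv4 / UDP /
tunnel header / Ethernet / IPv4 / TCP. It is a user-space illustration
only: struct hdr_offsets and udp_tunnel_offsets() are hypothetical names,
and the header sizes assume no VLAN tags and no IPv4 options.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed fixed header sizes: untagged Ethernet, IPv4 without options. */
enum { ETH_LEN = 14, IPV4_LEN = 20, UDP_LEN = 8 };

/* Hypothetical mirror of the six offsets listed in the table above. */
struct hdr_offsets {
	size_t mac, network, transport;                   /* outer */
	size_t inner_mac, inner_network, inner_transport; /* inner */
};

/* tunnel_hdr_len is the size of the header that follows the outer UDP. */
static struct hdr_offsets udp_tunnel_offsets(size_t tunnel_hdr_len)
{
	struct hdr_offsets o;

	o.mac = 0;
	o.network = o.mac + ETH_LEN;
	o.transport = o.network + IPV4_LEN;
	o.inner_mac = o.transport + UDP_LEN + tunnel_hdr_len;
	o.inner_network = o.inner_mac + ETH_LEN;
	o.inner_transport = o.inner_network + IPV4_LEN;

	/* Inner headers always follow the outer ones in the frame. */
	assert(o.inner_mac > o.transport);
	return o;
}
```

With an 8 byte tunnel header, the outer headers start at offsets 0, 14, and
34, and the inner headers at offsets 50, 64, and 84.
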
In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types reflect the
fact that the outer header also requests that a non-zero checksum be
included in it.

Finally there is SKB_GSO_TUNNEL_REMCSUM which indicates that a given tunnel
header has requested a remote checksum offload. In this case the inner
headers will be left with a partial checksum and only the outer header
checksum will be computed.


Generic Segmentation Offload
============================

Generic segmentation offload is a pure software offload that is meant to
deal with cases where device drivers cannot perform the offloads described
above. In GSO, a given skbuff has its data broken out over multiple
skbuffs that have been resized to match the MSS provided via
skb_shinfo()->gso_size.

Before enabling any hardware segmentation offload a corresponding software
offload is required in GSO. Otherwise a frame that is re-routed between
devices could end up being impossible to transmit.

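The core idea of software segmentation can be sketched in user space as
follows. This is not the kernel implementation: struct segment and
soft_segment() are hypothetical, the headers are copied verbatim, and a
real GSO path would additionally fix up lengths, sequence numbers, IP IDs,
and checksums in every copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEG_BYTES 2048 /* arbitrary buffer size for this sketch */

struct segment {
	unsigned char data[MAX_SEG_BYTES];
	size_t len;
};

/*
 * Split frame (hdr_len header bytes followed by payload) into out[].
 * Each segment gets a copy of the headers plus at most mss payload bytes.
 * Returns the number of segments written, or 0 if they do not fit.
 */
static size_t soft_segment(const unsigned char *frame, size_t frame_len,
			   size_t hdr_len, size_t mss,
			   struct segment *out, size_t out_cap)
{
	size_t payload_len, count, i;

	assert(mss != 0 && hdr_len <= frame_len);
	if (hdr_len + mss > MAX_SEG_BYTES)
		return 0;

	payload_len = frame_len - hdr_len;
	count = (payload_len + mss - 1) / mss;
	if (count > out_cap)
		return 0;

	for (i = 0; i < count; i++) {
		size_t off = i * mss;
		size_t left = payload_len - off;
		size_t chunk = left < mss ? left : mss;

		memcpy(out[i].data, frame, hdr_len);
		memcpy(out[i].data + hdr_len, frame + hdr_len + off, chunk);
		out[i].len = hdr_len + chunk;
	}
	return count;
}
```

A frame with a 2 byte header and 8 bytes of payload, segmented with an MSS
of 3, produces three segments of 5, 5, and 4 bytes.
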

Generic Receive Offload
=======================

Generic receive offload is the complement to GSO. Ideally any frame
assembled by GRO should be segmented to create an identical sequence of
frames using GSO, and any sequence of frames segmented by GSO should be
able to be reassembled back to the original by GRO. The only exception to
this is the IPv4 ID in the case that the DF bit is set for a given IP
header. If the value of the IPv4 ID is not sequentially incrementing, it
will be altered so that it is when a frame assembled via GRO is segmented
via GSO.


Partial Generic Segmentation Offload
====================================

Partial generic segmentation offload is a hybrid between TSO and GSO. It
takes advantage of certain traits of TCP and tunnels so that, instead of
having to rewrite the packet headers for each segment, only the inner-most
transport header and possibly the outer-most network header need to be
updated. This allows devices that do not support tunnel offloads or tunnel
offloads with checksum to still make use of segmentation.

With the partial offload, all headers excluding the inner transport header
are updated so that they contain the correct values for the case where the
header is simply duplicated. The one exception to this is the outer IPv4
ID field. It is up to the device drivers to guarantee that the IPv4 ID
field is incremented in the case that a given header does not have the DF
bit set.


SCTP acceleration with GSO
==========================

SCTP - despite the lack of hardware support - can still take advantage of
GSO to pass one large packet through the network stack, rather than
multiple small packets.

This requires a different approach from other offloads, as SCTP packets
cannot simply be segmented to the (P)MTU. Rather, the chunks must be
contained in IP segments, padding respected. So unlike regular GSO, SCTP
can't just generate a big skb, set gso_size to the fragmentation point and
deliver it to the IP layer.

Instead, the SCTP protocol layer builds an skb with the segments correctly
padded and stored as chained skbs, and skb_segment() splits based on those.
To signal this, gso_size is set to the special value GSO_BY_FRAGS.

Therefore, any code in the core networking stack must be aware of the
possibility that gso_size will be GSO_BY_FRAGS and handle that case
appropriately.

There are some helpers to make this easier:

- skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
  an skb is an SCTP GSO skb.

- For size checks, the skb_gso_validate_*_len family of helpers correctly
  considers GSO_BY_FRAGS.

- For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
  will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.

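Why size checks need to special-case GSO_BY_FRAGS can be shown with a
user-space model. Everything here is hypothetical and only loosely follows
the idea behind the validate helpers: struct frag, struct gso_pkt, and
gso_segments_fit() are invented for the example, and the sentinel value
0xFFFF is an assumption that should be checked against the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed sentinel; verify against the GSO_BY_FRAGS definition in use. */
#define GSO_BY_FRAGS_ASSUMED 0xFFFFu

/* One pre-built segment on the chain (stands in for a chained skb). */
struct frag {
	size_t len;
	const struct frag *next;
};

struct gso_pkt {
	size_t hdr_len;               /* headers replicated per segment */
	unsigned int gso_size;        /* payload size or the sentinel */
	const struct frag *frag_list; /* only used with the sentinel */
};

/* True if every resulting segment is at most max_len bytes long. */
static bool gso_segments_fit(const struct gso_pkt *pkt, size_t max_len)
{
	const struct frag *f;

	assert(pkt->gso_size != 0);

	/* Regular GSO: all segments share one upper bound. */
	if (pkt->gso_size != GSO_BY_FRAGS_ASSUMED)
		return pkt->hdr_len + pkt->gso_size <= max_len;

	/* By-frags GSO: gso_size is not a length, so walk the chain. */
	for (f = pkt->frag_list; f; f = f->next)
		if (pkt->hdr_len + f->len > max_len)
			return false;
	return true;
}
```

Treating the sentinel as an ordinary size would make every SCTP GSO packet
look far larger than any realistic limit, which is why the by-frags case
has to inspect the individual segment lengths instead.
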
This also affects drivers with the NETIF_F_FRAGLIST and NETIF_F_GSO_SCTP
bits set. Note also that NETIF_F_GSO_SCTP is included in
NETIF_F_GSO_SOFTWARE.