Alexander Duyck | f7a6272 | 2016-04-10 21:45:09 -0400

Segmentation Offloads in the Linux Networking Stack

Introduction
============

This document describes a set of techniques in the Linux networking stack
to take advantage of segmentation offload capabilities of various NICs.

The following technologies are described:
 * TCP Segmentation Offload - TSO
 * UDP Fragmentation Offload - UFO
 * IPIP, SIT, GRE, and UDP Tunnel Offloads
 * Generic Segmentation Offload - GSO
 * Generic Receive Offload - GRO
 * Partial Generic Segmentation Offload - GSO_PARTIAL

TCP Segmentation Offload
========================

TCP segmentation allows a device to segment a single frame into multiple
frames with a data payload size specified in skb_shinfo()->gso_size.
When TCP segmentation is requested, the bit for either SKB_GSO_TCPV4 or
SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and
skb_shinfo()->gso_size should be set to a non-zero value.
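
As a rough illustration of what gso_size controls, the userspace sketch
below computes how many frames a payload is cut into and how many payload
bytes each frame carries. It is not kernel code; the helper names
sketch_segment_count() and sketch_segment_len() are hypothetical and exist
only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of frames needed for payload_len bytes when
 * each frame carries at most gso_size bytes of payload. */
static size_t sketch_segment_count(size_t payload_len, size_t gso_size)
{
	assert(gso_size > 0);	/* gso_size must be non-zero when TSO is requested */
	return (payload_len + gso_size - 1) / gso_size;
}

/* Hypothetical helper: payload bytes carried by segment idx (0-based),
 * or 0 if idx lies past the last segment. */
static size_t sketch_segment_len(size_t payload_len, size_t gso_size,
				 size_t idx)
{
	size_t offset = idx * gso_size;

	assert(gso_size > 0);
	if (offset >= payload_len)
		return 0;
	if (payload_len - offset < gso_size)
		return payload_len - offset;	/* short final segment */
	return gso_size;
}
```

For example, a 4000 byte payload with a gso_size of 1448 yields three
segments carrying 1448, 1448, and 1104 bytes.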

TCP segmentation depends on support for partial checksum offload. For this
reason TSO is normally disabled if the Tx checksum offload for a given
device is disabled.

In order to support TCP segmentation offload it is necessary to populate
the network and transport header offsets of the skbuff so that the device
drivers will be able to determine the offsets of the IP or IPv6 header and
the TCP header. In addition, as CHECKSUM_PARTIAL is required, csum_start
should also point to the TCP header of the packet.
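
The relationship between these offsets can be sketched in userspace for the
simple case of an untagged Ethernet frame carrying IPv4 and TCP. This is a
simplified model, not kernel code: the struct and function names are
hypothetical, offsets are measured from the first byte of the frame, and
the csum_offset member is an addition for illustration (16 is the position
of the checksum field within a TCP header).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN		14	/* Ethernet header without VLAN tags */
#define SKETCH_TCP_CSUM_OFF	16	/* checksum field offset in a TCP header */

/* Hypothetical container for the offsets a driver would look up. */
struct sketch_offsets {
	uint16_t network_header;	/* start of the IPv4 header */
	uint16_t transport_header;	/* start of the TCP header */
	uint16_t csum_start;		/* where checksumming begins */
	uint16_t csum_offset;		/* checksum field, relative to csum_start */
};

/* ihl is the IPv4 header length in 32-bit words (5 when no options). */
static struct sketch_offsets sketch_tcp4_offsets(uint8_t ihl)
{
	struct sketch_offsets off;

	assert(ihl >= 5 && ihl <= 15);
	off.network_header = SKETCH_ETH_HLEN;
	off.transport_header = (uint16_t)(SKETCH_ETH_HLEN + ihl * 4);
	off.csum_start = off.transport_header;	/* points at the TCP header */
	off.csum_offset = SKETCH_TCP_CSUM_OFF;
	return off;
}
```

With no IPv4 options (ihl of 5) the model places the network header at
offset 14 and both the transport header and csum_start at offset 34.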

For IPv4 segmentation we support one of two types in terms of the IP ID.
The default behavior is to increment the IP ID with every segment. If the
GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
ID and all segments will use the same IP ID. If a device has
NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
and we will either increment the IP ID for all frames, or leave it at a
static value based on driver preference.
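
The two IP ID behaviors can be modeled with a short userspace sketch. The
function name sketch_fill_ip_ids() and its fixed_id parameter are
hypothetical stand-ins for the presence or absence of SKB_GSO_TCP_FIXEDID;
the NETIF_F_TSO_MANGLEID case is not modeled because the choice is left to
the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill ids[0..count-1] with the IPv4 ID each segment
 * would carry, starting from the ID of the original frame. */
static void sketch_fill_ip_ids(uint16_t base_id, bool fixed_id,
			       uint16_t *ids, size_t count)
{
	size_t i;

	assert(ids != NULL || count == 0);
	for (i = 0; i < count; i++) {
		if (fixed_id)
			ids[i] = base_id;		/* same ID on every segment */
		else
			ids[i] = (uint16_t)(base_id + i); /* 16-bit field wraps */
	}
}
```

Note that the ID is a 16-bit field, so in the incrementing case a base ID
of 65534 produces 65534, 65535, 0 for three segments.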

UDP Fragmentation Offload
=========================

UDP fragmentation offload allows a device to fragment an oversized UDP
datagram into multiple IPv4 fragments. Many of the requirements for UDP
fragmentation offload are the same as TSO. However, the IPv4 ID for
fragments should not increment, as a single IPv4 datagram is fragmented.
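
To show why the ID stays constant, the userspace sketch below describes the
fragments of one IPv4 datagram. It is a simplified model, not kernel code:
the names are hypothetical, a 20 byte IPv4 header without options is
assumed, and the 8 byte fragment offset unit and the "more fragments" flag
come from the IPv4 specification rather than from this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one IPv4 fragment. */
struct sketch_frag {
	uint16_t ip_id;		/* identical for every fragment */
	uint16_t frag_off;	/* offset into the IP payload, in 8 byte units */
	uint16_t len;		/* IP payload bytes carried by this fragment */
	bool more_frags;	/* MF flag: set on all but the last fragment */
};

/* Describe fragment idx of an IP payload (UDP header plus data) that is
 * l4_len bytes long. Returns false once idx is past the last fragment. */
static bool sketch_ufo_fragment(uint16_t ip_id, size_t l4_len, size_t mtu,
				size_t idx, struct sketch_frag *out)
{
	size_t per_frag, offset, len;

	assert(out != NULL);
	assert(mtu >= 28);	/* 20 byte header plus at least 8 payload bytes */
	assert(l4_len <= 65515);	/* largest IPv4 payload with a 20 byte header */

	per_frag = (mtu - 20) & ~(size_t)7;	/* round down to 8 byte multiple */
	offset = idx * per_frag;
	if (offset >= l4_len)
		return false;

	len = l4_len - offset;
	if (len > per_frag)
		len = per_frag;

	out->ip_id = ip_id;
	out->frag_off = (uint16_t)(offset / 8);
	out->len = (uint16_t)len;
	out->more_frags = offset + len < l4_len;
	return true;
}
```

For a 3008 byte IP payload and an MTU of 1500 the sketch yields three
fragments of 1480, 1480, and 48 bytes at offsets 0, 185, and 370, all
carrying the same ID.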

IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
========================================================

In addition to the offloads described above it is possible for a frame to
contain additional headers such as an outer tunnel. In order to account
for such instances an additional set of segmentation offload types was
introduced including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and
SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify
cases where there is more than one set of headers. For example, in the
case of IPIP and SIT we should have the network and transport headers moved
from the standard list of headers to "inner" header offsets.

Currently only two levels of headers are supported. The convention is to
refer to the tunnel headers as the outer headers, while the encapsulated
data is normally referred to as the inner headers. Below is the list of
calls to access the given headers:

IPIP/SIT Tunnel:
                Outer                   Inner
MAC             skb_mac_header
Network         skb_network_header      skb_inner_network_header
Transport       skb_transport_header

UDP/GRE Tunnel:
                Outer                   Inner
MAC             skb_mac_header          skb_inner_mac_header
Network         skb_network_header      skb_inner_network_header
Transport       skb_transport_header    skb_inner_transport_header

In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types indicate that
the outer header also requests a non-zero checksum.

Finally there is SKB_GSO_TUNNEL_REMCSUM which indicates that a given tunnel
header has requested a remote checksum offload. In this case the inner
headers will be left with a partial checksum and only the outer header
checksum will be computed.

Generic Segmentation Offload
============================

Generic segmentation offload is a pure software offload that is meant to
deal with cases where device drivers cannot perform the offloads described
above. What occurs in GSO is that a given skbuff will have its data broken
out over multiple skbuffs that have been resized to match the MSS provided
via skb_shinfo()->gso_size.
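
The splitting step can be sketched in userspace on a flat byte buffer. This
is only a toy model under stated assumptions: the function name
sketch_gso_segment() is hypothetical, the headers are copied verbatim, and
the per-segment fixups that real GSO performs (lengths, sequence numbers,
checksums, IP IDs) are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build segment idx (0-based) of a frame into out.
 * The frame consists of hdr_len bytes of headers followed by payload.
 * Each segment is a copy of the headers plus up to gso_size payload bytes.
 * Returns the number of bytes written, or 0 if idx is past the end or the
 * output buffer is too small. */
static size_t sketch_gso_segment(const unsigned char *frame, size_t hdr_len,
				 size_t frame_len, size_t gso_size,
				 size_t idx, unsigned char *out,
				 size_t out_cap)
{
	size_t payload_len, offset, chunk;

	assert(frame != NULL && out != NULL);
	assert(gso_size > 0 && hdr_len <= frame_len);

	payload_len = frame_len - hdr_len;
	offset = idx * gso_size;
	if (offset >= payload_len)
		return 0;

	chunk = payload_len - offset;
	if (chunk > gso_size)
		chunk = gso_size;	/* every segment but the last is full */
	if (hdr_len + chunk > out_cap)
		return 0;

	memcpy(out, frame, hdr_len);				/* headers */
	memcpy(out + hdr_len, frame + hdr_len + offset, chunk);	/* payload */
	return hdr_len + chunk;
}
```

Calling the helper with increasing idx until it returns 0 walks through all
segments; only the last one may be shorter than hdr_len plus gso_size.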

Before enabling any hardware segmentation offload a corresponding software
offload is required in GSO. Otherwise a frame that is re-routed between
devices could end up on a device that is unable to transmit it.

Generic Receive Offload
=======================

Generic receive offload is the complement to GSO. Ideally any frame
assembled by GRO should be segmented to create an identical sequence of
frames using GSO, and any sequence of frames segmented by GSO should be
able to be reassembled back to the original by GRO. The only exception to
this is IPv4 ID in the case that the DF bit is set for a given IP header.
If the IPv4 ID values are not sequentially incrementing, they will be
altered to be so when a frame assembled via GRO is segmented via GSO.
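
The reassembly direction can be illustrated with a toy coalescing step in
userspace. This is a sketch under simplifying assumptions, not a
description of the kernel's GRO: the names are hypothetical, all segments
are assumed to belong to one TCP flow, and the only check applied is that a
segment's sequence number continues the data merged so far.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical merge state for one flow. */
struct sketch_gro {
	unsigned char *buf;	/* caller-provided storage for merged payload */
	size_t cap;		/* capacity of buf in bytes */
	size_t len;		/* payload bytes merged so far */
	uint32_t next_seq;	/* sequence number expected next */
	bool started;		/* false until the first segment is merged */
};

/* Append one segment's payload if it directly follows the merged data.
 * Returns false when the segment cannot be merged, which in this model
 * means the caller would hand the merged frame up and start over. */
static bool sketch_gro_merge(struct sketch_gro *g, uint32_t seq,
			     const unsigned char *payload, size_t len)
{
	assert(g != NULL && g->buf != NULL && payload != NULL);

	if (g->started && seq != g->next_seq)
		return false;		/* gap or reordering */
	if (len > g->cap - g->len)
		return false;		/* no room left */

	memcpy(g->buf + g->len, payload, len);
	g->len += len;
	g->next_seq = seq + (uint32_t)len;	/* wraps like TCP sequence space */
	g->started = true;
	return true;
}
```

Feeding this helper the payloads of consecutive segments in order
reproduces the original payload, which is the round-trip property described
above.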

Partial Generic Segmentation Offload
====================================

Partial generic segmentation offload is a hybrid between TSO and GSO. What
it effectively does is take advantage of certain traits of TCP and tunnels
so that instead of having to rewrite the packet headers for each segment
only the inner-most transport header and possibly the outer-most network
header need to be updated. This allows devices that do not support tunnel
offloads or tunnel offloads with checksum to still make use of segmentation.

With the partial offload what occurs is that all headers excluding the
inner transport header are updated such that they will contain the correct
values as if the header had simply been duplicated. The one exception to
this is the outer IPv4 ID field. It is up to the device drivers to
guarantee that the IPv4 ID field is incremented in the case that a given
header does not have the DF bit set.