.. SPDX-License-Identifier: GPL-2.0

=====================================================
Netdev features mess and how to get out from it alive
=====================================================

Author:
        Michał Mirosław <mirq-linux@rere.qmqm.pl>



Part I: Feature sets
====================

Long gone are the days when a network card would just take and give packets
verbatim. Today's devices add multiple features and bugs (read: offloads)
that relieve an OS of various tasks like generating and checking checksums,
splitting packets, and classifying them. Those capabilities and their state
are commonly referred to as netdev features in the Linux kernel world.

There are currently three sets of features relevant to the driver, and
one used internally by the network core:

1. The netdev->hw_features set contains features whose state may be
   changed (enabled or disabled) for a particular device at the user's
   request. This set should be initialized in the ndo_init callback and
   not changed later.

2. The netdev->features set contains features that are currently enabled
   for a device. It should be changed only by the network core or in
   error paths of the ndo_set_features callback.

3. The netdev->vlan_features set contains features whose state is
   inherited by child VLAN devices (it limits their netdev->features
   set). It is currently used for all VLAN devices, whether tags are
   stripped or inserted in hardware or software.

4. The netdev->wanted_features set contains the feature set requested by
   the user. It is filtered by the ndo_fix_features callback whenever it
   or some device-specific conditions change. This set is internal to
   the networking core and should not be referenced in drivers.
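
The relationship between the four sets can be illustrated with a small
userspace model. This is only a sketch: struct model_netdev, the F_* bits and
both helper functions are hypothetical names invented for the example, not
kernel definitions, and the VLAN computation is a simplification of what the
VLAN code does.

```c
#include <stdint.h>

/* Hypothetical feature bits for this userspace model; the real flags are
 * the NETIF_F_* constants in include/linux/netdev_features.h. */
#define F_SG      (UINT64_C(1) << 0)
#define F_IP_CSUM (UINT64_C(1) << 1)
#define F_TSO     (UINT64_C(1) << 2)
#define F_RXCSUM  (UINT64_C(1) << 3)

/* Simplified stand-in for the feature fields of struct net_device. */
struct model_netdev {
    uint64_t hw_features;     /* may be toggled at the user's request */
    uint64_t features;        /* currently enabled */
    uint64_t vlan_features;   /* inheritable by child VLAN devices */
    uint64_t wanted_features; /* requested by the user (core-internal) */
};

/* What a driver's init path might do: declare what can be toggled,
 * start with everything enabled, and let VLAN children inherit a subset. */
static void model_init(struct model_netdev *dev)
{
    dev->hw_features = F_SG | F_IP_CSUM | F_TSO | F_RXCSUM;
    dev->features = dev->hw_features;
    dev->vlan_features = F_SG | F_IP_CSUM;
    dev->wanted_features = dev->features;
}

/* Upper bound on a child VLAN device's features in this model: the
 * parent's enabled set limited by vlan_features. */
static uint64_t model_vlan_child_features(const struct model_netdev *dev)
{
    return dev->features & dev->vlan_features;
}
```

In this model, clearing F_SG in the parent's features also removes it from
what a VLAN child could inherit, even though vlan_features itself is unchanged.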



Part II: Controlling enabled features
=====================================

When the current feature set (netdev->features) is to be changed, a new set
is calculated and filtered by calling the ndo_fix_features callback
and netdev_fix_features(). If the resulting set differs from the current
set, it is passed to the ndo_set_features callback and (if the callback
returns success) replaces the value stored in netdev->features.
A NETDEV_FEAT_CHANGE notification is issued after that whenever the current
set might have changed.
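
The recalculation described above can be sketched as a userspace model. All
names here (struct model_netdev, struct model_ops, model_core_fix(),
model_update_features()) are hypothetical; the real logic lives in the
networking core and differs in detail. The way the starting set is derived
from wanted_features and hw_features is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t features_t;

struct model_netdev;

/* Hypothetical callback table mirroring the two ndo_*_features hooks. */
struct model_ops {
    features_t (*fix_features)(struct model_netdev *dev, features_t f);
    int (*set_features)(struct model_netdev *dev, features_t f);
};

struct model_netdev {
    features_t hw_features;
    features_t features;
    features_t wanted_features;
    const struct model_ops *ops;
    unsigned int notifications; /* counts NETDEV_FEAT_CHANGE-like events */
};

/* Stand-in for netdev_fix_features(): the core-imposed limitations.
 * This model imposes none. */
static features_t model_core_fix(features_t f)
{
    return f;
}

/* Recalculate the enabled set. Returns true if the new set was installed. */
static bool model_update_features(struct model_netdev *dev)
{
    /* Assumption: bits the user cannot toggle keep their current state. */
    features_t f = (dev->features & ~dev->hw_features) |
                   (dev->wanted_features & dev->hw_features);
    int err = 0;

    if (dev->ops && dev->ops->fix_features)
        f = dev->ops->fix_features(dev, f);
    f = model_core_fix(f);

    if (f == dev->features)
        return false; /* nothing to do, no notification */

    /* A missing callback is treated as always returning success. */
    if (dev->ops && dev->ops->set_features)
        err = dev->ops->set_features(dev, f);

    if (err == 0)
        dev->features = f;

    /* Issued whenever the current set might have changed, which
     * includes a failed set_features call. */
    dev->notifications++;
    return err == 0;
}
```

Calling model_update_features() a second time without changing
wanted_features is a no-op in this model, because the filtered set already
equals the current one.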

The following events trigger recalculation:

1. device's registration, after ndo_init returned success
2. user requested changes in features state
3. netdev_update_features() is called

The ndo_*_features callbacks are called with rtnl_lock held. Missing callbacks
are treated as always returning success.

A driver that wants to trigger recalculation must do so by calling
netdev_update_features() while holding rtnl_lock. This should not be done
from the ndo_*_features callbacks. netdev->features should not be modified by
the driver except by means of the ndo_fix_features callback.



Part III: Implementation hints
==============================

* ndo_fix_features:

All dependencies between features should be resolved here. The resulting
set can be reduced further by limitations imposed by the networking core (as
coded in netdev_fix_features()). For this reason it is safer to disable a
feature when its dependencies are not met instead of forcing the dependency on.

This callback should not modify hardware or driver state (it should be
stateless). It can be called multiple times between successive
ndo_set_features calls.

The callback must not alter features contained in the NETIF_F_SOFT_FEATURES or
NETIF_F_NEVER_CHANGE sets. The exception is NETIF_F_VLAN_CHALLENGED, but
care must be taken as the change won't affect already configured VLANs.
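
A stateless filter of this kind might look like the following userspace
sketch. The F_* bits, the assumed hardware dependency (TCP segmentation only
works together with checksum offload) and model_fix_features() are all
hypothetical and chosen purely for illustration.

```c
#include <stdint.h>

typedef uint64_t features_t;

/* Hypothetical feature bits for this model (not the kernel's NETIF_F_*). */
#define F_IP_CSUM  (UINT64_C(1) << 0)
#define F_TSO      (UINT64_C(1) << 1)
#define F_SOFT_GSO (UINT64_C(1) << 2) /* stands in for a software feature */

/* Stateless filter: the result depends only on the argument, and no
 * hardware or driver state is touched. */
static features_t model_fix_features(features_t requested)
{
    features_t f = requested;

    /* Assumed dependency: this device can only segment TCP when it also
     * computes the checksum. Drop the dependent feature instead of
     * forcing F_IP_CSUM on, since a later filter may remove it again. */
    if (!(f & F_IP_CSUM))
        f &= ~F_TSO;

    /* Software feature bits pass through untouched. */
    return f;
}
```

Because the function is pure, calling it several times in a row with its own
output yields the same set, which matches the requirement that it may run
multiple times between ndo_set_features calls.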

* ndo_set_features:

Hardware should be reconfigured to match the passed feature set. The set
should not be altered unless some error condition happens that can't
be reliably detected in ndo_fix_features. In this case, the callback
should update netdev->features to match the resulting hardware state.
Errors returned are not (and cannot be) propagated anywhere except dmesg.
(Note: a successful return is zero; >0 means a silent error.)
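
The error-path rule can be sketched in userspace as follows. Everything here
is hypothetical: struct model_netdev, the F_* bits, the tso_engine_ok flag
and model_hw_apply() merely simulate a device whose TSO engine may fail to
start. On success the model leaves the features field alone, on the
assumption that the caller stores the new set, as the core does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t features_t;

#define F_RXCSUM (UINT64_C(1) << 0)
#define F_TSO    (UINT64_C(1) << 1)

struct model_netdev {
    features_t features; /* currently enabled set */
    bool tso_engine_ok;  /* hypothetical: can the TSO engine be started? */
};

/* Simulated hardware access; returns 0 on success, negative on error. */
static int model_hw_apply(struct model_netdev *dev, features_t bit, bool on)
{
    if (bit == F_TSO && on && !dev->tso_engine_ok)
        return -5; /* plays the role of -EIO */
    return 0;
}

static int model_set_features(struct model_netdev *dev, features_t wanted)
{
    features_t hw_state; /* what the hardware is doing right now */
    features_t changed;
    int err;

    assert(dev != NULL);
    hw_state = dev->features;
    changed = dev->features ^ wanted;

    if (changed & F_RXCSUM) {
        err = model_hw_apply(dev, F_RXCSUM, (wanted & F_RXCSUM) != 0);
        if (err)
            goto fail;
        hw_state ^= F_RXCSUM;
    }

    if (changed & F_TSO) {
        err = model_hw_apply(dev, F_TSO, (wanted & F_TSO) != 0);
        if (err)
            goto fail;
        hw_state ^= F_TSO;
    }

    return 0; /* the caller stores 'wanted' as the new enabled set */

fail:
    /* Only on error: record the state the hardware actually reached. */
    dev->features = hw_state;
    return err;
}
```

If the TSO step fails after RX checksumming was switched on, the model ends
with only F_RXCSUM recorded, so the stored set still describes the hardware.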



Part IV: Features
=================

For the current list of features, see include/linux/netdev_features.h.
This section describes the semantics of some of them.

* Transmit checksumming

For a complete description, see the comments near the top of
include/linux/skbuff.h.

Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM.
It means that the device can fill in a TCP/UDP-like checksum anywhere in the
packet, whatever headers there might be.

* Transmit TCP segmentation offload

NETIF_F_TSO_ECN means that the hardware can properly split packets with the
CWR bit set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6).

* Transmit UDP segmentation offload

NETIF_F_GSO_UDP_L4 accepts a single UDP header with a payload that exceeds
gso_size. On segmentation, it segments the payload on gso_size boundaries and
replicates the network and UDP headers (fixing up the headers of the last
segment if its payload is shorter than gso_size).
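
The segment arithmetic can be worked through with a small userspace sketch.
The function names are hypothetical; only the splitting rule (payload cut on
gso_size boundaries, last segment possibly shorter) comes from the text above.
The 8-byte constant is the fixed size of a UDP header.

```c
#include <stddef.h>

#define MODEL_UDP_HDR_LEN 8 /* fixed UDP header size in bytes */

/* Number of segments produced for payload_len bytes of UDP payload.
 * gso_size must be non-zero. */
static size_t model_udp_gso_segments(size_t payload_len, size_t gso_size)
{
    return (payload_len + gso_size - 1) / gso_size;
}

/* Payload length of segment idx (0-based, idx must be a valid segment).
 * Only the last segment can be shorter than gso_size. */
static size_t model_udp_gso_seg_len(size_t payload_len, size_t gso_size,
                                    size_t idx)
{
    size_t left = payload_len - idx * gso_size;

    return left < gso_size ? left : gso_size;
}

/* Value of the UDP length field in the replicated header of segment idx.
 * This is one of the fields that needs fixing up for a short last segment. */
static size_t model_udp_len_field(size_t payload_len, size_t gso_size,
                                  size_t idx)
{
    return MODEL_UDP_HDR_LEN +
           model_udp_gso_seg_len(payload_len, gso_size, idx);
}
```

For a 3000-byte payload and a gso_size of 1400, this gives three segments
carrying 1400, 1400 and 200 bytes of payload.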

* Transmit DMA from high memory

On platforms where this is relevant, NETIF_F_HIGHDMA signals that
ndo_start_xmit can handle skbs with frags in high memory.

* Transmit scatter-gather

These features indicate that ndo_start_xmit can handle fragmented skbs:
NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST ---
chained skbs (skb->next/prev list).

* Software features

Features contained in NETIF_F_SOFT_FEATURES are features of the networking
stack. Drivers should not change behaviour based on them.

* LLTX driver (deprecated for hardware drivers)

NETIF_F_LLTX is meant to be used by drivers that don't need locking at all,
e.g. software tunnels.

This is also used in a few legacy drivers that implement their
own locking; don't use it for new (hardware) drivers.

* netns-local device

NETIF_F_NETNS_LOCAL is set for devices that are not allowed to move between
network namespaces (e.g. loopback).

Don't use it in drivers.

* VLAN challenged

NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN
headers. Some drivers set this because the cards can't handle the bigger MTU.
[FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU
VLANs. This may not be useful, though.]

* rx-fcs

This requests that the NIC append the Ethernet Frame Check Sequence (FCS)
to the end of the skb data. This allows sniffers and other tools to
read the CRC recorded by the NIC on receipt of the packet.

* rx-all

This requests that the NIC receive all possible frames, including errored
frames (such as bad FCS, etc). This can be helpful when sniffing a link with
bad packets on it. Some NICs may receive more packets if also put into normal
PROMISC mode.

* rx-gro-hw

This requests that the NIC enable Hardware GRO (generic receive offload).
Hardware GRO is basically the exact reverse of TSO, and is generally
stricter than Hardware LRO. A packet stream merged by Hardware GRO must
be re-segmentable by GSO or TSO back to the exact original packet stream.
Hardware GRO is dependent on RXCSUM since every packet successfully merged
by hardware must also have the checksum verified by hardware.
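
The dependency of Hardware GRO on RXCSUM can be expressed as a feature filter.
The sketch below is a userspace model with hypothetical F_* bits and a
hypothetical function name; it only illustrates the rule stated above, namely
that the dependent feature is dropped when its prerequisite is off.

```c
#include <stdint.h>

typedef uint64_t features_t;

/* Hypothetical bits standing in for the rx-checksum and rx-gro-hw flags. */
#define F_RXCSUM (UINT64_C(1) << 0)
#define F_GRO_HW (UINT64_C(1) << 1)

/* Drop Hardware GRO when receive checksum offload is not enabled. */
static features_t model_fix_gro_hw(features_t f)
{
    if ((f & F_GRO_HW) && !(f & F_RXCSUM))
        f &= ~F_GRO_HW;
    return f;
}
```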

* hsr-tag-ins-offload

This should be set for devices which insert an HSR (High-availability Seamless
Redundancy) or PRP (Parallel Redundancy Protocol) tag automatically.

* hsr-tag-rm-offload

This should be set for devices which remove HSR (High-availability Seamless
Redundancy) or PRP (Parallel Redundancy Protocol) tags automatically.

* hsr-fwd-offload

This should be set for devices which forward HSR (High-availability Seamless
Redundancy) frames from one port to another in hardware.

* hsr-dup-offload

This should be set for devices which duplicate outgoing HSR (High-availability
Seamless Redundancy) or PRP (Parallel Redundancy Protocol) frames
automatically in hardware.