Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 1 | .. SPDX-License-Identifier: GPL-2.0 |
| 2 | |
| 3 | ============ |
| 4 | Timestamping |
| 5 | ============ |
| 6 | |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 7 | |
| 8 | 1. Control Interfaces |
Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 9 | ===================== |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 10 | |
The interfaces for receiving network packet timestamps are:
Patrick Ohly | cb9eff0 | 2009-02-12 05:03:36 +0000 | [diff] [blame] | 12 | |
Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 13 | SO_TIMESTAMP |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 14 | Generates a timestamp for each incoming packet in (not necessarily |
| 15 | monotonic) system time. Reports the timestamp via recvmsg() in a |
Deepa Dinamani | 9dd4921 | 2019-02-02 07:34:52 -0800 | [diff] [blame] | 16 | control message in usec resolution. |
| 17 | SO_TIMESTAMP is defined as SO_TIMESTAMP_NEW or SO_TIMESTAMP_OLD |
| 18 | based on the architecture type and time_t representation of libc. |
| 19 | Control message format is in struct __kernel_old_timeval for |
| 20 | SO_TIMESTAMP_OLD and in struct __kernel_sock_timeval for |
| 21 | SO_TIMESTAMP_NEW options respectively. |
Patrick Ohly | cb9eff0 | 2009-02-12 05:03:36 +0000 | [diff] [blame] | 22 | |
Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 23 | SO_TIMESTAMPNS |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 24 | Same timestamping mechanism as SO_TIMESTAMP, but reports the |
Deepa Dinamani | 9dd4921 | 2019-02-02 07:34:52 -0800 | [diff] [blame] | 25 | timestamp as struct timespec in nsec resolution. |
| 26 | SO_TIMESTAMPNS is defined as SO_TIMESTAMPNS_NEW or SO_TIMESTAMPNS_OLD |
| 27 | based on the architecture type and time_t representation of libc. |
| 28 | Control message format is in struct timespec for SO_TIMESTAMPNS_OLD |
| 29 | and in struct __kernel_timespec for SO_TIMESTAMPNS_NEW options |
| 30 | respectively. |
Patrick Ohly | cb9eff0 | 2009-02-12 05:03:36 +0000 | [diff] [blame] | 31 | |
IP_MULTICAST_LOOP + SO_TIMESTAMP[NS]
  Only for multicast: approximate transmit timestamp obtained by
  reading the looped packet receive timestamp.
Patrick Ohly | cb9eff0 | 2009-02-12 05:03:36 +0000 | [diff] [blame] | 35 | |
Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 36 | SO_TIMESTAMPING |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 37 | Generates timestamps on reception, transmission or both. Supports |
| 38 | multiple timestamp sources, including hardware. Supports generating |
| 39 | timestamps for stream sockets. |
Patrick Ohly | cb9eff0 | 2009-02-12 05:03:36 +0000 | [diff] [blame] | 40 | |
Patrick Ohly | cb9eff0 | 2009-02-12 05:03:36 +0000 | [diff] [blame] | 41 | |
Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 42 | 1.1 SO_TIMESTAMP (also SO_TIMESTAMP_OLD and SO_TIMESTAMP_NEW) |
| 43 | ------------------------------------------------------------- |
Patrick Ohly | cb9eff0 | 2009-02-12 05:03:36 +0000 | [diff] [blame] | 44 | |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 45 | This socket option enables timestamping of datagrams on the reception |
| 46 | path. Because the destination socket, if any, is not known early in |
| 47 | the network stack, the feature has to be enabled for all packets. The |
| 48 | same is true for all early receive timestamp options. |
Patrick Ohly | cb9eff0 | 2009-02-12 05:03:36 +0000 | [diff] [blame] | 49 | |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 50 | For interface details, see `man 7 socket`. |
| 51 | |
Use SO_TIMESTAMP_NEW to always receive the timestamp in struct
__kernel_sock_timeval format.

SO_TIMESTAMP_OLD returns incorrect timestamps after the year 2038
on 32-bit machines.
| 57 | |
1.2 SO_TIMESTAMPNS (also SO_TIMESTAMPNS_OLD and SO_TIMESTAMPNS_NEW)
-------------------------------------------------------------------

This option is identical to SO_TIMESTAMP except for the returned data type.
Its struct timespec allows for higher resolution (ns) timestamps than the
timeval of SO_TIMESTAMP (us).

Use SO_TIMESTAMPNS_NEW to always receive the timestamp in struct
__kernel_timespec format.

SO_TIMESTAMPNS_OLD returns incorrect timestamps after the year 2038
on 32-bit machines.
| 69 | |
Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 70 | 1.3 SO_TIMESTAMPING (also SO_TIMESTAMPING_OLD and SO_TIMESTAMPING_NEW) |
| 71 | ---------------------------------------------------------------------- |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 72 | |
| 73 | Supports multiple types of timestamp requests. As a result, this |
Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 74 | socket option takes a bitmap of flags, not a boolean. In:: |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 75 | |
  err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 77 | |
val is an integer with any of the following bits set. Setting any other
bit returns EINVAL and does not change the current state.
| 80 | |
Soheil Hassas Yeganeh | fd91e12 | 2016-04-02 23:08:13 -0400 | [diff] [blame] | 81 | The socket option configures timestamp generation for individual |
| 82 | sk_buffs (1.3.1), timestamp reporting to the socket's error |
| 83 | queue (1.3.2) and options (1.3.3). Timestamp generation can also |
| 84 | be enabled for individual sendmsg calls using cmsg (1.3.4). |
| 85 | |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 86 | |
| 87 | 1.3.1 Timestamp Generation |
Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 88 | ^^^^^^^^^^^^^^^^^^^^^^^^^^ |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 89 | |
Some bits are requests to the stack to try to generate timestamps. Any
combination of them is valid. Changes to these bits apply to newly
created packets, not to packets already in the stack. As a result, it
is possible to selectively request timestamps for a subset of packets
(e.g., for sampling) by embedding a send() call within two setsockopt
calls, one to enable timestamp generation and one to disable it.
Timestamps may also be generated for reasons other than being
requested by a particular socket, such as when receive timestamping is
enabled system-wide, as explained earlier.
| 99 | |
| 100 | SOF_TIMESTAMPING_RX_HARDWARE: |
| 101 | Request rx timestamps generated by the network adapter. |
| 102 | |
| 103 | SOF_TIMESTAMPING_RX_SOFTWARE: |
| 104 | Request rx timestamps when data enters the kernel. These timestamps |
| 105 | are generated just after a device driver hands a packet to the |
| 106 | kernel receive stack. |
| 107 | |
| 108 | SOF_TIMESTAMPING_TX_HARDWARE: |
Soheil Hassas Yeganeh | fd91e12 | 2016-04-02 23:08:13 -0400 | [diff] [blame] | 109 | Request tx timestamps generated by the network adapter. This flag |
| 110 | can be enabled via both socket options and control messages. |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 111 | |
| 112 | SOF_TIMESTAMPING_TX_SOFTWARE: |
| 113 | Request tx timestamps when data leaves the kernel. These timestamps |
| 114 | are generated in the device driver as close as possible, but always |
| 115 | prior to, passing the packet to the network interface. Hence, they |
| 116 | require driver support and may not be available for all devices. |
Soheil Hassas Yeganeh | fd91e12 | 2016-04-02 23:08:13 -0400 | [diff] [blame] | 117 | This flag can be enabled via both socket options and control messages. |
| 118 | |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 119 | SOF_TIMESTAMPING_TX_SCHED: |
| 120 | Request tx timestamps prior to entering the packet scheduler. Kernel |
| 121 | transmit latency is, if long, often dominated by queuing delay. The |
| 122 | difference between this timestamp and one taken at |
| 123 | SOF_TIMESTAMPING_TX_SOFTWARE will expose this latency independent |
| 124 | of protocol processing. The latency incurred in protocol |
| 125 | processing, if any, can be computed by subtracting a userspace |
| 126 | timestamp taken immediately before send() from this timestamp. On |
| 127 | machines with virtual devices where a transmitted packet travels |
| 128 | through multiple devices and, hence, multiple packet schedulers, |
| 129 | a timestamp is generated at each layer. This allows for fine |
Soheil Hassas Yeganeh | fd91e12 | 2016-04-02 23:08:13 -0400 | [diff] [blame] | 130 | grained measurement of queuing delay. This flag can be enabled |
| 131 | via both socket options and control messages. |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 132 | |
| 133 | SOF_TIMESTAMPING_TX_ACK: |
| 134 | Request tx timestamps when all data in the send buffer has been |
| 135 | acknowledged. This only makes sense for reliable protocols. It is |
| 136 | currently only implemented for TCP. For that protocol, it may |
| 137 | over-report measurement, because the timestamp is generated when all |
| 138 | data up to and including the buffer at send() was acknowledged: the |
| 139 | cumulative acknowledgment. The mechanism ignores SACK and FACK. |
Soheil Hassas Yeganeh | fd91e12 | 2016-04-02 23:08:13 -0400 | [diff] [blame] | 140 | This flag can be enabled via both socket options and control messages. |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 141 | |
| 142 | |
| 143 | 1.3.2 Timestamp Reporting |
Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 144 | ^^^^^^^^^^^^^^^^^^^^^^^^^ |
Andrew Lutomirski | adca476 | 2014-03-04 17:24:10 -0800 | [diff] [blame] | 145 | |
| 146 | The other three bits control which timestamps will be reported in a |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 147 | generated control message. Changes to the bits take immediate |
| 148 | effect at the timestamp reporting locations in the stack. Timestamps |
| 149 | are only reported for packets that also have the relevant timestamp |
| 150 | generation request set. |
Andrew Lutomirski | adca476 | 2014-03-04 17:24:10 -0800 | [diff] [blame] | 151 | |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 152 | SOF_TIMESTAMPING_SOFTWARE: |
| 153 | Report any software timestamps when available. |
Andrew Lutomirski | adca476 | 2014-03-04 17:24:10 -0800 | [diff] [blame] | 154 | |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 155 | SOF_TIMESTAMPING_SYS_HARDWARE: |
| 156 | This option is deprecated and ignored. |
Andrew Lutomirski | adca476 | 2014-03-04 17:24:10 -0800 | [diff] [blame] | 157 | |
SOF_TIMESTAMPING_RAW_HARDWARE:
  Report hardware timestamps as generated by
  SOF_TIMESTAMPING_TX_HARDWARE or SOF_TIMESTAMPING_RX_HARDWARE when
  available.
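
Generation and reporting bits are combined into one bitmap before the
setsockopt() call. The following is a minimal sketch under the assumption
that the application wants software timestamps on both paths and optionally
the hardware variants; build_tstamp_flags() and apply_tstamp_flags() are
hypothetical names, not part of the interface:

```c
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/*
 * Hypothetical helper: build a SO_TIMESTAMPING bitmap that requests
 * software timestamp generation on rx and tx plus software reporting,
 * and optionally adds the hardware generation and reporting bits.
 */
unsigned int build_tstamp_flags(int want_hw)
{
	unsigned int val = SOF_TIMESTAMPING_RX_SOFTWARE |
			   SOF_TIMESTAMPING_TX_SOFTWARE |
			   SOF_TIMESTAMPING_SOFTWARE;

	if (want_hw)
		val |= SOF_TIMESTAMPING_RX_HARDWARE |
		       SOF_TIMESTAMPING_TX_HARDWARE |
		       SOF_TIMESTAMPING_RAW_HARDWARE;
	return val;
}

/* Apply the bitmap; returns 0 on success, -1 with errno set otherwise. */
int apply_tstamp_flags(int fd, unsigned int val)
{
	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
}
```

A generation bit without the matching reporting bit produces timestamps that
are never delivered to the socket, which is why the sketch always sets both.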
| 161 | |
| 162 | |
| 163 | 1.3.3 Timestamp Options |
Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 164 | ^^^^^^^^^^^^^^^^^^^^^^^ |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 165 | |
The interface supports the following options:
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 167 | |
| 168 | SOF_TIMESTAMPING_OPT_ID: |
  Generate a unique identifier along with each packet. A process can
  have multiple concurrent timestamping requests outstanding. Packets
  can be reordered in the transmit path, for instance in the packet
  scheduler. In that case timestamps will be queued onto the error
  queue out of order from the original send() calls. It is then not
  always possible to uniquely match timestamps to the original send()
  calls based on timestamp order or payload inspection alone.
| 176 | |
| 177 | This option associates each packet at send() with a unique |
| 178 | identifier and returns that along with the timestamp. The identifier |
| 179 | is derived from a per-socket u32 counter (that wraps). For datagram |
| 180 | sockets, the counter increments with each sent packet. For stream |
| 181 | sockets, it increments with every byte. |
| 182 | |
| 183 | The counter starts at zero. It is initialized the first time that |
| 184 | the socket option is enabled. It is reset each time the option is |
| 185 | enabled after having been disabled. Resetting the counter does not |
| 186 | change the identifiers of existing packets in the system. |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 187 | |
| 188 | This option is implemented only for transmit timestamps. There, the |
| 189 | timestamp is always looped along with a struct sock_extended_err. |
Andrew Lutomirski | 138a7f4 | 2014-11-24 12:02:29 -0800 | [diff] [blame] | 190 | The option modifies field ee_data to pass an id that is unique |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 191 | among all possibly concurrently outstanding timestamp requests for |
Willem de Bruijn | cbd3aad | 2014-11-30 22:22:35 -0500 | [diff] [blame] | 192 | that socket. |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 193 | |
| 194 | |
Willem de Bruijn | 829ae9d | 2014-11-30 22:22:34 -0500 | [diff] [blame] | 195 | SOF_TIMESTAMPING_OPT_CMSG: |
Willem de Bruijn | 829ae9d | 2014-11-30 22:22:34 -0500 | [diff] [blame] | 196 | Support recv() cmsg for all timestamped packets. Control messages |
| 197 | are already supported unconditionally on all packets with receive |
| 198 | timestamps and on IPv6 packets with transmit timestamp. This option |
| 199 | extends them to IPv4 packets with transmit timestamp. One use case |
| 200 | is to correlate packets with their egress device, by enabling socket |
| 201 | option IP_PKTINFO simultaneously. |
| 202 | |
| 203 | |
Willem de Bruijn | 49ca0d8 | 2015-01-30 13:29:31 -0500 | [diff] [blame] | 204 | SOF_TIMESTAMPING_OPT_TSONLY: |
Willem de Bruijn | 49ca0d8 | 2015-01-30 13:29:31 -0500 | [diff] [blame] | 205 | Applies to transmit timestamps only. Makes the kernel return the |
| 206 | timestamp as a cmsg alongside an empty packet, as opposed to |
| 207 | alongside the original packet. This reduces the amount of memory |
| 208 | charged to the socket's receive budget (SO_RCVBUF) and delivers |
| 209 | the timestamp even if sysctl net.core.tstamp_allow_data is 0. |
| 210 | This option disables SOF_TIMESTAMPING_OPT_CMSG. |
| 211 | |
Francis Yan | 1c88580 | 2016-11-27 23:07:18 -0800 | [diff] [blame] | 212 | SOF_TIMESTAMPING_OPT_STATS: |
  Optional stats that are obtained along with the transmit timestamps.
  It must be used together with SOF_TIMESTAMPING_OPT_TSONLY. When the
  transmit timestamp is available, the stats are available in a
  separate control message of type SCM_TIMESTAMPING_OPT_STATS, as a
  list of TLVs (struct nlattr). These stats allow the application to
  associate various transport layer stats with the transmit
  timestamps, such as how long a certain block of data was limited by
  the peer's receiver window.
Willem de Bruijn | 49ca0d8 | 2015-01-30 13:29:31 -0500 | [diff] [blame] | 221 | |
Miroslav Lichvar | aad9c8c | 2017-05-19 17:52:38 +0200 | [diff] [blame] | 222 | SOF_TIMESTAMPING_OPT_PKTINFO: |
  Enable the SCM_TIMESTAMPING_PKTINFO control message for incoming
  packets with hardware timestamps. The message contains struct
  scm_ts_pktinfo, which supplies the index of the real interface which
  received the packet and its length at layer 2. A valid (non-zero)
  interface index will be returned only if CONFIG_NET_RX_BUSY_POLL is
  enabled and the driver is using NAPI. The struct also contains two
  other fields, but they are reserved and undefined.
| 230 | |
Miroslav Lichvar | b50a5c7 | 2017-05-19 17:52:40 +0200 | [diff] [blame] | 231 | SOF_TIMESTAMPING_OPT_TX_SWHW: |
Miroslav Lichvar | b50a5c7 | 2017-05-19 17:52:40 +0200 | [diff] [blame] | 232 | Request both hardware and software timestamps for outgoing packets |
| 233 | when SOF_TIMESTAMPING_TX_HARDWARE and SOF_TIMESTAMPING_TX_SOFTWARE |
| 234 | are enabled at the same time. If both timestamps are generated, |
| 235 | two separate messages will be looped to the socket's error queue, |
| 236 | each containing just one timestamp. |
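
To match a timestamp back to its send() call with SOF_TIMESTAMPING_OPT_ID,
an application can predict the identifier it expects to find in ee_data.
The sketch below keeps a hypothetical per-socket shadow counter. It assumes
that the kernel counter was reset when the option was enabled, that a
datagram identifier is the zero-based index of the packet, and that a stream
identifier is the zero-based offset of the last byte of the buffer passed to
send(). The struct and function names are illustrative only:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shadow of the per-socket u32 counter kept by the kernel. */
struct tskey_tracker {
	uint32_t next;		/* packets (datagram) or bytes (stream) so far */
	int is_stream;
};

/*
 * Record one send() of len bytes and return the identifier expected in
 * ee_data for it. Unsigned arithmetic wraps modulo 2^32, like the counter.
 */
uint32_t tskey_expect(struct tskey_tracker *t, size_t len)
{
	uint32_t key;

	if (t->is_stream) {
		t->next += (uint32_t)len;
		key = t->next - 1;	/* offset of the last byte (assumed) */
	} else {
		key = t->next++;	/* index of this datagram */
	}
	return key;
}
```

Calling tskey_expect() once per send() and storing the result next to the
request lets the reader of the error queue look up the matching request even
when timestamps arrive out of order.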
| 237 | |
Willem de Bruijn | 49ca0d8 | 2015-01-30 13:29:31 -0500 | [diff] [blame] | 238 | New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to |
| 239 | disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate |
| 240 | regardless of the setting of sysctl net.core.tstamp_allow_data. |
| 241 | |
| 242 | An exception is when a process needs additional cmsg data, for |
| 243 | instance SOL_IP/IP_PKTINFO to detect the egress network interface. |
| 244 | Then pass option SOF_TIMESTAMPING_OPT_CMSG. This option depends on |
| 245 | having access to the contents of the original packet, so cannot be |
| 246 | combined with SOF_TIMESTAMPING_OPT_TSONLY. |
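
The constraints between the options can be checked in the application before
calling setsockopt(). This is a sketch of such a client-side check, not a
description of what the kernel enforces; recommended_tx_flags() and
tstamp_opts_consistent() are hypothetical names, and the choice of
SOF_TIMESTAMPING_TX_SOFTWARE as the generation bit is an assumption made for
the example:

```c
#include <linux/net_tstamp.h>

/* Option bits the text suggests for new applications, plus sw tx. */
unsigned int recommended_tx_flags(void)
{
	return SOF_TIMESTAMPING_TX_SOFTWARE | SOF_TIMESTAMPING_SOFTWARE |
	       SOF_TIMESTAMPING_OPT_ID | SOF_TIMESTAMPING_OPT_TSONLY;
}

/*
 * Hypothetical pre-flight check: returns 1 if the option bits follow the
 * rules stated in this section, 0 otherwise.
 */
int tstamp_opts_consistent(unsigned int flags)
{
	/* OPT_STATS must be used together with OPT_TSONLY. */
	if ((flags & SOF_TIMESTAMPING_OPT_STATS) &&
	    !(flags & SOF_TIMESTAMPING_OPT_TSONLY))
		return 0;
	/* OPT_CMSG needs the original packet, which OPT_TSONLY drops. */
	if ((flags & SOF_TIMESTAMPING_OPT_CMSG) &&
	    (flags & SOF_TIMESTAMPING_OPT_TSONLY))
		return 0;
	return 1;
}
```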
| 247 | |
| 248 | |
Soheil Hassas Yeganeh | fd91e12 | 2016-04-02 23:08:13 -0400 | [diff] [blame] | 249 | 1.3.4. Enabling timestamps via control messages |
Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 250 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
Soheil Hassas Yeganeh | fd91e12 | 2016-04-02 23:08:13 -0400 | [diff] [blame] | 251 | |
| 252 | In addition to socket options, timestamp generation can be requested |
| 253 | per write via cmsg, only for SOF_TIMESTAMPING_TX_* (see Section 1.3.1). |
| 254 | Using this feature, applications can sample timestamps per sendmsg() |
| 255 | without paying the overhead of enabling and disabling timestamps via |
Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 256 | setsockopt:: |
Soheil Hassas Yeganeh | fd91e12 | 2016-04-02 23:08:13 -0400 | [diff] [blame] | 257 | |
  struct msghdr *msg;
  struct cmsghdr *cmsg;
  ...
  /* msg->msg_control must point to at least CMSG_SPACE(sizeof(__u32)) bytes */
  cmsg = CMSG_FIRSTHDR(msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SO_TIMESTAMPING;
  cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));
  /* request these tx timestamps for this sendmsg() call only */
  *((__u32 *) CMSG_DATA(cmsg)) = SOF_TIMESTAMPING_TX_SCHED |
                                 SOF_TIMESTAMPING_TX_SOFTWARE |
                                 SOF_TIMESTAMPING_TX_ACK;
  err = sendmsg(fd, msg, 0);
| 268 | |
| 269 | The SOF_TIMESTAMPING_TX_* flags set via cmsg will override |
| 270 | the SOF_TIMESTAMPING_TX_* flags set via setsockopt. |
| 271 | |
| 272 | Moreover, applications must still enable timestamp reporting via |
Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 273 | setsockopt to receive timestamps:: |
Soheil Hassas Yeganeh | fd91e12 | 2016-04-02 23:08:13 -0400 | [diff] [blame] | 274 | |
  __u32 val = SOF_TIMESTAMPING_SOFTWARE |
              SOF_TIMESTAMPING_OPT_ID /* or any other flag */;
  err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
Soheil Hassas Yeganeh | fd91e12 | 2016-04-02 23:08:13 -0400 | [diff] [blame] | 278 | |
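The per-call request can be wrapped in a small helper that also sets up the
control buffer. This is only a sketch: attach_tx_tstamp_cmsg() is a
hypothetical name, and the caller is assumed to supply a suitably aligned
buffer and to fill in msg_iov before calling sendmsg():

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/*
 * Hypothetical helper: attach a SO_TIMESTAMPING control message carrying
 * tx_flags (SOF_TIMESTAMPING_TX_* bits) to msg, using cbuf as the control
 * buffer. cbuf must hold at least CMSG_SPACE(sizeof(uint32_t)) bytes.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int attach_tx_tstamp_cmsg(struct msghdr *msg, void *cbuf, size_t cbuf_len,
			  uint32_t tx_flags)
{
	struct cmsghdr *cmsg;

	if (cbuf_len < CMSG_SPACE(sizeof(tx_flags)))
		return -1;

	memset(cbuf, 0, CMSG_SPACE(sizeof(tx_flags)));
	msg->msg_control = cbuf;
	msg->msg_controllen = CMSG_SPACE(sizeof(tx_flags));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SO_TIMESTAMPING;
	cmsg->cmsg_len = CMSG_LEN(sizeof(tx_flags));
	memcpy(CMSG_DATA(cmsg), &tx_flags, sizeof(tx_flags));
	return 0;
}
```

After the helper returns 0, sendmsg(fd, msg, 0) sends the payload with the
requested transmit timestamps for that call.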
| 279 | |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 280 | 1.4 Bytestream Timestamps |
Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 281 | ------------------------- |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 282 | |
| 283 | The SO_TIMESTAMPING interface supports timestamping of bytes in a |
| 284 | bytestream. Each request is interpreted as a request for when the |
| 285 | entire contents of the buffer has passed a timestamping point. That |
| 286 | is, for streams option SOF_TIMESTAMPING_TX_SOFTWARE will record |
| 287 | when all bytes have reached the device driver, regardless of how |
| 288 | many packets the data has been converted into. |
| 289 | |
| 290 | In general, bytestreams have no natural delimiters and therefore |
| 291 | correlating a timestamp with data is non-trivial. A range of bytes |
| 292 | may be split across segments, any segments may be merged (possibly |
| 293 | coalescing sections of previously segmented buffers associated with |
| 294 | independent send() calls). Segments can be reordered and the same |
| 295 | byte range can coexist in multiple segments for protocols that |
| 296 | implement retransmissions. |
| 297 | |
| 298 | It is essential that all timestamps implement the same semantics, |
| 299 | regardless of these possible transformations, as otherwise they are |
| 300 | incomparable. Handling "rare" corner cases differently from the |
| 301 | simple case (a 1:1 mapping from buffer to skb) is insufficient |
| 302 | because performance debugging often needs to focus on such outliers. |
| 303 | |
| 304 | In practice, timestamps can be correlated with segments of a |
| 305 | bytestream consistently, if both semantics of the timestamp and the |
| 306 | timing of measurement are chosen correctly. This challenge is no |
| 307 | different from deciding on a strategy for IP fragmentation. There, the |
| 308 | definition is that only the first fragment is timestamped. For |
| 309 | bytestreams, we chose that a timestamp is generated only when all |
| 310 | bytes have passed a point. SOF_TIMESTAMPING_TX_ACK as defined is easy to |
| 311 | implement and reason about. An implementation that has to take into |
| 312 | account SACK would be more complex due to possible transmission holes |
| 313 | and out of order arrival. |
| 314 | |
| 315 | On the host, TCP can also break the simple 1:1 mapping from buffer to |
| 316 | skbuff as a result of Nagle, cork, autocork, segmentation and GSO. The |
| 317 | implementation ensures correctness in all cases by tracking the |
| 318 | individual last byte passed to send(), even if it is no longer the |
| 319 | last byte after an skbuff extend or merge operation. It stores the |
| 320 | relevant sequence number in skb_shinfo(skb)->tskey. Because an skbuff |
| 321 | has only one such field, only one timestamp can be generated. |
| 322 | |
| 323 | In rare cases, a timestamp request can be missed if two requests are |
| 324 | collapsed onto the same skb. A process can detect this situation by |
| 325 | enabling SOF_TIMESTAMPING_OPT_ID and comparing the byte offset at |
| 326 | send time with the value returned for each timestamp. It can prevent |
| 327 | the situation by always flushing the TCP stack in between requests, |
| 328 | for instance by enabling TCP_NODELAY and disabling TCP_CORK and |
| 329 | autocork. |
| 330 | |
| 331 | These precautions ensure that the timestamp is generated only when all |
| 332 | bytes have passed a timestamp point, assuming that the network stack |
| 333 | itself does not reorder the segments. The stack indeed tries to avoid |
| 334 | reordering. The one exception is under administrator control: it is |
| 335 | possible to construct a packet scheduler configuration that delays |
| 336 | segments from the same stream differently. Such a setup would be |
| 337 | unusual. |
| 338 | |
| 339 | |
| 340 | 2 Data Interfaces |
Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 341 | ================== |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 342 | |
Timestamps are read using the ancillary data feature of recvmsg().
See `man 3 cmsg` for details of this interface. The socket manual
page (`man 7 socket`) describes how timestamp records generated with
SO_TIMESTAMP and SO_TIMESTAMPNS can be retrieved.
| 347 | |
| 348 | |
| 349 | 2.1 SCM_TIMESTAMPING records |
Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 350 | ---------------------------- |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 351 | |
These timestamps are returned in a control message with cmsg_level
SOL_SOCKET, cmsg_type SCM_TIMESTAMPING, and a payload of one of the
following types.

For SO_TIMESTAMPING_OLD::

  struct scm_timestamping {
          struct timespec ts[3];
  };

For SO_TIMESTAMPING_NEW::

  struct scm_timestamping64 {
          struct __kernel_timespec ts[3];
  };

Use SO_TIMESTAMPING_NEW to always receive the timestamp in
struct scm_timestamping64 format.

SO_TIMESTAMPING_OLD returns incorrect timestamps after the year 2038
on 32-bit machines.
| 371 | |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 372 | The structure can return up to three timestamps. This is a legacy |
Miroslav Lichvar | 67953d4 | 2017-05-19 17:52:39 +0200 | [diff] [blame] | 373 | feature. At least one field is non-zero at any time. Most timestamps |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 374 | are passed in ts[0]. Hardware timestamps are passed in ts[2]. |
Patrick Ohly | cb9eff0 | 2009-02-12 05:03:36 +0000 | [diff] [blame] | 375 | |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 376 | ts[1] used to hold hardware timestamps converted to system time. |
| 377 | Instead, expose the hardware clock device on the NIC directly as |
| 378 | a HW PTP clock source, to allow time conversion in userspace and |
| 379 | optionally synchronize system time with a userspace PTP stack such |
Mauro Carvalho Chehab | 329f004 | 2019-06-12 14:52:57 -0300 | [diff] [blame] | 380 | as linuxptp. For the PTP clock API, see Documentation/driver-api/ptp.rst. |
Patrick Ohly | cb9eff0 | 2009-02-12 05:03:36 +0000 | [diff] [blame] | 381 | |
Miroslav Lichvar | 67953d4 | 2017-05-19 17:52:39 +0200 | [diff] [blame] | 382 | Note that if the SO_TIMESTAMP or SO_TIMESTAMPNS option is enabled |
| 383 | together with SO_TIMESTAMPING using SOF_TIMESTAMPING_SOFTWARE, a false |
| 384 | software timestamp will be generated in the recvmsg() call and passed |
| 385 | in ts[0] when a real software timestamp is missing. This happens also |
| 386 | on hardware transmit timestamps. |
| 387 | |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 388 | 2.1.1 Transmit timestamps with MSG_ERRQUEUE |
Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 389 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
Patrick Ohly | cb9eff0 | 2009-02-12 05:03:36 +0000 | [diff] [blame] | 390 | |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 391 | For transmit timestamps the outgoing packet is looped back to the |
| 392 | socket's error queue with the send timestamp(s) attached. A process |
| 393 | receives the timestamps by calling recvmsg() with flag MSG_ERRQUEUE |
| 394 | set and with a msg_control buffer sufficiently large to receive the |
| 395 | relevant metadata structures. The recvmsg call returns the original |
| 396 | outgoing data packet with two ancillary messages attached. |
Patrick Ohly | cb9eff0 | 2009-02-12 05:03:36 +0000 | [diff] [blame] | 397 | |
A message of cmsg_level SOL_IP(V6) and cmsg_type IP(V6)_RECVERR
embeds a struct sock_extended_err. This defines the error type. For
timestamps, the ee_errno field is ENOMSG. The other ancillary message
will have cmsg_level SOL_SOCKET and cmsg_type SCM_TIMESTAMPING. This
embeds the struct scm_timestamping.
Patrick Ohly | cb9eff0 | 2009-02-12 05:03:36 +0000 | [diff] [blame] | 403 | |
| 404 | |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 405 | 2.1.1.2 Timestamp types |
Mauro Carvalho Chehab | 06bfa47 | 2020-04-30 18:04:31 +0200 | [diff] [blame] | 406 | ~~~~~~~~~~~~~~~~~~~~~~~ |
Willem de Bruijn | 8fe2f76 | 2014-08-31 21:27:47 -0400 | [diff] [blame] | 407 | |
| 408 | The semantics of the three struct timespec are defined by field |
| 409 | ee_info in the extended error structure. It contains a value of |
| 410 | type SCM_TSTAMP_* to define the actual timestamp passed in |
| 411 | scm_timestamping. |
| 412 | |
| 413 | The SCM_TSTAMP_* types are 1:1 matches to the SOF_TIMESTAMPING_* |
| 414 | control fields discussed previously, with one exception. For legacy |
| 415 | reasons, SCM_TSTAMP_SND is equal to zero and can be set for both |
| 416 | SOF_TIMESTAMPING_TX_HARDWARE and SOF_TIMESTAMPING_TX_SOFTWARE. It |
| 417 | is the first if ts[2] is non-zero, the second otherwise, in which |
| 418 | case the timestamp is stored in ts[0]. |
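
The rule for SCM_TSTAMP_SND can be written down as a small classifier. This
is a sketch only; enum tx_point and classify_tx_tstamp() are hypothetical
names, and a timespec is treated as set when either of its fields is
non-zero:

```c
#include <time.h>
#include <linux/errqueue.h>

/* Hypothetical labels for the point at which a tx timestamp was taken. */
enum tx_point {
	TX_POINT_HW,
	TX_POINT_SW,
	TX_POINT_SCHED,
	TX_POINT_ACK,
	TX_POINT_UNKNOWN,
};

enum tx_point classify_tx_tstamp(unsigned int ee_info,
				 const struct scm_timestamping *tss)
{
	switch (ee_info) {
	case SCM_TSTAMP_SND:
		/* Shared by hardware and software: ts[2] decides. */
		if (tss->ts[2].tv_sec || tss->ts[2].tv_nsec)
			return TX_POINT_HW;	/* value is in ts[2] */
		return TX_POINT_SW;		/* value is in ts[0] */
	case SCM_TSTAMP_SCHED:
		return TX_POINT_SCHED;
	case SCM_TSTAMP_ACK:
		return TX_POINT_ACK;
	default:
		return TX_POINT_UNKNOWN;
	}
}
```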


2.1.1.3 Fragmentation
~~~~~~~~~~~~~~~~~~~~~

Fragmentation of outgoing datagrams is rare, but is possible, e.g., by
explicitly disabling PMTU discovery. If an outgoing packet is fragmented,
then only the first fragment is timestamped and returned to the sending
socket.


2.1.1.4 Packet Payload
~~~~~~~~~~~~~~~~~~~~~~

The calling application is often not interested in receiving the whole
packet payload that it passed to the stack originally: the socket
error queue mechanism is just a method to piggyback the timestamp on.
In this case, the application can choose to read datagrams with a
smaller buffer, possibly even of length 0. The payload is truncated
accordingly. Until the process calls recvmsg() on the error queue,
however, the full packet is queued, taking up budget from SO_RCVBUF.


2.1.1.5 Blocking Read
~~~~~~~~~~~~~~~~~~~~~

Reading from the error queue is always a non-blocking operation. To
block waiting on a timestamp, use poll() or select(). poll() will return
POLLERR in pollfd.revents if any data is ready on the error queue.
There is no need to pass this flag in pollfd.events, because it is
ignored on request. See also ``man 2 poll``.
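A minimal sketch of such a wait is shown below. The helper name
errqueue_wait() and its return convention are assumptions made for this
example only.

```c
#include <assert.h>
#include <poll.h>

/*
 * Hypothetical helper: wait up to timeout_ms (-1 waits forever) for
 * data on the error queue of fd. Returns 1 if POLLERR was reported,
 * 0 if nothing is pending on the error queue, -1 if poll() failed.
 */
static int errqueue_wait(int fd, int timeout_ms)
{
    /* events stays 0: POLLERR is always reported and need not be set. */
    struct pollfd pfd = { .fd = fd, .events = 0 };
    int ret;

    assert(timeout_ms >= -1);

    ret = poll(&pfd, 1, timeout_ms);
    if (ret < 0)
        return -1;

    return (ret > 0 && (pfd.revents & POLLERR)) ? 1 : 0;
}
```

When the helper returns 1, the application can call recvmsg() with
MSG_ERRQUEUE to fetch the timestamp without blocking.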


2.1.2 Receive timestamps
^^^^^^^^^^^^^^^^^^^^^^^^

On reception, there is no reason to read from the socket error queue.
The SCM_TIMESTAMPING ancillary data is sent along with the packet data
on a normal recvmsg(). Since this is not a socket error, it is not
accompanied by a SOL_IP(V6)/IP(V6)_RECVERROR message. In this case,
the meaning of the three fields in struct scm_timestamping is
implicitly defined. ts[0] holds a software timestamp if set, ts[1]
is again deprecated and ts[2] holds a hardware timestamp if set.


3. Hardware Timestamping configuration: SIOCSHWTSTAMP and SIOCGHWTSTAMP
=======================================================================

Hardware time stamping must also be initialized for each device driver
that is expected to do hardware time stamping. The parameter is defined in
include/uapi/linux/net_tstamp.h as::

    struct hwtstamp_config {
        int flags;      /* no flags defined right now, must be zero */
        int tx_type;    /* HWTSTAMP_TX_* */
        int rx_filter;  /* HWTSTAMP_FILTER_* */
    };

Desired behavior is passed into the kernel and to a specific device by
calling ioctl(SIOCSHWTSTAMP) with a pointer to a struct ifreq whose
ifr_data points to a struct hwtstamp_config. The tx_type and
rx_filter are hints to the driver about what it is expected to do. If
the requested fine-grained filtering for incoming packets is not
supported, the driver may time stamp more than just the requested types
of packets.

Drivers are free to use a more permissive configuration than the requested
configuration. Drivers are expected to implement directly only the
most generic mode that can be supported. For example, if the hardware can
support HWTSTAMP_FILTER_PTP_V2_EVENT, then it should generally always upscale
HWTSTAMP_FILTER_PTP_V2_L2_SYNC, and so forth, as HWTSTAMP_FILTER_PTP_V2_EVENT
is more generic (and more useful to applications).

A driver which supports hardware time stamping shall update the struct
with the actual, possibly more permissive configuration. If the
requested packets cannot be time stamped, then nothing should be
changed and ERANGE shall be returned (in contrast to EINVAL, which
indicates that SIOCSHWTSTAMP is not supported at all).

Only a process with admin rights may change the configuration. User
space is responsible for ensuring that multiple processes don't interfere
with each other and that the settings are reset.

Any process can read the actual configuration by passing this
structure to ioctl(SIOCGHWTSTAMP) in the same way. However, this has
not been implemented in all drivers.

::

    /* possible values for hwtstamp_config->tx_type */
    enum {
        /*
         * no outgoing packet will need hardware time stamping;
         * should a packet arrive which asks for it, no hardware
         * time stamping will be done
         */
        HWTSTAMP_TX_OFF,

        /*
         * enables hardware time stamping for outgoing packets;
         * the sender of the packet decides which are to be
         * time stamped by setting SOF_TIMESTAMPING_TX_SOFTWARE
         * before sending the packet
         */
        HWTSTAMP_TX_ON,
    };

    /* possible values for hwtstamp_config->rx_filter */
    enum {
        /* time stamp no incoming packet at all */
        HWTSTAMP_FILTER_NONE,

        /* time stamp any incoming packet */
        HWTSTAMP_FILTER_ALL,

        /* return value: time stamp all packets requested plus some others */
        HWTSTAMP_FILTER_SOME,

        /* PTP v1, UDP, any kind of event packet */
        HWTSTAMP_FILTER_PTP_V1_L4_EVENT,

        /* for the complete list of values, please check
         * the include file include/uapi/linux/net_tstamp.h
         */
    };

3.1 Hardware Timestamping Implementation: Device Drivers
--------------------------------------------------------

A driver which supports hardware time stamping must support the
SIOCSHWTSTAMP ioctl and update the supplied struct hwtstamp_config with
the actual values as described in the section on SIOCSHWTSTAMP. It
should also support SIOCGHWTSTAMP.

Time stamps for received packets must be stored in the skb. To get a pointer
to the shared time stamp structure of the skb, call skb_hwtstamps(). Then
set the time stamps in the structure::

    struct skb_shared_hwtstamps {
        /* hardware time stamp transformed into duration
         * since arbitrary point in time
         */
        ktime_t hwtstamp;
    };

Time stamps for outgoing packets are to be generated as follows:

- In hard_start_xmit(), check if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)
  is non-zero. If yes, then the driver is expected to do hardware time
  stamping.
- If this is possible for the skb and requested, then declare
  that the driver is doing the time stamping by setting the flag
  SKBTX_IN_PROGRESS in skb_shinfo(skb)->tx_flags, e.g. with::

      skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;

  You might want to keep a pointer to the associated skb for the next step
  and not free the skb. A driver not supporting hardware time stamping doesn't
  do that. A driver must never touch sk_buff::tstamp! It is used to store
  software generated time stamps by the network subsystem.
- The driver should call skb_tx_timestamp() as close to passing the sk_buff
  to hardware as possible. skb_tx_timestamp() provides a software time stamp
  if requested and hardware timestamping is not possible (SKBTX_IN_PROGRESS
  not set).
- As soon as the driver has sent the packet and/or obtained a
  hardware time stamp for it, it passes the time stamp back by
  calling skb_tstamp_tx() with the original skb and the raw
  hardware time stamp. skb_tstamp_tx() clones the original skb and
  adds the timestamps, therefore the original skb has to be freed now.
  If obtaining the hardware time stamp somehow fails, then the driver
  should not fall back to software time stamping. The rationale is that
  this would occur at a later time in the processing pipeline than other
  software time stamping and therefore could lead to unexpected deltas
  between time stamps.

3.2 Special considerations for stacked PTP Hardware Clocks
----------------------------------------------------------

There are situations when there may be more than one PHC (PTP Hardware Clock)
in the data path of a packet. The kernel has no explicit mechanism to allow the
user to select which PHC to use for timestamping Ethernet frames. Instead, the
assumption is that the outermost PHC is always the most preferable, and that
kernel drivers collaborate towards achieving that goal. Currently there are 3
cases of stacked PHCs, detailed below:

3.2.1 DSA (Distributed Switch Architecture) switches
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These are Ethernet switches which have one of their ports connected to an
(otherwise completely unaware) host Ethernet interface, and perform the role of
a port multiplier with optional forwarding acceleration features. Each DSA
switch port is visible to the user as a standalone (virtual) network interface,
and its network I/O is performed, under the hood, indirectly through the host
interface (redirecting to the host port on TX, and intercepting frames on RX).

When a DSA switch is attached to a host port, PTP synchronization suffers,
since the switch's variable queuing delay introduces a path delay
jitter between the host port and its PTP partner. For this reason, some DSA
switches include a timestamping clock of their own, and have the ability to
perform network timestamping on their own MAC, such that path delays only
measure wire and PHY propagation latencies. Timestamping DSA switches are
supported in Linux and expose the same ABI as any other network interface
(although the DSA interfaces are virtual in terms of network I/O, they do
have their own PHC). It is typical, but not mandatory, for all
interfaces of a DSA switch to share the same PHC.

By design, PTP timestamping with a DSA switch does not need any special
handling in the driver for the host port it is attached to. However, when the
host port also supports PTP timestamping, DSA will take care of intercepting
the ``.ndo_do_ioctl`` calls towards the host port, and block attempts to enable
hardware timestamping on it. This is because the SO_TIMESTAMPING API does not
allow the delivery of multiple hardware timestamps for the same packet, so
every entity other than the DSA switch port must be prevented from
timestamping it.

DSA already provides most of the timestamping infrastructure in generic
code: a BPF classifier (``ptp_classify_raw``) is used to identify
PTP event messages (any other packets, including PTP general messages, are not
timestamped), and two hooks are offered to drivers:

- ``.port_txtstamp()``: The driver is passed a clone of the timestampable skb
  to be transmitted, before actually transmitting it. Typically, a switch will
  have a PTP TX timestamp register (or sometimes a FIFO) where the timestamp
  becomes available. There may be an IRQ that is raised upon this timestamp's
  availability, or the driver might have to poll after invoking
  ``dev_queue_xmit()`` towards the host interface. Either way, in the
  ``.port_txtstamp()`` method, the driver only needs to save the clone for
  later use (when the timestamp becomes available). Each skb is annotated with
  a pointer to its clone, in ``DSA_SKB_CB(skb)->clone``, to ease the driver's
  job of keeping track of which clone belongs to which skb.

- ``.port_rxtstamp()``: The original (and only) timestampable skb is provided
  to the driver, for it to annotate it with a timestamp, if that is immediately
  available, or defer to later. On reception, timestamps might either be
  available in-band (through metadata in the DSA header, or attached in other
  ways to the packet), or out-of-band (through another RX timestamping FIFO).
  Deferral on RX is typically necessary when retrieving the timestamp needs a
  sleepable context. In that case, it is the responsibility of the DSA driver
  to call ``netif_rx_ni()`` on the freshly timestamped skb.

3.2.2 Ethernet PHYs
^^^^^^^^^^^^^^^^^^^

These are devices that typically fulfill a Layer 1 role in the network stack,
hence they do not have a representation in terms of a network interface as DSA
switches do. However, PHYs may be able to detect and timestamp PTP packets, for
performance reasons: timestamps taken as close as possible to the wire have the
potential to yield a more stable and precise synchronization.

A PHY driver that supports PTP timestamping must create a ``struct
mii_timestamper`` and add a pointer to it in ``phydev->mii_ts``. The presence
of this pointer will be checked by the networking stack.

Since PHYs do not have network interface representations, the timestamping and
ethtool ioctl operations for them need to be mediated by their respective MAC
driver. Therefore, as opposed to DSA switches, modifications need to be done
to each individual MAC driver for PHY timestamping support. This entails:

- Checking, in ``.ndo_do_ioctl``, whether ``phy_has_hwtstamp(netdev->phydev)``
  is true or not. If it is, then the MAC driver should not process this request
  but instead pass it on to the PHY using ``phy_mii_ioctl()``.

- On RX, special intervention may or may not be needed, depending on the
  function used to deliver skbs up the network stack. In the case of plain
  ``netif_rx()`` and similar, MAC drivers must check whether
  ``skb_defer_rx_timestamp(skb)`` is necessary and, if it is, must not
  call ``netif_rx()`` at all. If ``CONFIG_NETWORK_PHY_TIMESTAMPING`` is
  enabled, and ``skb->dev->phydev->mii_ts`` exists, its ``.rxtstamp()`` hook
  will be called now, to determine, using logic very similar to DSA, whether
  deferral for RX timestamping is necessary. Again like DSA, it becomes the
  responsibility of the PHY driver to send the packet up the stack when the
  timestamp is available.

  For other skb receive functions, such as ``napi_gro_receive`` and
  ``netif_receive_skb``, the stack automatically checks whether
  ``skb_defer_rx_timestamp()`` is necessary, so this check is not needed inside
  the driver.

- On TX, again, special intervention might or might not be needed. The
  function that calls the ``mii_ts->txtstamp()`` hook is named
  ``skb_clone_tx_timestamp()``. This function can be called directly
  (in which case explicit MAC driver support is indeed needed), but it
  also piggybacks on the ``skb_tx_timestamp()`` call, which many MAC
  drivers already perform for software timestamping purposes. Therefore, if a
  MAC supports software timestamping, it does not need to do anything further
  at this stage.

3.2.3 MII bus snooping devices
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These perform the same role as timestamping Ethernet PHYs, save for the fact
that they are discrete devices and can therefore be used in conjunction with
any PHY even if it doesn't support timestamping. In Linux, they are
discoverable and attachable to a ``struct phy_device`` through Device Tree, and
otherwise they use the same mii_ts infrastructure as timestamping PHYs. See
Documentation/devicetree/bindings/ptp/timestamper.txt for more details.

3.2.4 Other caveats for MAC drivers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A side effect of stacked PHCs, especially (but not only) with DSA, is that
they uncover bugs which were impossible to trigger before the existence of
stacked PTP clocks. DSA in particular does not require any modification to
MAC drivers, so it is more difficult to ensure correctness of all possible
code paths. One example has to do with
this line of code, already presented earlier::

    skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;

Any TX timestamping logic, be it a plain MAC driver, a DSA switch driver, a PHY
driver or a MII bus snooping device driver, should set this flag.
But a MAC driver that is unaware of PHC stacking might get tripped up by
somebody other than itself setting this flag, and deliver a duplicate
timestamp.
For example, a typical driver design for TX timestamping might be to split the
transmission part into 2 portions:

1. "TX": checks whether PTP timestamping has been previously enabled through
   the ``.ndo_do_ioctl`` ("``priv->hwtstamp_tx_enabled == true``") and the
   current skb requires a TX timestamp ("``skb_shinfo(skb)->tx_flags &
   SKBTX_HW_TSTAMP``"). If this is true, it sets the
   "``skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS``" flag. Note: as
   described above, in the case of a stacked PHC system, this condition should
   never trigger, as this MAC is certainly not the outermost PHC. But this is
   not where the typical issue is. Transmission proceeds with this packet.

2. "TX confirmation": Transmission has finished. The driver checks whether it
   is necessary to collect any TX timestamp for it. Here is where the typical
   issues are: the MAC driver takes a shortcut and only checks whether
   "``skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS``" was set. With a stacked
   PHC system, this is incorrect because this MAC driver is not the only entity
   in the TX data path that could have set SKBTX_IN_PROGRESS in the first
   place.

The correct solution for this problem is for MAC drivers to have a compound
check in their "TX confirmation" portion, not only for
"``skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS``", but also for
"``priv->hwtstamp_tx_enabled == true``". Because the rest of the system ensures
that PTP timestamping is not enabled for anything other than the outermost PHC,
this enhanced check will avoid delivering a duplicated TX timestamp to user
space.
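
The compound check can be illustrated with a small user-space model. This is
only a sketch: ``struct model_priv``, ``struct model_skb`` and both functions
are hypothetical stand-ins for driver code, and the flag macros are redefined
locally for illustration rather than taken from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the kernel flags; the values are illustrative. */
#define SKBTX_HW_TSTAMP     (1u << 0)
#define SKBTX_IN_PROGRESS   (1u << 2)

/* Hypothetical driver private data and a simplified skb. */
struct model_priv {
    bool hwtstamp_tx_enabled;   /* set through the timestamping ioctl */
};

struct model_skb {
    unsigned int tx_flags;
};

/* "TX" portion: claim the timestamp only if this MAC was configured. */
static void model_xmit(const struct model_priv *priv, struct model_skb *skb)
{
    assert(priv != NULL && skb != NULL);

    if (priv->hwtstamp_tx_enabled && (skb->tx_flags & SKBTX_HW_TSTAMP))
        skb->tx_flags |= SKBTX_IN_PROGRESS;
}

/*
 * "TX confirmation" portion: the compound check. SKBTX_IN_PROGRESS
 * alone is not sufficient, since another PHC in the data path may
 * have set it.
 */
static bool model_txconf_should_tstamp(const struct model_priv *priv,
                                       const struct model_skb *skb)
{
    assert(priv != NULL && skb != NULL);

    return priv->hwtstamp_tx_enabled &&
           (skb->tx_flags & SKBTX_IN_PROGRESS) != 0;
}
```

With a MAC whose ``hwtstamp_tx_enabled`` is false and an skb on which a DSA
switch driver has already set SKBTX_IN_PROGRESS, the confirmation check in
this model declines to collect a timestamp, so no duplicate is delivered.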