.. SPDX-License-Identifier: GPL-2.0

=============
DCCP protocol
=============


.. Contents
   - Introduction
   - Missing features
   - Socket options
   - Sysctl variables
   - IOCTLs
   - Other tunables
   - Notes


Introduction
============
Datagram Congestion Control Protocol (DCCP) is an unreliable,
connection-oriented protocol designed to solve issues present in UDP and TCP,
particularly for real-time and multimedia (streaming) traffic.
It is divided into a base protocol (RFC 4340) and pluggable congestion control
modules identified by congestion control IDs (CCIDs). As with pluggable TCP
congestion control, at least one CCID needs to be enabled in order for the
protocol to function properly. In the Linux implementation, this is the TCP-like
CCID2 (RFC 4341). Additional CCIDs, such as the TCP-friendly CCID3 (RFC 4342),
are optional.
For a brief introduction to CCIDs and suggestions for choosing a CCID to match
given applications, see section 10 of RFC 4340.

DCCP is a Proposed Standard (RFC 2026), and the homepage for DCCP as a protocol
is at http://www.ietf.org/html.charters/dccp-charter.html


Missing features
================
The Linux DCCP implementation does not currently support all the features that
are specified in RFCs 4340 to 4342.

The known bugs are listed at:

  http://www.linuxfoundation.org/collaborate/workgroups/networking/todo#DCCP

For more up-to-date versions of the DCCP implementation, please consider using
the experimental DCCP test tree; instructions for checking it out are at:

  http://www.linuxfoundation.org/collaborate/workgroups/networking/dccp_testing#Experimental_DCCP_source_tree


Socket options
==============
DCCP_SOCKOPT_QPOLICY_ID sets the dequeuing policy for outgoing packets. It takes
a policy ID as argument and can only be set before the connection is established
(i.e. changes during an established connection are not supported). Currently,
two policies are defined: the "simple" policy (DCCPQ_POLICY_SIMPLE), which does
nothing special, and a priority-based variant (DCCPQ_POLICY_PRIO). The latter
allows passing a u32 priority value as ancillary data to sendmsg(), where higher
numbers indicate a higher packet priority (similar to SO_PRIORITY). This
ancillary data needs to be formatted using a cmsg(3) message header filled in as
follows::

    cmsg->cmsg_level = SOL_DCCP;
    cmsg->cmsg_type  = DCCP_SCM_PRIORITY;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(uint32_t));  /* or CMSG_LEN(4) */

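A minimal sketch of how the priority header might be attached to a
``struct msghdr`` before calling sendmsg(). The helper name
``dccp_set_prio_cmsg`` is hypothetical, and the numeric fallbacks for SOL_DCCP
and DCCP_SCM_PRIORITY are assumptions taken from the kernel UAPI headers; they
are only used when the system headers do not already provide them.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Assumed values from <linux/socket.h> and <linux/dccp.h>. */
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif
#ifndef DCCP_SCM_PRIORITY
#define DCCP_SCM_PRIORITY 1
#endif

/*
 * Hypothetical helper: attach one priority control message to msg,
 * using cbuf as storage. cbuf must be suitably aligned and hold at
 * least CMSG_SPACE(sizeof(uint32_t)) bytes.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int dccp_set_prio_cmsg(struct msghdr *msg, void *cbuf, size_t cbuflen,
                              uint32_t prio)
{
    struct cmsghdr *cmsg;

    if (cbuflen < CMSG_SPACE(sizeof(uint32_t)))
        return -1;

    memset(cbuf, 0, cbuflen);
    msg->msg_control = cbuf;
    msg->msg_controllen = CMSG_SPACE(sizeof(uint32_t));

    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = SOL_DCCP;
    cmsg->cmsg_type  = DCCP_SCM_PRIORITY;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(uint32_t));
    memcpy(CMSG_DATA(cmsg), &prio, sizeof(prio));
    return 0;
}
```

The caller would fill in ``msg_iov`` with the payload as usual and pass the
resulting ``msg`` to sendmsg() on a socket whose policy has been set to
DCCPQ_POLICY_PRIO.
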
DCCP_SOCKOPT_QPOLICY_TXQLEN sets the maximum length of the output queue. A zero
value is always interpreted as unbounded queue length. If different from zero,
the interpretation of this parameter depends on the current dequeuing policy
(see above): the "simple" policy will enforce a fixed queue size by returning
EAGAIN, whereas the "prio" policy enforces a fixed queue length by dropping the
lowest-priority packet first. The default value for this parameter is
initialised from /proc/sys/net/dccp/default/tx_qlen.

DCCP_SOCKOPT_SERVICE sets the service code. The specification mandates use of
service codes (RFC 4340, sec. 8.1.2); if this socket option is not set,
the socket will fall back to 0 (which means that no meaningful service code
is present). On active sockets this is set before connect(); specifying more
than one code has no effect (all subsequent service codes are ignored). The
case is different for passive sockets, where multiple service codes (up to 32)
can be set before calling bind().

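As an illustration, setting a single service code on an active socket might
look like the sketch below. Both helper names are hypothetical. The sketch
assumes that the kernel expects the 32-bit code in network byte order, that
DCCP_SOCKOPT_SERVICE has the value 2 as in the kernel UAPI header, and that the
code is written as four ASCII characters (one of the textual forms suggested by
RFC 4340, sec. 8.1.2).

```c
#include <stdint.h>
#include <arpa/inet.h>
#include <sys/socket.h>

/* Assumed values from <linux/socket.h> and <linux/dccp.h>. */
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif
#ifndef DCCP_SOCKOPT_SERVICE
#define DCCP_SOCKOPT_SERVICE 2
#endif

/* Hypothetical helper: pack four ASCII characters into a host-order code. */
static uint32_t dccp_service_from_ascii(const char s[4])
{
    return ((uint32_t)(unsigned char)s[0] << 24) |
           ((uint32_t)(unsigned char)s[1] << 16) |
           ((uint32_t)(unsigned char)s[2] << 8)  |
            (uint32_t)(unsigned char)s[3];
}

/* Hypothetical helper: set one service code on fd; call before connect(). */
static int dccp_set_service(int fd, uint32_t service)
{
    uint32_t wire = htonl(service);  /* assumption: network byte order */

    return setsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_SERVICE,
                      &wire, sizeof(wire));
}
```

A client would call ``dccp_set_service(fd, dccp_service_from_ascii("HTTP"))``
on a freshly created DCCP socket and check the return value before connecting.
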
DCCP_SOCKOPT_GET_CUR_MPS is read-only and retrieves the current maximum packet
size (application payload size) in bytes; see RFC 4340, section 14.

DCCP_SOCKOPT_AVAILABLE_CCIDS is also read-only and returns the list of CCIDs
supported by the endpoint. The option value is an array of type uint8_t whose
size is passed as option length. The minimum array size is 4 elements; the
value returned in the optlen argument always reflects the true number of
built-in CCIDs.

DCCP_SOCKOPT_CCID is write-only and sets both the TX and RX CCIDs at the same
time, combining the operation of the next two socket options. This option is
preferable to the latter two, since applications will often use the same
type of CCID for both directions, and mixed use of CCIDs is not currently well
understood. This socket option takes as argument at least one uint8_t value, or
an array of uint8_t values, which must match available CCIDs (see above). CCIDs
must be registered on the socket before calling connect() or listen().

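A rough sketch of registering a CCID list with DCCP_SOCKOPT_CCID follows. The
helper name ``dccp_set_ccids`` is hypothetical, and the option value 13 is an
assumption taken from the kernel UAPI header. The sketch only forwards the
uint8_t array described above; it does not verify the entries against
DCCP_SOCKOPT_AVAILABLE_CCIDS.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Assumed values from <linux/socket.h> and <linux/dccp.h>. */
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif
#ifndef DCCP_SOCKOPT_CCID
#define DCCP_SOCKOPT_CCID 13
#endif

/*
 * Hypothetical helper: register n CCIDs for both directions on fd.
 * Must be called before connect() or listen().
 * Returns 0 on success, -1 with errno set on failure.
 */
static int dccp_set_ccids(int fd, const uint8_t *ccids, size_t n)
{
    if (ccids == NULL || n == 0) {
        errno = EINVAL;
        return -1;
    }
    return setsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_CCID, ccids, (socklen_t)n);
}
```

For example, an application that wants CCID 3 but can fall back to CCID 2 might
pass ``uint8_t pref[] = { 3, 2 };`` with ``n = sizeof(pref)``.
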
DCCP_SOCKOPT_TX_CCID is read/write. It returns the current CCID (if set) or sets
the preference list for the TX CCID, using the same format as DCCP_SOCKOPT_CCID.
Please note that the getsockopt argument type here is ``int``, not uint8_t.

DCCP_SOCKOPT_RX_CCID is analogous to DCCP_SOCKOPT_TX_CCID, but for the RX CCID.

DCCP_SOCKOPT_SERVER_TIMEWAIT enables the server (listening socket) to hold
timewait state when closing the connection (RFC 4340, 8.3). The usual case is
that the closing server sends a CloseReq, whereupon the client holds timewait
state. When this boolean socket option is on, the server sends a Close instead
and will enter TIMEWAIT. This option must be set after accept() returns.

DCCP_SOCKOPT_SEND_CSCOV and DCCP_SOCKOPT_RECV_CSCOV are used for setting the
partial checksum coverage (RFC 4340, sec. 9.2). The default is that checksums
always cover the entire packet and that only fully covered application data is
accepted by the receiver. Hence, when using this feature on the sender, it must
be enabled at the receiver too, with a suitable choice of CsCov.

DCCP_SOCKOPT_SEND_CSCOV sets the sender checksum coverage. Values in the
range 0..15 are acceptable. The default setting is 0 (full coverage);
values in the range 1..15 indicate partial coverage.

DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it
sets a threshold, where again values 0..15 are acceptable. The default
of 0 means that all packets with a partial coverage will be discarded.
Values in the range 1..15 indicate that packets with at least that
coverage value are also acceptable. The higher the number, the more
restrictive this setting (see RFC 4340, sec. 9.2.1). Partial coverage
settings are inherited by the child socket after accept().

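The relationship between the two settings can be sketched with two small
functions. Both names are hypothetical. The byte count relies on the rule from
RFC 4340, sec. 9.2, that a CsCov value in 1..15 covers the initial
(CsCov - 1) * 4 bytes of application data; the acceptance test is an
interpretation of the threshold semantics described above, assuming all inputs
are in the range 0..15.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of application-data bytes protected by
 * the checksum for a given sender CsCov value (0..15 assumed).
 */
static size_t dccp_cscov_covered_bytes(unsigned cscov, size_t payload_len)
{
    size_t covered;

    if (cscov == 0)              /* 0 means full coverage */
        return payload_len;
    covered = (size_t)(cscov - 1) * 4;
    return covered < payload_len ? covered : payload_len;
}

/*
 * Hypothetical helper: would a receiver with threshold min_cscov accept
 * a packet whose header carries pkt_cscov?
 */
static bool dccp_cscov_acceptable(unsigned pkt_cscov, unsigned min_cscov)
{
    if (pkt_cscov == 0)          /* fully covered packets always pass */
        return true;
    return min_cscov != 0 && pkt_cscov >= min_cscov;
}
```

With these definitions, a sender using CsCov 3 protects only the first 8 bytes
of payload, and a receiver must set a threshold between 1 and 3 to accept such
packets.
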
The following two options apply to CCID 3 exclusively and are getsockopt()-only.
In either case, a TFRC info struct (defined in <linux/tfrc.h>) is returned.

DCCP_SOCKOPT_CCID_RX_INFO
    Returns a ``struct tfrc_rx_info`` in optval; the buffer for optval and
    optlen must be set to at least sizeof(struct tfrc_rx_info).

DCCP_SOCKOPT_CCID_TX_INFO
    Returns a ``struct tfrc_tx_info`` in optval; the buffer for optval and
    optlen must be set to at least sizeof(struct tfrc_tx_info).

On unidirectional connections it is useful to close the unused half-connection
via shutdown (SHUT_WR or SHUT_RD): this will reduce per-packet processing costs.


Sysctl variables
================
Several DCCP default parameters can be managed by the following sysctls
(sysctl net.dccp.default or /proc/sys/net/dccp/default):

request_retries
    The number of active connection initiation retries (the number of
    Requests minus one) before timing out. In addition, it also governs
    the behaviour of the other, passive side: this variable also sets
    the number of times DCCP repeats sending a Response when the initial
    handshake does not progress from RESPOND to OPEN (i.e. when no Ack
    is received after the initial Request). This value should be greater
    than 0; a value of less than 10 is suggested. Analogue of
    tcp_syn_retries.

retries1
    How often a DCCP Response is retransmitted until the listening DCCP
    side considers its connecting peer dead. Analogue of tcp_retries1.

retries2
    The number of times a general DCCP packet is retransmitted. This is
    important for retransmitted acknowledgments and feature negotiation;
    data packets are never retransmitted. Analogue of tcp_retries2.

tx_ccid = 2
    Default CCID for the sender-receiver half-connection. Depending on the
    choice of CCID, the Send Ack Vector feature is enabled automatically.

rx_ccid = 2
    Default CCID for the receiver-sender half-connection; see tx_ccid.

seq_window = 100
    The initial sequence window (sec. 7.5.2) of the sender. This influences
    the local ackno validity and the remote seqno validity windows (7.5.1).
    Values in the range Wmin = 32 (RFC 4340, 7.5.2) up to 2^32-1 can be set.

tx_qlen = 5
    The size of the transmit buffer in packets. A value of 0 corresponds
    to an unbounded transmit buffer.

sync_ratelimit = 125 ms
    The timeout between subsequent DCCP-Sync packets sent in response to
    sequence-invalid packets on the same socket (RFC 4340, 7.5.4). The unit
    of this parameter is milliseconds; a value of 0 disables rate-limiting.


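The sysctl defaults listed above can also be inspected from a program by
reading the corresponding file under /proc/sys/net/dccp/default. The following
is a small sketch with a hypothetical helper name; it assumes that each entry
contains a single decimal integer and fails cleanly when the DCCP sysctl
directory is not present on the running kernel.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: read one integer from
 * /proc/sys/net/dccp/default/<name> into *value.
 * Returns 0 on success, -1 if the entry is missing or unparsable.
 */
static int dccp_read_default(const char *name, long *value)
{
    char path[128];
    FILE *f;
    int n, ok;

    n = snprintf(path, sizeof(path), "/proc/sys/net/dccp/default/%s", name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;               /* name too long for the path buffer */

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

For instance, ``dccp_read_default("seq_window", &w)`` would store the initial
sequence window in ``w`` when the entry exists.

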
IOCTLs
======
FIONREAD
    Works as in udp(7): returns in the ``int`` argument pointer the size of
    the next pending datagram in bytes, or 0 when no datagram is pending.


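A minimal sketch of querying the size of the next pending datagram with
FIONREAD is shown here. The wrapper name is hypothetical; the ioctl itself is
the standard one described in udp(7).

```c
#include <sys/ioctl.h>

/*
 * Hypothetical helper: size in bytes of the next pending datagram on fd,
 * 0 if none is pending, or -1 if the ioctl fails (errno is set).
 */
static int dccp_pending_datagram_size(int fd)
{
    int size = 0;

    if (ioctl(fd, FIONREAD, &size) < 0)
        return -1;
    return size;
}
```

A receiver could use the result to size its buffer before calling recv().

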
Other tunables
==============
Per-route rto_min support
    CCID-2 supports the RTAX_RTO_MIN per-route setting for the minimum value
    of the RTO timer. This setting can be modified via the 'rto_min' option
    of iproute2; for example::

        > ip route change 10.0.0.0/24 rto_min 250j dev wlan0
        > ip route add 10.0.0.254/32 rto_min 800j dev wlan0
        > ip route show dev wlan0

    CCID-3 also supports the rto_min setting: it is used to define the lower
    bound for the expiry of the nofeedback timer. This can be useful on LANs
    with very low RTTs (e.g., loopback, Gbit ethernet).


Notes
=====
DCCP currently does not traverse NAT successfully on many devices. This is
because, as with TCP and UDP, the checksum covers the pseudo-header. Linux NAT
support for DCCP has been added.