.. _kernel_tls:

==========
Kernel TLS
==========

Overview
========

Transport Layer Security (TLS) is an Upper Layer Protocol (ULP) that runs over
TCP. TLS provides end-to-end data integrity and confidentiality.

User interface
==============

Creating a TLS connection
-------------------------

First create a new TCP socket and set the TLS ULP.

.. code-block:: c

  sock = socket(AF_INET, SOCK_STREAM, 0);
  setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

Setting the TLS ULP allows us to set and get TLS socket options. Currently
only symmetric encryption is handled in the kernel; the handshake stays in
userspace. After the TLS handshake is complete, we have all the parameters
required to move the data-path to the kernel. There are separate socket
options for moving the transmit and the receive paths into the kernel.

.. code-block:: c

  /* From linux/tls.h */
  struct tls_crypto_info {
          unsigned short version;
          unsigned short cipher_type;
  };

  struct tls12_crypto_info_aes_gcm_128 {
          struct tls_crypto_info info;
          unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
          unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
          unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
          unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
  };


  struct tls12_crypto_info_aes_gcm_128 crypto_info;

  crypto_info.info.version = TLS_1_2_VERSION;
  crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128;
  memcpy(crypto_info.iv, iv_write, TLS_CIPHER_AES_GCM_128_IV_SIZE);
  memcpy(crypto_info.rec_seq, seq_number_write,
                                        TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
  memcpy(crypto_info.key, cipher_key_write, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
  memcpy(crypto_info.salt, implicit_iv_write, TLS_CIPHER_AES_GCM_128_SALT_SIZE);

  setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info));

Transmit and receive are set separately, but the setup is the same, using either
TLS_TX or TLS_RX.
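
Since both directions use the same setup, the code can be shared. The
following is only a sketch: ``struct ktls_keys``, ``ktls_fill_info()`` and
``ktls_install()`` are hypothetical names, not part of any kernel or library
API, and the ``SOL_TLS`` fallback is an assumption for older libc headers.
Each direction has its own key, salt, IV and record sequence number from the
handshake.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/tls.h>

#ifndef SOL_TLS
#define SOL_TLS 282 /* value from linux/socket.h, for older libc headers */
#endif

/* Hypothetical container for one direction's handshake output. */
struct ktls_keys {
        unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
        unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
        unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
        unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
};

/* Fill the TLS 1.2 AES-GCM-128 descriptor from one direction's keys. */
static void ktls_fill_info(struct tls12_crypto_info_aes_gcm_128 *ci,
                           const struct ktls_keys *k)
{
        memset(ci, 0, sizeof(*ci));
        ci->info.version = TLS_1_2_VERSION;
        ci->info.cipher_type = TLS_CIPHER_AES_GCM_128;
        memcpy(ci->key, k->key, sizeof(ci->key));
        memcpy(ci->salt, k->salt, sizeof(ci->salt));
        memcpy(ci->iv, k->iv, sizeof(ci->iv));
        memcpy(ci->rec_seq, k->rec_seq, sizeof(ci->rec_seq));
}

/* direction is TLS_TX or TLS_RX; returns the result of setsockopt(). */
static int ktls_install(int sock, int direction, const struct ktls_keys *k)
{
        struct tls12_crypto_info_aes_gcm_128 ci;

        ktls_fill_info(&ci, k);
        return setsockopt(sock, SOL_TLS, direction, &ci, sizeof(ci));
}
```

A caller would invoke ``ktls_install(sock, TLS_TX, &write_keys)`` and
``ktls_install(sock, TLS_RX, &read_keys)`` with the write and read key
material respectively.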

Sending TLS application data
----------------------------

After setting the TLS_TX socket option all application data sent over this
socket is encrypted using TLS and the parameters provided in the socket option.
For example, we can send an encrypted hello world record as follows:

.. code-block:: c

  const char *msg = "hello world\n";
  send(sock, msg, strlen(msg), 0);

send() data is directly encrypted from the userspace buffer provided
to the encrypted kernel send buffer if possible.

The sendfile system call sends the file's data in TLS records of maximum
length (2^14 bytes).

.. code-block:: c

  file = open(filename, O_RDONLY);
  fstat(file, &stat);
  sendfile(sock, file, &offset, stat.st_size);
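
To get a feel for what maximum-length records mean for a large file, the
record count can be estimated with a ceiling division. This is a sketch under
the assumption that every record except the last is filled to 2^14 bytes;
``TLS_MAX_PLAINTEXT`` and ``tls_record_count()`` are hypothetical names used
only for illustration.

```c
#include <stddef.h>

#define TLS_MAX_PLAINTEXT 16384 /* 2^14 bytes of plaintext per record */

/* Number of records needed to carry len bytes of plaintext. */
static size_t tls_record_count(size_t len)
{
        return len / TLS_MAX_PLAINTEXT + (len % TLS_MAX_PLAINTEXT != 0);
}
```

Under that assumption a 100000 byte file is carried in six full records plus
one partial record.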

TLS records are created and sent after each send() call, unless
MSG_MORE is passed.  MSG_MORE delays creation of a record until a call
is made without MSG_MORE, or the maximum record size is reached.
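
As an illustration, a small header and its body can be queued so that they
may share one record instead of producing two. The helper below is a
hypothetical sketch, not an established API; for brevity it does not retry
partial sends.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: queue a header with MSG_MORE, then send the body
 * without it so that record creation is triggered by the second call.
 * Returns the total number of bytes accepted, or -1 on error (errno is
 * set by send()).
 */
static ssize_t send_hdr_and_body(int sock, const void *hdr, size_t hdr_len,
                                 const void *body, size_t body_len)
{
        ssize_t a, b;

        a = send(sock, hdr, hdr_len, MSG_MORE);
        if (a < 0)
                return -1;
        b = send(sock, body, body_len, 0);
        if (b < 0)
                return -1;
        return a + b;
}
```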

The kernel will need to allocate a buffer for the encrypted data.
This buffer is allocated at the time send() is called, such that
either the entire send() call will return -ENOMEM (or block waiting
for memory), or the encryption will always succeed.  If send() returns
-ENOMEM and some data was left on the socket buffer from a previous
call using MSG_MORE, the MSG_MORE data is left on the socket buffer.

Receiving TLS application data
------------------------------

After setting the TLS_RX socket option, data returned by all recv family
socket calls is decrypted using the TLS parameters provided.  A full TLS
record must be received before decryption can happen.

.. code-block:: c

  char buffer[16384];
  recv(sock, buffer, 16384, 0);

Received data is decrypted directly into the user buffer if it is
large enough, and no additional allocations occur.  If the userspace
buffer is too small, data is decrypted in the kernel and copied to
userspace.

``EINVAL`` is returned if the TLS version in the received message does not
match the version passed in setsockopt.

``EMSGSIZE`` is returned if the received message is too big.

``EBADMSG`` is returned if decryption failed for any other reason.
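
An application may want to map these three errors onto its own handling,
for example to pick which TLS alert to send. The enum and function below are
hypothetical names, and the mapping is only a sketch: recv() can report
``EINVAL`` for reasons unrelated to TLS, so the classification is a hint
rather than a guarantee.

```c
#include <errno.h>

/* Hypothetical classification of receive-side TLS failures. */
enum ktls_rx_error {
        KTLS_RX_VERSION_MISMATCH, /* EINVAL: record version differs */
        KTLS_RX_RECORD_TOO_BIG,   /* EMSGSIZE: record exceeds the limit */
        KTLS_RX_DECRYPT_FAILED,   /* EBADMSG: any other decryption failure */
        KTLS_RX_OTHER,            /* not one of the errors listed above */
};

/* Map an errno value saved after a failed recv() to a category. */
static enum ktls_rx_error ktls_classify_rx_errno(int err)
{
        switch (err) {
        case EINVAL:
                return KTLS_RX_VERSION_MISMATCH;
        case EMSGSIZE:
                return KTLS_RX_RECORD_TOO_BIG;
        case EBADMSG:
                return KTLS_RX_DECRYPT_FAILED;
        default:
                return KTLS_RX_OTHER;
        }
}
```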

Send TLS control messages
-------------------------

Other than application data, TLS has control messages such as alert
messages (record type 21) and handshake messages (record type 22), etc.
These messages can be sent over the socket by providing the TLS record type
via a CMSG. For example, the following function sends @data of @length bytes
using a record of type @record_type.

.. code-block:: c

  /* send TLS control message using record_type */
  static int ktls_send_ctrl_message(int sock, unsigned char record_type,
                                    void *data, size_t length)
  {
          struct msghdr msg = {0};
          int cmsg_len = sizeof(record_type);
          struct cmsghdr *cmsg;
          char buf[CMSG_SPACE(cmsg_len)];
          struct iovec msg_iov;   /* Vector of data to send/receive into.  */

          msg.msg_control = buf;
          msg.msg_controllen = sizeof(buf);
          cmsg = CMSG_FIRSTHDR(&msg);
          cmsg->cmsg_level = SOL_TLS;
          cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
          cmsg->cmsg_len = CMSG_LEN(cmsg_len);
          *CMSG_DATA(cmsg) = record_type;
          msg.msg_controllen = cmsg->cmsg_len;

          msg_iov.iov_base = data;
          msg_iov.iov_len = length;
          msg.msg_iov = &msg_iov;
          msg.msg_iovlen = 1;

          return sendmsg(sock, &msg, 0);
  }

Control message data should be provided unencrypted, and will be
encrypted by the kernel.
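
As a concrete case, the sketch below sends a close_notify alert in plaintext
form and lets the kernel encrypt it. It assumes the TLS 1.2 alert layout of
one level byte followed by one description byte (warning = 1,
close_notify = 0). ``ktls_send_close_notify()`` and
``TLS_RECORD_TYPE_ALERT`` are hypothetical names, and the ``SOL_TLS``
fallback is an assumption for older libc headers.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/tls.h>

#ifndef SOL_TLS
#define SOL_TLS 282 /* value from linux/socket.h, for older libc headers */
#endif

#define TLS_RECORD_TYPE_ALERT 21 /* alert record type, as listed above */

/* Hypothetical helper: send a plaintext close_notify alert record. */
static int ktls_send_close_notify(int sock)
{
        /* Alert body: level = warning (1), description = close_notify (0). */
        unsigned char alert[2] = { 1, 0 };
        union {
                struct cmsghdr hdr; /* forces suitable alignment */
                char buf[CMSG_SPACE(sizeof(unsigned char))];
        } control;
        struct iovec iov = { .iov_base = alert, .iov_len = sizeof(alert) };
        struct msghdr msg;
        struct cmsghdr *cmsg;

        memset(&control, 0, sizeof(control));
        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = control.buf;
        msg.msg_controllen = sizeof(control.buf);

        /* Attach the record type so the kernel emits an alert record. */
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_TLS;
        cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
        cmsg->cmsg_len = CMSG_LEN(sizeof(unsigned char));
        *CMSG_DATA(cmsg) = TLS_RECORD_TYPE_ALERT;

        return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}
```

The union is one common way to make sure the control buffer is aligned for
``struct cmsghdr``.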

Receiving TLS control messages
------------------------------

TLS control messages are passed in the userspace buffer, with message
type passed via cmsg.  If no cmsg buffer is provided, an error is
returned if a control message is received.  Data messages may be
received without a cmsg buffer set.

.. code-block:: c

  char buffer[16384];
  char cmsg_buf[CMSG_SPACE(sizeof(unsigned char))];
  struct msghdr msg = {0};
  msg.msg_control = cmsg_buf;
  msg.msg_controllen = sizeof(cmsg_buf);

  struct iovec msg_iov;
  msg_iov.iov_base = buffer;
  msg_iov.iov_len = 16384;

  msg.msg_iov = &msg_iov;
  msg.msg_iovlen = 1;

  int ret = recvmsg(sock, &msg, 0 /* flags */);

  /* CMSG_FIRSTHDR() returns NULL when no control data was attached. */
  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
  if (cmsg && cmsg->cmsg_level == SOL_TLS &&
      cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
      int record_type = *((unsigned char *)CMSG_DATA(cmsg));
      // Do something with record_type, and control message data in
      // buffer.
      //
      // Note that record_type may be == to application data (23).
  } else {
      // Buffer contains application data.
  }

A single recv call never returns data from TLS records of different types.
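
The record type lookup can be factored into a helper that walks all control
messages instead of inspecting only the first one. This is a sketch;
``ktls_record_type()`` is a hypothetical name, and the ``SOL_TLS`` fallback
is an assumption for older libc headers.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <linux/tls.h>

#ifndef SOL_TLS
#define SOL_TLS 282 /* value from linux/socket.h, for older libc headers */
#endif

/*
 * Hypothetical helper: return the TLS record type attached to a message
 * filled in by recvmsg(), or -1 if no record type cmsg is present, in
 * which case the buffer holds application data.
 */
static int ktls_record_type(struct msghdr *msg)
{
        struct cmsghdr *cmsg;

        for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
                if (cmsg->cmsg_level == SOL_TLS &&
                    cmsg->cmsg_type == TLS_GET_RECORD_TYPE)
                        return *(unsigned char *)CMSG_DATA(cmsg);
        }
        return -1;
}
```

A receive loop could then dispatch on the return value, for example treating
21 as an alert and both -1 and 23 as application data.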

Integrating into a userspace TLS library
----------------------------------------

At a high level, the kernel TLS ULP is a replacement for the record
layer of a userspace TLS library.

A patchset to OpenSSL to use ktls as the record layer is
`here <https://github.com/Mellanox/openssl/commits/tls_rx2>`_.

`An example <https://github.com/ktls/af_ktls-tool/commits/RX>`_
of calling send directly after a handshake using gnutls.
Since it doesn't implement a full record layer, control
messages are not supported.

Statistics
==========

The TLS implementation exposes the following per-namespace statistics
(``/proc/net/tls_stat``):

- ``TlsCurrTxSw``, ``TlsCurrRxSw`` -
  number of TX and RX sessions currently installed where the host handles
  cryptography

- ``TlsCurrTxDevice``, ``TlsCurrRxDevice`` -
  number of TX and RX sessions currently installed where the NIC handles
  cryptography

- ``TlsTxSw``, ``TlsRxSw`` -
  number of TX and RX sessions opened with host cryptography

- ``TlsTxDevice``, ``TlsRxDevice`` -
  number of TX and RX sessions opened with NIC cryptography

- ``TlsDecryptError`` -
  number of records for which decryption failed (e.g. due to an incorrect
  authentication tag)

- ``TlsDeviceRxResync`` -
  number of RX resyncs sent to NICs handling cryptography
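
A monitoring tool can read a single counter from this file. The sketch below
assumes that each line holds a counter name followed by whitespace and an
unsigned decimal value; that layout is an assumption and should be checked
against the running kernel. Both function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one "Name value" line. Returns 1 and stores the value when the
 * counter name equals want, 0 otherwise (including malformed lines).
 */
static int tls_stat_match(const char *line, const char *want,
                          unsigned long *value)
{
        char name[64];
        unsigned long v;

        if (sscanf(line, "%63s %lu", name, &v) != 2)
                return 0;
        if (strcmp(name, want) != 0)
                return 0;
        *value = v;
        return 1;
}

/* Look up one counter in /proc/net/tls_stat; returns 0 on success. */
static int tls_stat_read(const char *want, unsigned long *value)
{
        char line[128];
        int found = 0;
        FILE *f = fopen("/proc/net/tls_stat", "r");

        if (!f)
                return -1;
        while (!found && fgets(line, sizeof(line), f))
                found = tls_stat_match(line, want, value);
        fclose(f);
        return found ? 0 : -1;
}
```

For example, ``tls_stat_read("TlsDecryptError", &n)`` could be polled to
alert on a rising number of decryption failures.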