=================================================================
Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC)
=================================================================

The Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC) feature
supports Ethernet functionality over the Omni-Path fabric by encapsulating
Ethernet packets between Host Fabric Interface (HFI) nodes.

Architecture
=============
The exchange of Omni-Path encapsulated Ethernet packets involves one or
more virtual Ethernet switches overlaid on the Omni-Path fabric topology.
A subset of the HFI nodes on the fabric is permitted to exchange
encapsulated Ethernet packets across a particular virtual Ethernet switch.
The virtual Ethernet switches are logical abstractions, achieved by
configuring the HFI nodes on the fabric for header generation and
processing. In the simplest configuration, all HFI nodes across the fabric
exchange encapsulated Ethernet packets over a single virtual Ethernet
switch. A virtual Ethernet switch is effectively an independent Ethernet
network. The configuration is performed by an Ethernet Manager (EM), which
is part of the trusted Fabric Manager (FM) application. An HFI node can
have multiple VNICs, each connected to a different virtual Ethernet switch.
The diagram below shows two virtual Ethernet switches and two HFI nodes::

                               +-------------------+
                               |      Subnet/      |
                               |     Ethernet      |
                               |      Manager      |
                               +-------------------+
                                  /          /
                                /           /
                              /            /
                            /             /
  +-----------------------------+  +------------------------------+
  |  Virtual Ethernet Switch    |  |  Virtual Ethernet Switch     |
  |  +---------+    +---------+ |  | +---------+    +---------+   |
  |  | VPORT   |    |  VPORT  | |  | |  VPORT  |    |  VPORT  |   |
  +--+---------+----+---------+-+  +-+---------+----+---------+---+
           |                 \        /                 |
           |                   \    /                   |
           |                     \/                     |
           |                    /  \                    |
           |                  /      \                  |
       +-----------+------------+  +-----------+------------+
       |   VNIC    |    VNIC    |  |    VNIC   |    VNIC    |
       +-----------+------------+  +-----------+------------+
       |          HFI           |  |          HFI           |
       +------------------------+  +------------------------+


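The topology in the diagram can be modelled as a mapping from VNICs to
virtual Ethernet switches. The following is a minimal sketch only; the node
names, the switch ids and both helper functions are hypothetical and do not
correspond to any driver interface.

```python
from collections import defaultdict

# (HFI node, VNIC index) -> virtual Ethernet switch id.
# All names and ids below are illustrative assumptions.
vnic_to_vesw = {
    ("hfi0", 0): 1,
    ("hfi0", 1): 2,
    ("hfi1", 0): 1,
    ("hfi1", 1): 2,
}


def members_by_vesw(mapping):
    """Group VNICs by the virtual Ethernet switch they are attached to."""
    members = defaultdict(list)
    for vnic, vesw in sorted(mapping.items()):
        members[vesw].append(vnic)
    return dict(members)


def can_exchange(mapping, a, b):
    """Two VNICs can exchange packets only over a shared virtual switch."""
    return mapping[a] == mapping[b]
```

In this model each HFI node has one VNIC on each of the two switches, so
``can_exchange`` is true for ``("hfi0", 0)`` and ``("hfi1", 0)`` but false
for two VNICs on the same node.
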
The Omni-Path encapsulated Ethernet packet format is described below.

==================== ================================
Bits                 Field
==================== ================================
Quad Word 0:
0-19                 SLID (lower 20 bits)
20-30                Length (in Quad Words)
31                   BECN bit
32-51                DLID (lower 20 bits)
52-56                SC (Service Class)
57-59                RC (Routing Control)
60                   FECN bit
61-62                L2 (=10, 16B format)
63                   LT (=1, Link Transfer Head Flit)

Quad Word 1:
0-7                  L4 type (=0x78 ETHERNET)
8-11                 SLID[23:20]
12-15                DLID[23:20]
16-31                PKEY
32-47                Entropy
48-63                Reserved

Quad Word 2:
0-15                 Reserved
16-31                L4 header
32-63                Ethernet Packet

Quad Words 3 to N-1:
0-63                 Ethernet packet (pad extended)

Quad Word N (last):
0-23                 Ethernet packet (pad extended)
24-55                ICRC
56-61                Tail
62-63                LT (=01, Link Transfer Tail Flit)
==================== ================================

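The bit layout of the first two quad words can be illustrated by packing the
fields into 64-bit integers. This is a sketch of the table only, not the
driver's code: it assumes that bit 0 is the least significant bit of each
quad word, and the function names (``pack_qw0``, ``pack_qw1``, ``field``)
are hypothetical.

```python
L4_ETHERNET = 0x78  # L4 type value from the table


def pack_qw0(slid, dlid, length_qw, sc, rc=0, becn=0, fecn=0):
    """Build Quad Word 0, assuming bit 0 is the least significant bit."""
    return (
        (slid & 0xFFFFF)                # bits 0-19:  SLID, lower 20 bits
        | (length_qw & 0x7FF) << 20     # bits 20-30: length in quad words
        | (becn & 1) << 31              # bit 31:     BECN
        | (dlid & 0xFFFFF) << 32        # bits 32-51: DLID, lower 20 bits
        | (sc & 0x1F) << 52             # bits 52-56: Service Class
        | (rc & 0x7) << 57              # bits 57-59: Routing Control
        | (fecn & 1) << 60              # bit 60:     FECN
        | 0b10 << 61                    # bits 61-62: L2 = 10 (16B format)
        | 1 << 63                       # bit 63:     LT = 1 (head flit)
    )


def pack_qw1(slid, dlid, pkey, entropy):
    """Build Quad Word 1; bits 48-63 are reserved and left as zero."""
    return (
        L4_ETHERNET                     # bits 0-7:   L4 type
        | ((slid >> 20) & 0xF) << 8     # bits 8-11:  SLID[23:20]
        | ((dlid >> 20) & 0xF) << 12    # bits 12-15: DLID[23:20]
        | (pkey & 0xFFFF) << 16         # bits 16-31: PKEY
        | (entropy & 0xFFFF) << 32      # bits 32-47: Entropy
    )


def field(word, lo, hi):
    """Extract bits lo..hi (inclusive) from a quad word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)
```

A 24-bit LID is split across the two quad words: the lower 20 bits go into
Quad Word 0 and bits 23:20 into Quad Word 1, so ``field(pack_qw1(0x123456,
0, 0, 0), 8, 11)`` recovers the upper nibble ``0x1``.
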
The Ethernet packet is padded on the transmit side to ensure that the VNIC
OPA packet is quad word aligned. The 'Tail' field contains the number of
bytes padded. On the receive side, the 'Tail' field is read and the padding
is removed (along with the ICRC, Tail and OPA header) before the packet is
passed up the network stack.

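The padding arithmetic can be sketched from the table alone. The byte counts
below are read off the packet format (20 bytes before the Ethernet packet,
5 bytes after the padding) and the assumption is that 'Tail' counts only the
pad bytes; the constant and function names are hypothetical, and the
driver's actual accounting may differ.

```python
HDR_BYTES = 20      # QW0 + QW1 + the reserved and L4 header fields of QW2
TRAILER_BYTES = 5   # 4-byte ICRC + 1 byte holding Tail and LT
QW_BYTES = 8        # one quad word


def pad_len(eth_len):
    """Pad bytes needed so that the whole packet is quad word aligned."""
    return -(HDR_BYTES + eth_len + TRAILER_BYTES) % QW_BYTES


def packet_qwords(eth_len):
    """Total packet length in quad words, padding included."""
    total = HDR_BYTES + eth_len + pad_len(eth_len) + TRAILER_BYTES
    return total // QW_BYTES


def strip_padding(padded_payload, tail):
    """Receive side: drop the number of pad bytes given by 'Tail'."""
    return padded_payload[:len(padded_payload) - tail]
```

Under these assumptions a 60-byte Ethernet packet needs 3 pad bytes
(20 + 60 + 3 + 5 = 88 bytes, or 11 quad words), while a 63-byte packet needs
none.
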
The L4 header field contains the ID of the virtual Ethernet switch that the
VNIC port belongs to. On the receive side, this field is used to
de-multiplex the received VNIC packets to different VNIC ports.

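Receive-side de-multiplexing can be pictured as a lookup keyed by the L4
header. The sketch below assumes that bit 0 of Quad Word 2 is the least
significant bit and uses a plain dictionary as the port table; both
functions and the port names are hypothetical.

```python
def vesw_id_from_qw2(qw2):
    """Extract the 16-bit L4 header (bits 16-31) from Quad Word 2."""
    return (qw2 >> 16) & 0xFFFF


def demux(qw2, ports):
    """Return the VNIC port registered for the packet's switch id, or None."""
    return ports.get(vesw_id_from_qw2(qw2))
```

With ``ports = {1: "vnic0", 2: "vnic1"}``, a packet whose L4 header is 2 is
delivered to ``"vnic1"``, and a packet with an unknown switch id matches no
port.
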
Driver Design
==============
The Intel OPA VNIC software design is presented in the diagram at the end
of this section. OPA VNIC functionality has a HW dependent component and
a HW independent component.

Support has been added for an IB device to allocate and free RDMA netdev
devices. The RDMA netdev interfaces with the network stack, thus creating
standard network interfaces. OPA_VNIC is an RDMA netdev device type.

The HW dependent VNIC functionality is part of the HFI1 driver. It
implements the verbs to allocate and free the OPA_VNIC RDMA netdev.
It handles HW resource allocation and management for VNIC functionality.
It interfaces with the network stack and implements the required
net_device_ops functions. It expects Omni-Path encapsulated Ethernet
packets in the transmit path and provides HW access to them. It strips
the Omni-Path header from the received packets before passing them up
the network stack. It also implements the RDMA netdev control operations.

The OPA VNIC module implements the HW independent VNIC functionality.
It consists of two parts. The VNIC Ethernet Management Agent (VEMA)
registers itself with IB core as an IB client and interfaces with the
IB MAD stack. It exchanges management information with the Ethernet
Manager (EM) and the VNIC netdev. The VNIC netdev part allocates and frees
the OPA_VNIC RDMA netdev devices. It overrides the net_device_ops functions
set by the HW dependent VNIC driver where required to accommodate any
control operation. It also handles the encapsulation of Ethernet packets
with an Omni-Path header in the transmit path. For each VNIC interface, the
information required for encapsulation is configured by the EM via the VEMA
MAD interface. It also passes any control information to the HW dependent
driver by invoking the RDMA netdev control operations::

                 +-------------------+ +----------------------+
                 |                   | |       Linux          |
                 |     IB MAD        | |      Network         |
                 |                   | |       Stack          |
                 +-------------------+ +----------------------+
                         |               |          |
                         |               |          |
                 +----------------------------+     |
                 |                            |     |
                 |      OPA VNIC Module       |     |
                 |  (OPA VNIC RDMA Netdev     |     |
                 |     & EMA functions)       |     |
                 |                            |     |
                 +----------------------------+     |
                             |                      |
                             |                      |
                    +------------------+            |
                    |     IB core      |            |
                    +------------------+            |
                             |                      |
                             |                      |
         +--------------------------------------------+
         |                                            |
         |      HFI1 Driver with VNIC support         |
         |                                            |
         +--------------------------------------------+
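
The split between the two components on the transmit path can be pictured
as one layer wrapping the other. The following is a conceptual sketch only:
the class names, the ``start_xmit`` method and the use of a fixed byte
string as a header are all hypothetical stand-ins, not kernel interfaces.

```python
class HwDependentNetdev:
    """Stand-in for the HFI1 side: expects already encapsulated packets."""

    def __init__(self):
        self.sent = []

    def start_xmit(self, packet):
        # A real driver would hand the packet to the hardware here.
        self.sent.append(packet)


class VnicNetdev:
    """Stand-in for the OPA VNIC module: overrides the transmit operation."""

    def __init__(self, hw, encap_header):
        self.hw = hw
        # In the real design this information is configured by the EM
        # through the VEMA MAD interface; here it is a fixed placeholder.
        self.encap_header = encap_header

    def start_xmit(self, eth_frame):
        # Encapsulate first, then call the HW dependent operation.
        self.hw.start_xmit(self.encap_header + eth_frame)
```

The network stack would call ``VnicNetdev.start_xmit`` with a plain Ethernet
frame, and the HW dependent layer only ever sees the encapsulated result.
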