.. SPDX-License-Identifier: GPL-2.0

============
NET_FAILOVER
============

Overview
========

The net_failover driver provides an automated failover mechanism via APIs
to create and destroy a failover master netdev, and it manages the primary
and standby slave netdevs that get registered via the generic failover
infrastructure.

The failover netdev acts as a master device and controls two slave devices.
The original paravirtual interface is registered as the 'standby' slave
netdev, and a passthru/VF device with the same MAC gets registered as the
'primary' slave netdev. Both the 'standby' and 'failover' netdevs are
associated with the same 'pci' device. The user accesses the network
interface via the 'failover' netdev. The 'failover' netdev chooses the
'primary' netdev as the default for transmits whenever it is available with
its link up and running.
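
The transmit preference described above can be approximated from inside the
guest. The following is a minimal sketch, not the driver's own code: the
helper name 'failover_tx_slave' and the 'SYSFS_NET' override are hypothetical,
and an 'operstate' of "up" is assumed to stand in for "running with link up".

```shell
# Hypothetical helper: print the slave that net_failover would be expected
# to transmit on, approximating "running with link up" by an operstate of
# "up". SYSFS_NET defaults to /sys/class/net and is only overridden for tests.
failover_tx_slave() {
    local primary="$1" standby="$2"
    local root="${SYSFS_NET:-/sys/class/net}"
    local state

    # Prefer the 'primary' (VF) slave whenever it is up.
    state=$(cat "$root/$primary/operstate" 2>/dev/null)
    if [ "$state" = "up" ]; then
        echo "$primary"
        return 0
    fi

    # Otherwise fall back to the 'standby' (virtio-net) slave.
    state=$(cat "$root/$standby/operstate" 2>/dev/null)
    if [ "$state" = "up" ]; then
        echo "$standby"
        return 0
    fi

    # Neither slave is usable for transmit.
    return 1
}
```

With the interface names used elsewhere in this document, it would be called
as 'failover_tx_slave ens11 ens10nsby'.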

This can be used by paravirtual drivers to enable an alternate low-latency
datapath. It also enables hypervisor-controlled live migration of a VM with
a directly attached VF by failing over to the paravirtual datapath when the
VF is unplugged.

virtio-net accelerated datapath: STANDBY mode
=============================================

net_failover enables a hypervisor-controlled accelerated datapath for
virtio-net enabled VMs in a transparent manner, with no or minimal guest
userspace changes.

To support this, the hypervisor needs to enable the VIRTIO_NET_F_STANDBY
feature on the virtio-net interface and assign the same MAC address to both
the virtio-net and VF interfaces.
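
Whether the hypervisor actually enabled the feature can be checked from the
guest. This sketch assumes that /sys/bus/virtio/devices/<dev>/features lists
the feature bits as a string of '0'/'1' characters indexed by bit number, and
that VIRTIO_NET_F_STANDBY is bit 62; the helper name 'has_standby_feature' is
hypothetical.

```shell
# VIRTIO_NET_F_STANDBY is feature bit 62 in the virtio-net specification.
VIRTIO_NET_F_STANDBY=62

# Hypothetical helper: succeed if the given virtio 'features' file has the
# STANDBY bit set. The file holds one '0'/'1' character per feature bit,
# with the character at offset N describing bit N.
has_standby_feature() {
    local features_file="$1"
    local bits bit

    bits=$(cat "$features_file") || return 2
    # cut counts characters from 1, so bit N is character N + 1.
    bit=$(printf '%s' "$bits" | cut -c "$((VIRTIO_NET_F_STANDBY + 1))")
    [ "$bit" = "1" ]
}
```

Assuming the standby netdev's 'device' link points at its virtio device, a
possible invocation is
'has_standby_feature /sys/class/net/ens10nsby/device/features'.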

Here is an example libvirt XML snippet that shows such a configuration:
::

  <interface type='network'>
    <mac address='52:54:00:00:12:53'/>
    <source network='enp66s0f0_br'/>
    <target dev='tap01'/>
    <model type='virtio'/>
    <driver name='vhost' queues='4'/>
    <link state='down'/>
    <teaming type='persistent'/>
    <alias name='ua-backup0'/>
  </interface>
  <interface type='hostdev' managed='yes'>
    <mac address='52:54:00:00:12:53'/>
    <source>
      <address type='pci' domain='0x0000' bus='0x42' slot='0x02' function='0x5'/>
    </source>
    <teaming type='transient' persistent='ua-backup0'/>
  </interface>

In this configuration, the first device definition is for the virtio-net
interface, which acts as the 'persistent' device, indicating that this
interface will always be plugged in. This is specified by the 'teaming' tag
with its required attribute 'type' set to 'persistent'. The link state of the
virtio-net device is set to 'down' to ensure that the 'failover' netdev
prefers the VF passthrough device for normal communication. The virtio-net
device will be brought up during live migration to allow uninterrupted
communication.

The second device definition is for the VF passthrough interface. Here the
'teaming' tag is provided with type 'transient', indicating that this device
may periodically be unplugged. A second attribute, 'persistent', is provided
and points to the alias name declared for the virtio-net device.
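
Before booting the VM, the two requirements above (a shared MAC address and a
'persistent' attribute that names the virtio-net alias) can be sanity-checked
on the domain XML. This is a rough sketch using plain-text grep rather than an
XML parser, so it assumes single-quoted attributes laid out as in the example
above; both helper names are hypothetical.

```shell
# Hypothetical check: every <mac address=.../> in the file carries the same
# address, since the virtio-net and VF interfaces must share one MAC.
same_mac_everywhere() {
    local xml="$1"
    local count

    count=$(grep -o "<mac address='[^']*'" "$xml" | sort -u | wc -l)
    [ "$count" -eq 1 ]
}

# Hypothetical check: the 'persistent' attribute of the transient <teaming>
# tag refers to the <alias name=.../> of the persistent interface.
teaming_alias_matches() {
    local xml="$1"
    local name ref

    name=$(grep -o "<alias name='[^']*'" "$xml" | head -n 1 | cut -d "'" -f 2)
    ref=$(grep -o "persistent='[^']*'" "$xml" | head -n 1 | cut -d "'" -f 2)
    [ -n "$name" ] && [ "$name" = "$ref" ]
}
```

Note that "type='persistent'" does not match the second pattern, because the
word 'persistent' is followed by a quote there rather than by "='".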

Booting a VM with the above configuration results in the following three
interfaces being created in the VM:
::

  4: ens10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
      link/ether 52:54:00:00:12:53 brd ff:ff:ff:ff:ff:ff
      inet 192.168.12.53/24 brd 192.168.12.255 scope global dynamic ens10
         valid_lft 42482sec preferred_lft 42482sec
      inet6 fe80::97d8:db2:8c10:b6d6/64 scope link
         valid_lft forever preferred_lft forever
  5: ens10nsby: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master ens10 state DOWN group default qlen 1000
      link/ether 52:54:00:00:12:53 brd ff:ff:ff:ff:ff:ff
  7: ens11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ens10 state UP group default qlen 1000
      link/ether 52:54:00:00:12:53 brd ff:ff:ff:ff:ff:ff

Here, ens10 is the 'failover' master interface, ens10nsby is the 'standby'
slave virtio-net interface, and ens11 is the 'primary' slave VF passthrough
interface.
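
The master/slave relationship shown in the 'ip' output is also visible in
sysfs, where each slave has a 'master' symlink to its master netdev. The
following sketch lists the slaves of a given failover master by following
those links; the helper name and the 'SYSFS_NET' override are hypothetical.

```shell
# Hypothetical helper: list the netdevs whose sysfs "master" link points at
# the given master. SYSFS_NET defaults to /sys/class/net (override for tests).
failover_slaves() {
    local master="$1"
    local root="${SYSFS_NET:-/sys/class/net}"
    local dir target

    for dir in "$root"/*/; do
        dir="${dir%/}"
        # Interfaces without a master have no "master" link.
        [ -L "$dir/master" ] || continue
        target=$(readlink "$dir/master")
        if [ "${target##*/}" = "$master" ]; then
            echo "${dir##*/}"
        fi
    done
    return 0
}
```

In the VM above, 'failover_slaves ens10' would be expected to print
ens10nsby and ens11.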

Note that some user space network configuration daemons, such as
systemd-networkd and ifupdown, do not understand the 'net_failover' device.
On the first boot, the VM might therefore end up with both the 'failover'
device and the VF acquiring IP addresses (either the same or different) from
the DHCP server, which results in a lack of connectivity to the VM. Some
tweaks to these network configuration daemons might be needed to make sure
that an IP address is assigned only to the 'failover' device.

Below is the patch snippet used with the 'cloud-ifupdown-helper' script
found on Debian cloud images:

::

  @@ -27,6 +27,8 @@ do_setup() {
       local working="$cfgdir/.$INTERFACE"
       local final="$cfgdir/$INTERFACE"

  +    if [ -d "/sys/class/net/${INTERFACE}/master" ]; then exit 0; fi
  +
       if ifup --no-act "$INTERFACE" > /dev/null 2>&1; then
           # interface is already known to ifupdown, no need to generate cfg
           log "Skipping configuration generation for $INTERFACE"

Live Migration of a VM with SR-IOV VF & virtio-net in STANDBY mode
==================================================================

net_failover also enables hypervisor-controlled live migration of VMs that
have directly attached SR-IOV VF devices, through automatic failover to the
paravirtual datapath when the VF is unplugged.

Here is a sample script that shows the steps to initiate live migration from
the source hypervisor. Note: it is assumed that the VM is connected to a
software bridge 'br0', which has a single VF attached to it along with the
vnet device of the VM. This is not the VF that was passed through to the VM
(shown in the vf.xml file).
::

  # cat vf.xml
  <interface type='hostdev' managed='yes'>
    <mac address='52:54:00:00:12:53'/>
    <source>
      <address type='pci' domain='0x0000' bus='0x42' slot='0x02' function='0x5'/>
    </source>
    <teaming type='transient' persistent='ua-backup0'/>
  </interface>

  #!/bin/bash
  # migrate.sh: run on the source hypervisor.

  DOMAIN=vm-01
  PF=ens6np0
  VF=ens6v1           # VF attached to the bridge.
  VF_NUM=1
  TAP_IF=vmtap01      # virtio-net interface in the VM.
  VF_XML=vf.xml

  MAC=52:54:00:00:12:53
  ZERO_MAC=00:00:00:00:00:00

  # Fail early if the destination hypervisor was not provided.
  : "${REMOTE_HOST:?REMOTE_HOST must be set to the destination hypervisor}"

  # Set the virtio-net interface up.
  virsh domif-setlink "$DOMAIN" "$TAP_IF" up

  # Remove the VF that was passed through to the VM.
  virsh detach-device --live --config "$DOMAIN" "$VF_XML"

  # Clear the MAC address of VF $VF_NUM on the PF.
  ip link set "$PF" vf "$VF_NUM" mac "$ZERO_MAC"

  # Add FDB entries so that traffic continues to reach the VM via
  # the VF -> br0 -> vnet interface path.
  bridge fdb add "$MAC" dev "$VF"
  bridge fdb add "$MAC" dev "$TAP_IF" master

  # Migrate the VM.
  virsh migrate --live --persistent "$DOMAIN" "qemu+ssh://$REMOTE_HOST/system"

  # Clean up FDB entries after migration completes.
  bridge fdb del "$MAC" dev "$VF"
  bridge fdb del "$MAC" dev "$TAP_IF" master

On the destination hypervisor, a shared bridge 'br0' is created before
migration starts, and a VF from the destination PF is added to the bridge.
Similarly, an appropriate FDB entry is added.
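
The destination-side preparation can be sketched with iproute2 commands. This
is an illustration under assumptions, not a prescribed procedure: the bridge
and VF names default to the ones used in this document (br0, ens36v0), the
exact FDB entry mirrors the source-side 'bridge fdb add' for the VF, and the
'run' wrapper with its 'DRY_RUN' switch is hypothetical, letting the commands
be printed instead of executed.

```shell
# Example values; override them through the environment for a real setup.
BRIDGE="${BRIDGE:-br0}"
DST_VF="${DST_VF:-ens36v0}"          # VF of the destination PF.
MAC="${MAC:-52:54:00:00:12:53}"      # MAC shared by virtio-net and the VF.

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical preparation step, run before migration starts.
prepare_destination() {
    run ip link add name "$BRIDGE" type bridge
    run ip link set dev "$DST_VF" master "$BRIDGE"
    run ip link set dev "$BRIDGE" up
    # Steer frames for the VM's MAC through the VF into the bridge.
    run bridge fdb add "$MAC" dev "$DST_VF"
}
```

Running 'DRY_RUN=1 prepare_destination' first shows the commands without
touching the host's network configuration.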

The following script is executed on the destination hypervisor once
migration completes; it reattaches the VF to the VM and brings down the
virtio-net interface.

::

  #!/bin/bash
  # reattach-vf.sh: run on the destination hypervisor.

  DOMAIN=vm-01
  VF=ens36v0          # VF attached to the bridge on the destination.
  TAP_IF=vmtap01      # virtio-net interface in the VM.
  MAC=52:54:00:00:12:53

  bridge fdb del "$MAC" dev "$VF"
  bridge fdb del "$MAC" dev "$TAP_IF" master
  virsh attach-device --config --live "$DOMAIN" vf.xml
  virsh domif-setlink "$DOMAIN" "$TAP_IF" down