.. SPDX-License-Identifier: GPL-2.0

============
NET_FAILOVER
============

Overview
========

The net_failover driver provides an automated failover mechanism via APIs
to create and destroy a failover master netdev, and manages the primary and
standby slave netdevs that get registered via the generic failover
infrastructure.

The failover netdev acts as a master device and controls 2 slave devices. The
original paravirtual interface is registered as the 'standby' slave netdev and
a passthru/VF device with the same MAC gets registered as the 'primary' slave
netdev. Both the 'standby' and 'failover' netdevs are associated with the same
'pci' device. The user accesses the network interface via the 'failover' netdev.
The 'failover' netdev chooses the 'primary' netdev as default for transmits when
it is available with link up and running.
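The selection rule described above can be modelled with a small shell
function. This is only an illustrative sketch and not the driver's code: the
function name 'select_xmit_slave' and its "up"/"down" arguments are
assumptions made for this example, and the fallback to the 'standby' device is
inferred from the failover behaviour described in this document.

```shell
#!/bin/bash
# Hypothetical model of the transmit path choice; not taken from the driver.
# $1: state of the 'primary' slave ("up" or "down")
# $2: state of the 'standby' slave ("up" or "down")
select_xmit_slave() {
    local primary_state=$1 standby_state=$2

    if [ "$primary_state" = "up" ]; then
        echo primary        # preferred whenever it is usable
    elif [ "$standby_state" = "up" ]; then
        echo standby        # fall back to the paravirtual datapath
    else
        echo none           # no usable slave
    fi
}

select_xmit_slave up up
select_xmit_slave down up
```

The two calls print 'primary' and 'standby' respectively, which corresponds to
normal operation and to the state after the VF has been unplugged.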

This can be used by paravirtual drivers to enable an alternate low latency
datapath. It also enables hypervisor controlled live migration of a VM with
a directly attached VF by failing over to the paravirtual datapath when the VF
is unplugged.

virtio-net accelerated datapath: STANDBY mode
=============================================

net_failover enables a hypervisor controlled accelerated datapath to virtio-net
enabled VMs in a transparent manner with no/minimal guest userspace changes.

To support this, the hypervisor needs to enable the VIRTIO_NET_F_STANDBY
feature on the virtio-net interface and assign the same MAC address to both
the virtio-net and VF interfaces.

Here is an example libvirt XML snippet that shows such a configuration:
::

  <interface type='network'>
    <mac address='52:54:00:00:12:53'/>
    <source network='enp66s0f0_br'/>
    <target dev='tap01'/>
    <model type='virtio'/>
    <driver name='vhost' queues='4'/>
    <link state='down'/>
    <teaming type='persistent'/>
    <alias name='ua-backup0'/>
  </interface>
  <interface type='hostdev' managed='yes'>
    <mac address='52:54:00:00:12:53'/>
    <source>
      <address type='pci' domain='0x0000' bus='0x42' slot='0x02' function='0x5'/>
    </source>
    <teaming type='transient' persistent='ua-backup0'/>
  </interface>

In this configuration, the first device definition is for the virtio-net
interface, which acts as the 'persistent' device, indicating that this
interface will always be plugged in. This is specified by the 'teaming' tag
with the required attribute 'type' set to 'persistent'. The link state for the
virtio-net device is set to 'down' to ensure that the 'failover' netdev prefers
the VF passthrough device for normal communication. The virtio-net device will
be brought UP during live migration to allow uninterrupted communication.

The second device definition is for the VF passthrough interface. Here the
'teaming' tag is provided with type 'transient', indicating that this device may
periodically be unplugged. A second attribute, 'persistent', is provided and
points to the alias name declared for the virtio-net device.
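Since both device definitions must carry the same MAC address, a quick sanity
check on the XML can be useful before booting the VM. The following is a
minimal sketch, assuming the single-quoted attribute style used in the snippet
above; the helper name 'same_mac' is hypothetical and is not part of libvirt.
The comparison is purely textual, so addresses that differ only in letter case
are treated as different.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if every <mac address='...'/> element
# in the given libvirt XML file carries the same address.
same_mac() {
    local xml=$1 count

    count=$(grep -o "<mac address='[^']*'" "$xml" | sort -u | wc -l)
    [ "$count" -eq 1 ]
}

# Example with a throwaway file holding the two MAC lines from the snippet.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<mac address='52:54:00:00:12:53'/>
<mac address='52:54:00:00:12:53'/>
EOF
if same_mac "$tmp"; then echo "MAC addresses match"; fi
rm -f "$tmp"
```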

Booting a VM with the above configuration will result in the following 3
interfaces created in the VM:
::

  4: ens10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
      link/ether 52:54:00:00:12:53 brd ff:ff:ff:ff:ff:ff
      inet 192.168.12.53/24 brd 192.168.12.255 scope global dynamic ens10
         valid_lft 42482sec preferred_lft 42482sec
      inet6 fe80::97d8:db2:8c10:b6d6/64 scope link
         valid_lft forever preferred_lft forever
  5: ens10nsby: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master ens10 state DOWN group default qlen 1000
      link/ether 52:54:00:00:12:53 brd ff:ff:ff:ff:ff:ff
  7: ens11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ens10 state UP group default qlen 1000
      link/ether 52:54:00:00:12:53 brd ff:ff:ff:ff:ff:ff

Here, ens10 is the 'failover' master interface, ens10nsby is the slave 'standby'
virtio-net interface, and ens11 is the slave 'primary' VF passthrough interface.
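The master/slave relationship shown in the listing can also be read from sysfs
inside the guest. The sketch below is an assumption-laden example rather than
an official tool: the helper name 'list_slaves' is hypothetical, and it relies
only on the per-interface 'master' link under /sys/class/net. Given the
interfaces above, calling it with ens10 would be expected to print ens10nsby
and ens11.

```shell
#!/bin/bash
# Hypothetical helper: print the interfaces whose 'master' link points at
# the given device. The second argument (sysfs class directory) exists
# only so the function can be tried against a test tree.
list_slaves() {
    local master=$1 root=${2:-/sys/class/net} dev

    for dev in "$root"/*; do
        # Interfaces without a master link are not slaves of anything.
        [ -L "$dev/master" ] || continue
        if [ "$(basename "$(readlink "$dev/master")")" = "$master" ]; then
            basename "$dev"
        fi
    done
}

list_slaves ens10
```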

One point to note here is that some user space network configuration daemons,
like systemd-networkd, ifupdown, etc., do not understand the 'net_failover'
device; on the first boot, the VM might end up with both the 'failover' device
and the VF acquiring IP addresses (either the same or different) from the DHCP
server. This will result in a lack of connectivity to the VM. So some tweaks
might be needed to these network configuration daemons to make sure that an IP
address is received only on the 'failover' device.

Below is the patch snippet used with the 'cloud-ifupdown-helper' script found on
Debian cloud images:

::

  @@ -27,6 +27,8 @@ do_setup() {
       local working="$cfgdir/.$INTERFACE"
       local final="$cfgdir/$INTERFACE"

  +    if [ -d "/sys/class/net/${INTERFACE}/master" ]; then exit 0; fi
  +
       if ifup --no-act "$INTERFACE" > /dev/null 2>&1; then
           # interface is already known to ifupdown, no need to generate cfg
           log "Skipping configuration generation for $INTERFACE"


Live Migration of a VM with SR-IOV VF & virtio-net in STANDBY mode
==================================================================

net_failover also enables hypervisor controlled live migration to be supported
with VMs that have directly attached SR-IOV VF devices by automatic failover to
the paravirtual datapath when the VF is unplugged.

Here is a sample script that shows the steps to initiate live migration from
the source hypervisor. Note: It is assumed that the VM is connected to a
software bridge 'br0' which has a single VF attached to it along with the vnet
device of the VM. This is not the VF that was passed through to the VM (seen in
the vf.xml file).
::

  # cat vf.xml
  <interface type='hostdev' managed='yes'>
    <mac address='52:54:00:00:12:53'/>
    <source>
      <address type='pci' domain='0x0000' bus='0x42' slot='0x02' function='0x5'/>
    </source>
    <teaming type='transient' persistent='ua-backup0'/>
  </interface>

  # Source Hypervisor migrate.sh
  #!/bin/bash

  DOMAIN=vm-01
  PF=ens6np0
  VF=ens6v1             # VF attached to the bridge.
  VF_NUM=1
  TAP_IF=vmtap01        # virtio-net interface in the VM.
  VF_XML=vf.xml

  MAC=52:54:00:00:12:53
  ZERO_MAC=00:00:00:00:00:00

  # Set the virtio-net interface up.
  virsh domif-setlink $DOMAIN $TAP_IF up

  # Remove the VF that was passthrough'd to the VM.
  virsh detach-device --live --config $DOMAIN $VF_XML

  # Zero out the MAC address of VF $VF_NUM on the PF.
  ip link set $PF vf $VF_NUM mac $ZERO_MAC

  # Add FDB entry for traffic to continue going to the VM via
  # the VF -> br0 -> vnet interface path.
  bridge fdb add $MAC dev $VF
  bridge fdb add $MAC dev $TAP_IF master

  # Migrate the VM. REMOTE_HOST (the destination hypervisor) is not set
  # by this sample and must be provided by the caller.
  virsh migrate --live --persistent $DOMAIN qemu+ssh://$REMOTE_HOST/system

  # Clean up FDB entries after migration completes.
  bridge fdb del $MAC dev $VF
  bridge fdb del $MAC dev $TAP_IF master

On the destination hypervisor, a shared bridge 'br0' is created before migration
starts, and a VF from the destination PF is added to the bridge. Similarly, an
appropriate FDB entry is added.
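The destination-side preparation is only described in prose, so the following
is a hedged sketch of what it might look like, not a script taken from a real
deployment. The bridge VF name 'ens36v0' is borrowed from the reattach script
in this document, the MAC is the one from the example XML, and the 'run'
wrapper and 'prepare_destination' function are hypothetical names. By default
the commands are only printed; setting DRY_RUN=0 would execute them.

```shell
#!/bin/bash
# Sketch of the destination-side preparation. All names are assumptions:
# 'ens36v0' is the bridge VF name used by the reattach script in this
# document, and the MAC is the one from the example XML.
BRIDGE=br0
VF=ens36v0
MAC=52:54:00:00:12:53

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

prepare_destination() {
    # Create the shared bridge and bring it up.
    run ip link add name "$BRIDGE" type bridge
    run ip link set dev "$BRIDGE" up
    # Add a VF from the destination PF to the bridge.
    run ip link set dev "$VF" master "$BRIDGE"
    run ip link set dev "$VF" up
    # Mirrors the 'bridge fdb add $MAC dev $VF' step of the source script.
    run bridge fdb add "$MAC" dev "$VF"
}

prepare_destination
```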

The following script is executed on the destination hypervisor once migration
completes, and it reattaches the VF to the VM and brings down the virtio-net
interface.

::

  # reattach-vf.sh
  #!/bin/bash

  bridge fdb del 52:54:00:00:12:53 dev ens36v0
  bridge fdb del 52:54:00:00:12:53 dev vmtap01 master
  virsh attach-device --config --live vm01 vf.xml
  virsh domif-setlink vm01 vmtap01 down