Netfilter's flowtable infrastructure
====================================

This documentation describes the software flowtable infrastructure available in
Netfilter since Linux kernel 4.16.

Overview
--------

Initial packets follow the classic forwarding path. Once the flow enters the
established state according to conntrack semantics (i.e. traffic has been seen
in both directions), you can offload the flow to the flowtable from the forward
chain via the 'flow offload' action available in nftables.

Packets that find an entry in the flowtable (i.e. a flowtable hit) are sent to
the output netdevice via neigh_xmit(), so they bypass the classic forwarding
path. The visible effect is that you do not see these packets from any of the
netfilter hooks coming after the ingress. On a flowtable miss, the packet
follows the classic forwarding path.

The flowtable uses a resizable hashtable. Lookups are based on the following
7-tuple selectors: source and destination addresses, layer 3 and layer 4
protocols, source and destination ports, and the input interface (useful in
case there are several conntrack zones in place).
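
The lookup described above can be pictured as a hash table keyed by the seven
selectors. The following Python sketch is only a conceptual model, not the
kernel implementation: the names FlowKey and Flowtable, the field names, and
the use of a plain dict in place of the resizable hashtable are all assumptions
made for illustration.

```python
from typing import Dict, NamedTuple, Optional


class FlowKey(NamedTuple):
    """Hypothetical model of the 7-tuple used as the lookup key."""
    src: str
    dst: str
    l3proto: str
    l4proto: str
    sport: int
    dport: int
    iif: str


class Flowtable:
    """Toy flowtable: a dict stands in for the resizable hashtable."""

    def __init__(self) -> None:
        self._entries: Dict[FlowKey, str] = {}

    def add(self, key: FlowKey, out_dev: str) -> None:
        self._entries[key] = out_dev

    def lookup(self, key: FlowKey) -> Optional[str]:
        # Returns the output device on a hit, None on a miss.
        return self._entries.get(key)


ft = Flowtable()
key = FlowKey("10.0.0.2", "192.0.2.1", "ipv4", "tcp", 40000, 443, "eth0")
ft.add(key, "eth1")

hit = ft.lookup(key)
# Same addresses and ports but a different input interface: a different key.
miss = ft.lookup(key._replace(iif="eth2"))
```

Because the input interface is part of the key, the second lookup in this
sketch is a miss even though addresses, protocols and ports are identical.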

Flowtables are populated via the 'flow offload' nftables action, so the user can
selectively specify which flows are placed into the flowtable. Hence, packets
follow the classic forwarding path unless the user explicitly instructs flows
to use this new alternative forwarding path via nftables policy.

This is represented in Fig.1, which describes the classic forwarding path
including the Netfilter hooks and the flowtable fastpath bypass.

                                         userspace process
                                          ^              |
                                          |              |
                                     _____|____     ____\/___
                                    /          \   /         \
                                    |   input   |  |  output  |
                                    \__________/   \_________/
                                         ^               |
                                         |               |
      _________      __________      ---------     _____\/_____
     /         \    /          \     |Routing |   /            \
  -->  ingress  ---> prerouting ---> |decision|   | postrouting |--> neigh_xmit
     \_________/    \__________/     ----------   \____________/          ^
       |      ^          |               |               ^                |
   flowtable  |          |          ____\/___            |                |
       |      |          |         /         \           |                |
    __\/___   |          --------->| forward |------------                |
    |-----|   |                    \_________/                            |
    |-----|   |                 'flow offload' rule                       |
    |-----|   |                   adds entry to                           |
    |_____|   |                     flowtable                             |
       |      |                                                           |
      / \     |                                                           |
     /hit\_no_|                                                           |
     \ ? /                                                                |
      \ /                                                                 |
       |__yes_________________fastpath bypass ____________________________|

               Fig.1 Netfilter hooks and flowtable interactions

The flowtable entry also stores the NAT configuration, so all packets are
mangled according to the NAT policy that matched the initial packets that went
through the classic forwarding path. The TTL is decremented before calling
neigh_xmit(). Fragmented traffic is passed up to follow the classic forwarding
path, given that the transport selectors are missing and therefore a flowtable
lookup is not possible.
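
The per-packet handling described above can be sketched as follows. This is a
simplified Python model rather than kernel code: Packet, FlowEntry, fastpath()
and their fields are hypothetical names, NAT is reduced to address rewriting,
and the entry passed in stands for the result of a flowtable lookup.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    ttl: int
    is_fragment: bool = False


@dataclass(frozen=True)
class FlowEntry:
    # NAT rewrite recorded when the flow was offloaded (None: no rewrite).
    snat_to: Optional[str] = None
    dnat_to: Optional[str] = None


def fastpath(pkt: Packet, entry: Optional[FlowEntry]) -> Optional[Packet]:
    """Return the packet to transmit, or None to use the classic path."""
    # Fragments carry no transport selectors, so they cannot be looked up;
    # a missing entry is a flowtable miss. Both fall back to the classic path.
    if pkt.is_fragment or entry is None:
        return None
    return replace(
        pkt,
        src=entry.snat_to or pkt.src,  # apply stored source NAT, if any
        dst=entry.dnat_to or pkt.dst,  # apply stored destination NAT, if any
        ttl=pkt.ttl - 1,               # TTL is decremented before transmit
    )


entry = FlowEntry(snat_to="198.51.100.1")
out = fastpath(Packet("10.0.0.2", "192.0.2.1", ttl=64), entry)
frag = fastpath(Packet("10.0.0.2", "192.0.2.1", ttl=64, is_fragment=True), entry)
```

In this model, out carries the rewritten source address and a TTL of 63, while
frag is None because fragments are handed back to the classic forwarding path.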

Example configuration
---------------------

Enabling the flowtable bypass is relatively easy: you only need to create a
flowtable and add one rule to your forward chain.

        table inet x {
                flowtable f {
                        hook ingress priority 0; devices = { eth0, eth1 };
                }
                chain y {
                        type filter hook forward priority 0; policy accept;
                        ip protocol tcp flow offload @f
                        counter packets 0 bytes 0
                }
        }

This example adds the flowtable 'f' to the ingress hook of the eth0 and eth1
netdevices. You can create as many flowtables as you want in case you need to
perform resource partitioning. The flowtable priority defines the order in which
hooks are run in the pipeline. This is convenient in case you already have an
nftables ingress chain (make sure the flowtable priority is smaller than that of
the nftables ingress chain, so that the flowtable runs first in the pipeline).
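
The ordering rule can be illustrated with a few lines of Python. The hook names
and the priority value 10 for the ingress chain are made-up examples; only the
rule that a smaller priority runs earlier is taken from the text above.

```python
# Hypothetical (name, priority) pairs registered on the same ingress hook.
ingress_hooks = [("nftables ingress chain", 10), ("flowtable f", 0)]

# Hooks with a smaller priority value run earlier in the pipeline.
run_order = [name for name, _prio in sorted(ingress_hooks, key=lambda h: h[1])]
```

With these values, the flowtable is consulted before the ingress chain.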

The 'flow offload' action from the forward chain 'y' adds an entry to the
flowtable for the TCP syn-ack packet coming in the reply direction. Once the
flow is offloaded, you will observe that the counter rule in the example above
does not get updated for the packets that are being forwarded through the
forwarding bypass.
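
The counter behaviour can be reproduced with a toy model. The sketch below is
an illustration under simplifying assumptions, not a description of the kernel
code: ToyRouter is a hypothetical class, flows are identified by address pairs
only, and "established" is reduced to "a packet was seen in each direction".

```python
from typing import Set, Tuple

Flow = Tuple[str, str]  # (source, destination); ports omitted for brevity


class ToyRouter:
    def __init__(self) -> None:
        self.seen: Set[Flow] = set()       # stands in for conntrack
        self.flowtable: Set[Flow] = set()  # offloaded flows, both directions
        self.counter = 0                   # the 'counter' rule in chain y

    def receive(self, src: str, dst: str) -> str:
        flow = (src, dst)
        if flow in self.flowtable:
            return "fastpath"              # forward chain is never reached
        self.seen.add(flow)
        if (dst, src) in self.seen:
            # Established: the 'flow offload' rule adds both directions.
            self.flowtable.update({flow, (dst, src)})
        self.counter += 1                  # rule after 'flow offload'
        return "classic"


r = ToyRouter()
paths = [
    r.receive("client", "server"),  # SYN
    r.receive("server", "client"),  # SYN-ACK, triggers the offload
    r.receive("client", "server"),  # ACK
    r.receive("server", "client"),  # data
]
```

In this model only the SYN and the SYN-ACK reach the forward chain, so the
counter stops at 2 while later packets take the fastpath.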

More reading
------------

This documentation is based on the LWN.net articles [1][2]. Rafal Milecki also
made a very complete and comprehensive summary called "A state of network
acceleration" that describes how things were before this infrastructure was
mainlined [3], and it also gives a rough summary of this work [4].

[1] https://lwn.net/Articles/738214/
[2] https://lwn.net/Articles/742164/
[3] http://lists.infradead.org/pipermail/lede-dev/2018-January/010830.html
[4] http://lists.infradead.org/pipermail/lede-dev/2018-January/010829.html