blob: f1435b7fcdb700878fc59f6f523622f4632b6540 [file] [log] [blame]
Mauro Carvalho Chehabaee11342020-04-28 00:01:35 +02001.. SPDX-License-Identifier: GPL-2.0
2
3============================
4LC-trie implementation notes
5============================
Robert Olssonb2f57102005-07-05 16:38:26 -07006
7Node types
8----------
Mauro Carvalho Chehabaee11342020-04-28 00:01:35 +02009leaf
Robert Olssonb2f57102005-07-05 16:38:26 -070010 An end node with data. This has a copy of the relevant key, along
11 with 'hlist' with routing table entries sorted by prefix length.
12 See struct leaf and struct leaf_info.
13
14trie node or tnode
15 An internal node, holding an array of child (leaf or tnode) pointers,
16 indexed through a subset of the key. See Level Compression.
17
18A few concepts explained
19------------------------
Mauro Carvalho Chehabaee11342020-04-28 00:01:35 +020020Bits (tnode)
Robert Olssonb2f57102005-07-05 16:38:26 -070021 The number of bits in the key segment used for indexing into the
22 child array - the "child index". See Level Compression.
23
24Pos (tnode)
25 The position (in the key) of the key segment used for indexing into
26 the child array. See Path Compression.
27
28Path Compression / skipped bits
29 Any given tnode is linked to from the child array of its parent, using
Mauro Carvalho Chehabaee11342020-04-28 00:01:35 +020030 a segment of the key specified by the parent's "pos" and "bits"
Robert Olssonb2f57102005-07-05 16:38:26 -070031 In certain cases, this tnode's own "pos" will not be immediately
32 adjacent to the parent (pos+bits), but there will be some bits
33 in the key skipped over because they represent a single path with no
34 deviations. These "skipped bits" constitute Path Compression.
35 Note that the search algorithm will simply skip over these bits when
36 searching, making it necessary to save the keys in the leaves to
37 verify that they actually do match the key we are searching for.
38
39Level Compression / child arrays
40 the trie is kept level balanced moving, under certain conditions, the
41 children of a full child (see "full_children") up one level, so that
42 instead of a pure binary tree, each internal node ("tnode") may
43 contain an arbitrarily large array of links to several children.
44 Conversely, a tnode with a mostly empty child array (see empty_children)
45 may be "halved", having some of its children moved downwards one level,
46 in order to avoid ever-increasing child arrays.
47
48empty_children
49 the number of positions in the child array of a given tnode that are
50 NULL.
51
52full_children
53 the number of children of a given tnode that aren't path compressed.
54 (in other words, they aren't NULL or leaves and their "pos" is equal
55 to this tnode's "pos"+"bits").
56
57 (The word "full" here is used more in the sense of "complete" than
58 as the opposite of "empty", which might be a tad confusing.)
59
60Comments
61---------
62
Mauro Carvalho Chehabaee11342020-04-28 00:01:35 +020063We have tried to keep the structure of the code as close to fib_hash as
64possible to allow verification and help up reviewing.
Robert Olssonb2f57102005-07-05 16:38:26 -070065
66fib_find_node()
67 A good start for understanding this code. This function implements a
68 straightforward trie lookup.
69
70fib_insert_node()
71 Inserts a new leaf node in the trie. This is bit more complicated than
72 fib_find_node(). Inserting a new node means we might have to run the
73 level compression algorithm on part of the trie.
74
75trie_leaf_remove()
76 Looks up a key, deletes it and runs the level compression algorithm.
77
78trie_rebalance()
79 The key function for the dynamic trie after any change in the trie
Duan Jiongfd223062014-12-15 14:58:53 +080080 it is run to optimize and reorganize. It will walk the trie upwards
81 towards the root from a given tnode, doing a resize() at each step
Robert Olssonb2f57102005-07-05 16:38:26 -070082 to implement level compression.
83
84resize()
85 Analyzes a tnode and optimizes the child array size by either inflating
Matt LaPlantea2ffd272006-10-03 22:49:15 +020086 or shrinking it repeatedly until it fulfills the criteria for optimal
Robert Olssonb2f57102005-07-05 16:38:26 -070087 level compression. This part follows the original paper pretty closely
88 and there may be some room for experimentation here.
89
90inflate()
91 Doubles the size of the child array within a tnode. Used by resize().
92
93halve()
94 Halves the size of the child array within a tnode - the inverse of
95 inflate(). Used by resize();
96
97fn_trie_insert(), fn_trie_delete(), fn_trie_select_default()
98 The route manipulation functions. Should conform pretty closely to the
99 corresponding functions in fib_hash.
100
101fn_trie_flush()
102 This walks the full trie (using nextleaf()) and searches for empty
103 leaves which have to be removed.
104
105fn_trie_dump()
106 Dumps the routing table ordered by prefix length. This is somewhat
107 slower than the corresponding fib_hash function, as we have to walk the
108 entire trie for each prefix length. In comparison, fib_hash is organized
109 as one "zone"/hash per prefix length.
110
111Locking
112-------
113
114fib_lock is used for an RW-lock in the same way that this is done in fib_hash.
115However, the functions are somewhat separated for other possible locking
116scenarios. It might conceivably be possible to run trie_rebalance via RCU
117to avoid read_lock in the fn_trie_lookup() function.
118
119Main lookup mechanism
120---------------------
121fn_trie_lookup() is the main lookup function.
122
123The lookup is in its simplest form just like fib_find_node(). We descend the
124trie, key segment by key segment, until we find a leaf. check_leaf() does
125the fib_semantic_match in the leaf's sorted prefix hlist.
126
127If we find a match, we are done.
128
129If we don't find a match, we enter prefix matching mode. The prefix length,
130starting out at the same as the key length, is reduced one step at a time,
131and we backtrack upwards through the trie trying to find a longest matching
132prefix. The goal is always to reach a leaf and get a positive result from the
133fib_semantic_match mechanism.
134
135Inside each tnode, the search for longest matching prefix consists of searching
136through the child array, chopping off (zeroing) the least significant "1" of
137the child index until we find a match or the child index consists of nothing but
138zeros.
139
140At this point we backtrack (t->stats.backtrack++) up the trie, continuing to
141chop off part of the key in order to find the longest matching prefix.
142
143At this point we will repeatedly descend subtries to look for a match, and there
144are some optimizations available that can provide us with "shortcuts" to avoid
145descending into dead ends. Look for "HL_OPTIMIZE" sections in the code.
146
147To alleviate any doubts about the correctness of the route selection process,
148a new netlink operation has been added. Look for NETLINK_FIB_LOOKUP, which
149gives userland access to fib_lookup().