======================================================
Net DIM - Generic Network Dynamic Interrupt Moderation
======================================================

:Author: Tal Gilboa <talgi@mellanox.com>

.. contents:: :depth: 2

Assumptions
===========

This document assumes the reader has basic knowledge of network drivers
and of interrupt moderation in general.


Introduction
============

Dynamic Interrupt Moderation (DIM), in networking, refers to changing the
interrupt moderation configuration of a channel in order to optimize packet
processing. The mechanism includes an algorithm which decides if and how to
change moderation parameters for a channel, usually by analysing runtime data
sampled from the system. Net DIM is such a mechanism. In each iteration of the
algorithm, it analyses a given data sample, compares it to the previous sample
and, if required, changes some of the interrupt moderation configuration
fields. The data sample is composed of data bandwidth, the number of packets
and the number of events. The time between samples is also measured. Net DIM
compares the current and the previous data and returns an adjusted interrupt
moderation configuration object. In some cases, the algorithm might decide not
to change anything. The configuration fields are the minimum duration (in
microseconds) allowed between events and the maximum number of wanted packets
per event. The Net DIM algorithm prioritizes increasing bandwidth over reducing
the interrupt rate.
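
As an illustration of how raw counters and the time between samples can be
turned into comparable rates, here is a minimal user-space sketch. It is not
the kernel implementation: ``struct rate_sample``, ``struct rate_result`` and
``rate_calc()`` are hypothetical names introduced only for this example, and
the per-millisecond unit is an assumption.

.. code-block:: c

  #include <stdint.h>

  /* Hypothetical raw sample: cumulative counters plus a timestamp. */
  struct rate_sample {
    uint64_t time_us;  /* when the sample was taken, in microseconds */
    uint64_t bytes;    /* cumulative byte counter */
    uint64_t packets;  /* cumulative packet counter */
    uint64_t events;   /* cumulative interrupt (event) counter */
  };

  /* Hypothetical per-millisecond rates derived from two samples. */
  struct rate_result {
    uint64_t bytes_per_ms;
    uint64_t packets_per_ms;
    uint64_t events_per_ms;
  };

  /*
   * Derive rates from two consecutive samples. Returns 0 on success and
   * -1 if no time has elapsed, in which case no rate can be computed.
   */
  static int rate_calc(const struct rate_sample *prev,
                       const struct rate_sample *curr,
                       struct rate_result *out)
  {
    uint64_t delta_us = curr->time_us - prev->time_us;

    if (delta_us == 0)
      return -1;

    /* 1000 microseconds per millisecond; integer division truncates. */
    out->bytes_per_ms = (curr->bytes - prev->bytes) * 1000 / delta_us;
    out->packets_per_ms = (curr->packets - prev->packets) * 1000 / delta_us;
    out->events_per_ms = (curr->events - prev->events) * 1000 / delta_us;
    return 0;
  }

For example, two samples taken 2000 microseconds apart whose counters grew by
3000000 bytes, 2000 packets and 64 events yield 1500000 bytes, 1000 packets
and 32 events per millisecond.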


Net DIM Algorithm
=================

Each iteration of the Net DIM algorithm follows these steps:

#. Calculates a new data sample.
#. Compares it to the previous sample.
#. Makes a decision - suggests interrupt moderation configuration fields.
#. Schedules a work function, which applies the suggested configuration.

The first two steps are straightforward: both the new and the previous data are
supplied by the driver registered to Net DIM. The previous data is the new data
supplied to the previous iteration. The comparison step checks the difference
between the new and previous data and rates the outcome of the last step taken.
A step is rated "better" if bandwidth increases and "worse" if bandwidth
decreases. If there is no change in bandwidth, the packet rate is compared in a
similar fashion - increase == "better" and decrease == "worse". If there is no
change in the packet rate either, the interrupt rate is compared. Here the
algorithm tries to optimize for a lower interrupt rate, so an increase in the
interrupt rate is considered "worse" and a decrease is considered "better".
Step #2 has an optimization for avoiding false results: it only considers a
difference between samples as valid if it is greater than a certain
percentage. Also, since Net DIM does not measure anything by itself, it assumes
the data provided by the driver is valid.
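
The comparison order described above can be sketched in user-space C as
follows. This is a simplified illustration rather than the kernel code: all
names with the ``cmp_`` and ``CMP_`` prefixes are hypothetical, and the 10%
threshold is an assumption chosen for the example.

.. code-block:: c

  #include <stdint.h>

  enum cmp_result { CMP_SAME, CMP_BETTER, CMP_WORSE };

  /* Assumed threshold: differences of 10% or less are treated as noise. */
  #define CMP_MIN_DIFF_PERCENT 10

  /* Hypothetical rates for one measurement period. */
  struct cmp_rates {
    uint64_t bytes_per_ms;
    uint64_t packets_per_ms;
    uint64_t events_per_ms;
  };

  /* Nonzero if val differs from ref by more than the assumed threshold. */
  static int cmp_significant(uint64_t val, uint64_t ref)
  {
    uint64_t diff = val > ref ? val - ref : ref - val;

    if (ref == 0)
      return val != 0;
    return 100 * diff / ref > CMP_MIN_DIFF_PERCENT;
  }

  /* Rate the current period against the previous one. */
  static enum cmp_result cmp_rates(const struct cmp_rates *curr,
                                   const struct cmp_rates *prev)
  {
    /* Bandwidth first: more is better. */
    if (cmp_significant(curr->bytes_per_ms, prev->bytes_per_ms))
      return curr->bytes_per_ms > prev->bytes_per_ms ? CMP_BETTER : CMP_WORSE;

    /* Then packet rate: more is better. */
    if (cmp_significant(curr->packets_per_ms, prev->packets_per_ms))
      return curr->packets_per_ms > prev->packets_per_ms ? CMP_BETTER
                                                         : CMP_WORSE;

    /* Finally interrupt rate: fewer is better. */
    if (cmp_significant(curr->events_per_ms, prev->events_per_ms))
      return curr->events_per_ms < prev->events_per_ms ? CMP_BETTER
                                                       : CMP_WORSE;

    return CMP_SAME;
  }

With these assumptions, a 5% rise in bandwidth alone is rated "same", while a
20% rise is rated "better" regardless of the packet and interrupt rates.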

Step #3 decides on the suggested configuration based on the result from step #2
and the internal state of the algorithm. The states reflect the "direction" of
the algorithm: whether it is going left (reducing moderation), going right
(increasing moderation) or standing still. Another optimization is that if a
decision to stay still is made multiple times, the interval between iterations
of the algorithm increases in order to reduce calculation overhead. Also, after
"parking" on the leftmost or rightmost decision, the algorithm may decide to
verify this decision by taking a step in the other direction. This is done in
order to avoid getting stuck in a "deep sleep" scenario. Once a decision is
made, an interrupt moderation configuration is selected from the predefined
profiles.
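
The idea of a direction that walks through a table of predefined profiles can
be sketched as below. This is a deliberately reduced model, not the kernel's
decision logic: it omits leaving the parked state, the growing interval between
iterations and the exploratory step after parking. All ``dir_`` and ``DIR_``
names and the profile values are hypothetical.

.. code-block:: c

  /* Hypothetical moderation profile. */
  struct dir_profile {
    unsigned int usec;  /* minimum time between events, in microseconds */
    unsigned int pkts;  /* maximum wanted packets per event */
  };

  /* Illustrative table, ordered from least (left) to most (right) moderation. */
  static const struct dir_profile dir_profiles[] = {
    { 1, 64 }, { 8, 64 }, { 64, 64 }, { 128, 64 }, { 256, 64 },
  };
  #define DIR_NUM_PROFILES 5

  enum dir_result { DIR_SAME, DIR_BETTER, DIR_WORSE };
  enum dir_state { DIR_PARKED, DIR_LEFT, DIR_RIGHT };

  struct dir_ctx {
    enum dir_state state;
    int profile_ix;
  };

  /* Update direction from the comparison result and pick a profile. */
  static const struct dir_profile *dir_decide(struct dir_ctx *ctx,
                                              enum dir_result res)
  {
    switch (res) {
    case DIR_SAME:
      /* Nothing changed: stand still. */
      ctx->state = DIR_PARKED;
      break;
    case DIR_WORSE:
      /* The last step hurt: turn around. */
      if (ctx->state == DIR_LEFT)
        ctx->state = DIR_RIGHT;
      else if (ctx->state == DIR_RIGHT)
        ctx->state = DIR_LEFT;
      break;
    case DIR_BETTER:
      /* The last step helped: keep going the same way. */
      break;
    }

    /* Take one step in the current direction, clamped to the table. */
    if (ctx->state == DIR_RIGHT && ctx->profile_ix < DIR_NUM_PROFILES - 1)
      ctx->profile_ix++;
    else if (ctx->state == DIR_LEFT && ctx->profile_ix > 0)
      ctx->profile_ix--;

    return &dir_profiles[ctx->profile_ix];
  }

Starting from the middle profile while going right, a "better" result moves one
profile to the right, a following "worse" result turns around and moves back,
and a "same" result parks on the current profile.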

The last step is to notify the registered driver that it should apply the
suggested configuration. This is done by scheduling a work function, defined by
the Net DIM API and provided by the registered driver.

As you can see, Net DIM itself does not actively interact with the system. It
would have trouble making the correct decisions if the wrong data is supplied to
it, and it would be useless if the work function did not apply the suggested
configuration. This does, however, allow the registered driver some room for
manoeuvre, as it may provide partial data or ignore the algorithm's suggestion
under some conditions.


Registering a Network Device to DIM
===================================

The Net DIM API exposes the main function net_dim(). This function is the entry
point to the Net DIM algorithm and has to be called every time the driver would
like to check if it should change interrupt moderation parameters. The driver
should provide two data structures: :c:type:`struct dim <dim>` and
:c:type:`struct dim_sample <dim_sample>`. :c:type:`struct dim <dim>`
describes the state of DIM for a specific object (RX queue, TX queue,
other queues, etc.). This includes the currently selected profile, previous
data samples, the callback function provided by the driver and more.
:c:type:`struct dim_sample <dim_sample>` describes a data sample,
which will be compared to the data sample stored in :c:type:`struct dim <dim>`
in order to decide on the algorithm's next step. The sample should include
bytes, packets and interrupts, measured by the driver.

In order to use Net DIM from a networking driver, the driver needs to call the
main net_dim() function. The recommended method is to call net_dim() on each
interrupt. Since Net DIM has built-in moderation and might decide to skip
iterations under certain conditions, there is no need to moderate the net_dim()
calls as well. As mentioned above, the driver needs to provide an object of type
:c:type:`struct dim <dim>` to the net_dim() function call. It is advised for
each entity using Net DIM to hold a :c:type:`struct dim <dim>` as part of its
data structure and use it as the main Net DIM API object.
The :c:type:`struct dim_sample <dim_sample>` should hold the latest
byte, packet and interrupt counts. There is no need to perform any
calculations; just include the raw data.

The net_dim() call itself does not return anything. Instead, Net DIM relies on
the driver to provide a callback function, which is called when the algorithm
decides to make a change in the interrupt moderation parameters. This callback
will be scheduled and run in a separate thread in order not to add overhead to
the data flow. After the work is done, the Net DIM algorithm needs to be set to
the proper state in order to move to the next iteration.


Example
=======

The following code demonstrates how to register a driver to Net DIM. The code
is not complete, but it should make the outline of the usage clear.

.. code-block:: c

  #include <linux/dim.h>

  /* Callback for net DIM to schedule on a decision to change moderation */
  void my_driver_do_dim_work(struct work_struct *work)
  {
    /* Get struct dim from struct work_struct */
    struct dim *dim = container_of(work, struct dim, work);

    /* Do interrupt moderation related stuff */
    ...

    /* Signal net DIM work is done and it should move to next iteration */
    dim->state = DIM_START_MEASURE;
  }

  /* My driver's interrupt handler */
  int my_driver_handle_interrupt(struct my_driver_entity *my_entity, ...)
  {
    ...
    /* A struct to hold current measured data, zeroed before use */
    struct dim_sample dim_sample = {};
    ...
    /* Initialize data sample struct with current data */
    dim_update_sample(my_entity->events,
                      my_entity->packets,
                      my_entity->bytes,
                      &dim_sample);
    /* Call net DIM */
    net_dim(&my_entity->dim, dim_sample);
    ...
  }

  /* My entity's initialization function (my_entity was already allocated) */
  int my_driver_init_my_entity(struct my_driver_entity *my_entity, ...)
  {
    ...
    /* Initialize struct work_struct with my driver's callback function */
    INIT_WORK(&my_entity->dim.work, my_driver_do_dim_work);
    ...
  }

Dynamic Interrupt Moderation (DIM) library API
==============================================

.. kernel-doc:: include/linux/dim.h
    :internal: