===========
NTB Drivers
===========

NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects
the separate memory systems of two or more computers to the same PCI-Express
fabric. Existing NTB hardware supports a common feature set: doorbell
registers and memory translation windows, as well as less common features like
scratchpad and message registers. Scratchpad registers are readable and
writable registers that are accessible from either side of the device, so that
peers can exchange a small amount of information at a fixed address. Message
registers can be used for the same purpose. Additionally, they are provided
with special status bits to make sure the information isn't overwritten by
another peer. Doorbell registers provide a way for peers to send interrupt
events. Memory windows allow translated read and write access to the peer
memory.

NTB Core Driver (ntb)
=====================

The NTB core driver defines an API wrapping the common feature set, and allows
clients interested in NTB features to discover the NTB devices supported by
hardware drivers. The term "client" is used here to mean an upper layer
component making use of the NTB API. The term "driver," or "hardware driver,"
is used here to mean a driver for a specific vendor and model of NTB hardware.

NTB Client Drivers
==================

NTB client drivers should register with the NTB core driver. After
registering, the client probe and remove functions will be called appropriately
as NTB hardware, or hardware drivers, are inserted and removed. The
registration uses the Linux Device framework, so it should feel familiar to
anyone who has written a PCI driver.

NTB Typical client driver implementation
----------------------------------------

The primary purpose of NTB is to share a piece of memory between at least two
systems. NTB device features like scratchpad and message registers are
therefore mainly used to perform the proper memory window initialization.
Typically there are two types of memory window interfaces supported by the NTB
API: inbound translation configured on the local NTB port, and outbound
translation configured by the peer on the peer NTB port. The first type is
depicted in the following figure::

 Inbound translation:

 Memory:              Local NTB Port:      Peer NTB Port:      Peer MMIO:
  ____________
 | dma-mapped |-ntb_mw_set_trans(addr)  |
 | memory     |        _v____________   |   ______________
 | (addr)     |<======| MW xlat addr |<====| MW base addr |<== memory-mapped IO
 |------------|       |--------------|  |  |--------------|

A typical scenario for initializing the first type of memory window is:
1) allocate a memory region, 2) put the translated address into the NTB config,
3) notify the peer device that initialization has been performed, 4) the peer
device maps the corresponding outbound memory window to gain access to the
shared memory region.

The second type of interface, in which the shared windows are initialized by a
peer device, is depicted in the following figure::

 Outbound translation:

 Memory:        Local NTB Port:    Peer NTB Port:      Peer MMIO:
  ____________                      ______________
 | dma-mapped |                |   | MW base addr |<== memory-mapped IO
 | memory     |                |   |--------------|
 | (addr)     |<===================| MW xlat addr |<-ntb_peer_mw_set_trans(addr)
 |------------|                |   |--------------|

A typical scenario for initializing the second type of interface is:
1) allocate a memory region, 2) deliver the translated address to the peer
device, 3) the peer puts the translated address into the NTB config, 4) the
peer device maps the outbound memory window to gain access to the shared memory
region.

As one can see, the described scenarios can be combined into one portable
algorithm.

 Local device:
  1) Allocate memory for a shared window
  2) Initialize the memory window with the translated address of the allocated
     region (it may fail if local memory window initialization is unsupported)
  3) Send the translated address and memory window index to a peer device

 Peer device:
  1) Initialize the memory window with the received address of the memory
     region allocated by the other device (it may fail if peer memory window
     initialization is unsupported)
  2) Map outbound memory window

In accordance with this scenario, the NTB Memory Window API can be used as
follows:

 Local device:
  1) ntb_mw_count(pidx) - retrieve the number of memory ranges that can
     be allocated for memory windows between the local device and the peer
     device of the port with the specified index.
  2) ntb_mw_get_align(pidx, midx) - retrieve the parameters restricting the
     shared memory region alignment and size. Then memory can be properly
     allocated.
  3) Allocate a physically contiguous memory region in compliance with the
     restrictions retrieved in 2).
  4) ntb_mw_set_trans(pidx, midx) - try to set the translation address of
     the memory window with the specified index for the defined peer device
     (it may fail if local translated address setting is not supported)
  5) Send the translated base address (usually together with the memory window
     number) to the peer device using, for instance, scratchpad or message
     registers.

 Peer device:
  1) ntb_peer_mw_set_trans(pidx, midx) - try to set the translated address
     received from the other device (related to pidx) for the specified memory
     window. It may fail if the received address, for instance, exceeds
     the maximum possible address or isn't properly aligned.
  2) ntb_peer_mw_get_addr(widx) - retrieve the MMIO address used to map the
     memory window and so gain access to the shared memory.

It is also worth noting that ntb_mw_count(pidx) should return the same value
as ntb_peer_mw_count() on the peer with port index pidx.
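
The control flow of the portable algorithm can be sketched as a small
user-space model. This is only an illustration under stated assumptions:
`struct mock_port`, `mock_fit_size()`, `mock_mw_set_trans()`,
`mock_peer_mw_set_trans()` and `mock_setup_window()` are hypothetical names
that merely stand in for the kernel calls listed above, and sending the address
to the peer is modelled as a plain function argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one NTB port; this is not a kernel structure. */
struct mock_port {
	bool can_set_trans;	/* does this port accept a translated address? */
	uint64_t xlat_addr;	/* translated address programmed so far */
};

/*
 * Round size up to a multiple of size_align, as a client might do with the
 * restrictions reported for a memory window. Returns 0 if the rounded size
 * exceeds size_max. Overflow near UINT64_MAX is not handled in this sketch.
 */
uint64_t mock_fit_size(uint64_t size, uint64_t size_align, uint64_t size_max)
{
	uint64_t fitted;

	assert(size_align > 0);
	fitted = (size + size_align - 1) / size_align * size_align;
	return fitted <= size_max ? fitted : 0;
}

/* Common helper: 0 on success, -1 if the port cannot be programmed. */
static int mock_program(struct mock_port *port, uint64_t addr)
{
	if (!port->can_set_trans)
		return -1;
	port->xlat_addr = addr;
	return 0;
}

/* Stand-in for ntb_mw_set_trans(): inbound translation on the local port. */
int mock_mw_set_trans(struct mock_port *local, uint64_t addr)
{
	return mock_program(local, addr);
}

/* Stand-in for ntb_peer_mw_set_trans(): outbound translation on the peer port. */
int mock_peer_mw_set_trans(struct mock_port *peer, uint64_t addr)
{
	return mock_program(peer, addr);
}

/* Returns true if at least one side managed to program the translation. */
bool mock_setup_window(struct mock_port *local, struct mock_port *peer,
		       uint64_t addr)
{
	/* Local device: try inbound translation; a failure is tolerated. */
	bool local_ok = mock_mw_set_trans(local, addr) == 0;

	/* Peer device: received the address, tries outbound translation. */
	bool peer_ok = mock_peer_mw_set_trans(peer, addr) == 0;

	return local_ok || peer_ok;
}
```

In this model the window is usable as soon as either side succeeds, which is
why a failure of one of the two set-translation steps is not treated as fatal.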

NTB Transport Client (ntb\_transport) and NTB Netdev (ntb\_netdev)
------------------------------------------------------------------

The primary client for NTB is the Transport client, used in tandem with NTB
Netdev. These drivers function together to create a logical link to the peer,
across the NTB, to exchange packets of network data. The Transport client
establishes a logical link to the peer, and creates queue pairs to exchange
messages and data. The NTB Netdev then creates an ethernet device using a
Transport queue pair. Network data is copied between socket buffers and the
Transport queue pair buffer. The Transport client may be used for other things
besides Netdev; however, no other applications have yet been written.

NTB Ping Pong Test Client (ntb\_pingpong)
-----------------------------------------

The Ping Pong test client serves as a demonstration to exercise the doorbell
and scratchpad registers of NTB hardware, and as an example of a simple NTB
client. Ping Pong enables the link when started, waits for the NTB link to come
up, and then proceeds to read and write the doorbell and scratchpad registers
of the NTB. The peers interrupt each other using a bit mask of doorbell bits,
which is shifted by one in each round, to test the behavior of multiple
doorbell bits and interrupt vectors. In each round, before writing the peer
doorbell register, the Ping Pong driver also reads the first local scratchpad
and writes the value plus one to the first peer scratchpad.

Module Parameters:

* unsafe - Some hardware has known issues with scratchpad and doorbell
  registers. By default, Ping Pong will not attempt to exercise such
  hardware. You may override this behavior at your own risk by setting
  unsafe=1.
* delay\_ms - Specify the delay between receiving a doorbell
  interrupt event and setting the peer doorbell register for the next
  round.
* init\_db - Specify the doorbell bits to start a new series of rounds. A new
  series begins once all the doorbell bits have been shifted out of
  range.
* dyndbg - It is suggested to specify dyndbg=+p when loading this module, and
  then to observe debugging output on the console.
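
The per-round bookkeeping described above can be modelled in a few lines of
user-space C. This is a sketch of the documented behavior and not the
ntb\_pingpong source: `pp_next_db()` and `pp_next_spad()` are hypothetical
helpers, and `db_valid` is assumed to be the mask of doorbell bits that the
hardware implements.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the doorbell mask for the next round: shift by one bit and, once
 * every bit has left the valid range, restart the series from init_db.
 */
uint64_t pp_next_db(uint64_t db, uint64_t db_valid, uint64_t init_db)
{
	uint64_t next;

	assert(db_valid != 0);
	next = (db << 1) & db_valid;	/* drop bits shifted out of range */
	return next ? next : init_db;
}

/* Value written to the first peer scratchpad: the local value plus one. */
uint32_t pp_next_spad(uint32_t local_spad)
{
	return local_spad + 1;
}
```

With three valid doorbell bits and `init_db` set to 1, the mask in this model
cycles through 1, 2, 4 and then starts over at 1.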

NTB Tool Test Client (ntb\_tool)
--------------------------------

The Tool test client serves primarily for debugging NTB hardware and drivers.
The Tool provides access through debugfs for reading, setting, and clearing the
NTB doorbell, and reading and writing scratchpads.

The Tool does not currently have any module parameters.

Debugfs Files:

* *debugfs*/ntb\_tool/*hw*/
        A directory in debugfs will be created for each
        NTB device probed by the tool. This directory is shortened to *hw*
        below.
* *hw*/db
        This file is used to read, set, and clear the local doorbell. Not
        all operations may be supported by all hardware. To read the doorbell,
        read the file. To set the doorbell, write `s` followed by the bits to
        set (e.g. `echo 's 0x0101' > db`). To clear the doorbell, write `c`
        followed by the bits to clear.
* *hw*/mask
        This file is used to read, set, and clear the local doorbell mask.
        See *db* for details.
* *hw*/peer\_db
        This file is used to read, set, and clear the peer doorbell.
        See *db* for details.
* *hw*/peer\_mask
        This file is used to read, set, and clear the peer doorbell
        mask. See *db* for details.
* *hw*/spad
        This file is used to read and write local scratchpads. To read
        the values of all scratchpads, read the file. To write values, write a
        series of pairs of scratchpad number and value
        (e.g. `echo '4 0x123 7 0xabc' > spad` sets scratchpads `4` and `7`
        to `0x123` and `0xabc`, respectively).
* *hw*/peer\_spad
        This file is used to read and write peer scratchpads. See
        *spad* for details.
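
To make the `s` and `c` command format of the *db* file concrete, the
following user-space sketch applies such a command string to a doorbell value
held in a plain variable. It is a model only: `tool_apply_db_cmd()` is a
hypothetical helper, not code from ntb\_tool, and it assumes that setting ORs
the given bits in and clearing removes them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Apply a command such as "s 0x0101" or "c 0x0100" to *db.
 * Returns 0 on success and -1 if the command is malformed.
 */
int tool_apply_db_cmd(const char *cmd, uint64_t *db)
{
	char *end;
	unsigned long long bits;

	assert(cmd && db);

	/* Expect a command letter followed by a single space. */
	if ((cmd[0] != 's' && cmd[0] != 'c') || cmd[1] != ' ')
		return -1;

	/* Base 0 accepts both decimal and 0x-prefixed hexadecimal input. */
	bits = strtoull(cmd + 2, &end, 0);
	if (end == cmd + 2)
		return -1;	/* no number after the command letter */

	if (cmd[0] == 's')
		*db |= bits;	/* set the requested doorbell bits */
	else
		*db &= ~bits;	/* clear the requested doorbell bits */
	return 0;
}
```

On real hardware the same strings would be written to the debugfs file instead,
and, as noted above, not every operation is supported by every device.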

NTB MSI Test Client (ntb\_msi\_test)
------------------------------------

The MSI test client serves to test and debug the MSI library which
allows for passing MSI interrupts across NTB memory windows. The
test client is interacted with through the debugfs filesystem:

* *debugfs*/ntb\_tool/*hw*/
        A directory in debugfs will be created for each
        NTB device probed by the tool. This directory is shortened to *hw*
        below.
* *hw*/port
        This file describes the local port number
* *hw*/irq*_occurrences
        One occurrences file exists for each interrupt and, when read,
        returns the number of times the interrupt has been triggered.
* *hw*/peer*/port
        This file describes the port number for each peer
* *hw*/peer*/count
        This file describes the number of interrupts that can be
        triggered on each peer
* *hw*/peer*/trigger
        Writing an interrupt number (any number less than the value
        specified in count) will trigger the interrupt on the
        specified peer. That peer's interrupt's occurrence file
        should be incremented.

NTB Hardware Drivers
====================

NTB hardware drivers should register devices with the NTB core driver. After
registering, client probe and remove functions will be called.

NTB Intel Hardware Driver (ntb\_hw\_intel)
------------------------------------------

The Intel hardware driver supports NTB on Xeon and Atom CPUs.

Module Parameters:

* b2b\_mw\_idx
        If the peer NTB is to be accessed via a memory window, then use
        this memory window to access the peer NTB. A value of zero or positive
        starts from the first mw idx, and a negative value starts from the last
        mw idx. Both sides MUST set the same value here! The default value is
        `-1`.
* b2b\_mw\_share
        If the peer NTB is to be accessed via a memory window, and if
        the memory window is large enough, still allow the client to use the
        second half of the memory window for address translation to the peer.
* xeon\_b2b\_usd\_bar2\_addr64
        If using B2B topology on Xeon hardware, use
        this 64 bit address on the bus between the NTB devices for the window
        at BAR2, on the upstream side of the link.
* xeon\_b2b\_usd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_usd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_usd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar2\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
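
The sign convention of b2b\_mw\_idx can be illustrated with a short user-space
sketch. It assumes that a negative value is simply counted back from the number
of memory windows, so that `-1` selects the last one; `resolve_b2b_mw_idx()` is
a hypothetical helper and the driver's own bookkeeping may differ.

```c
#include <assert.h>

/*
 * Map a b2b_mw_idx-style value onto a zero-based memory window index.
 * Returns -1 if the value does not select one of the mw_count windows.
 */
int resolve_b2b_mw_idx(int b2b_mw_idx, int mw_count)
{
	int idx;

	assert(mw_count >= 0);

	/* Negative values count back from the last memory window. */
	idx = b2b_mw_idx < 0 ? b2b_mw_idx + mw_count : b2b_mw_idx;

	return (idx >= 0 && idx < mw_count) ? idx : -1;
}
```

Under this assumption, with four memory windows the default of `-1` resolves to
index 3, which is one way to see why both sides must use the same setting.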