==================================
DMAengine controller documentation
==================================

Hardware Introduction
=====================

Most of the Slave DMA controllers have the same general principles of
operation.

They have a given number of channels to use for the DMA transfers, and
a given number of request lines.

Requests and channels are pretty much orthogonal. Channels can be used
to serve several to any requests. To simplify, channels are the
entities that will be doing the copy, and requests determine which
endpoints are involved.

The request lines actually correspond to physical lines going from the
DMA-eligible devices to the controller itself. Whenever the device
wants to start a transfer, it will assert a DMA request (DRQ) by
asserting that request line.

A very simple DMA controller would only take into account a single
parameter: the transfer size. At each clock cycle, it would transfer a
byte of data from one buffer to another, until the transfer size has
been reached.

That wouldn't work well in the real world, since slave devices might
require a specific number of bits to be transferred in a single
cycle. For example, we may want to transfer as much data as the
physical bus allows to maximize performance when doing a simple
memory copy operation, but our audio device could have a narrower FIFO
that requires data to be written exactly 16 or 24 bits at a time. This
is why most if not all of the DMA controllers can adjust this, using a
parameter called the transfer width.

Moreover, some DMA controllers, whenever the RAM is used as a source
or destination, can group the reads or writes in memory into a buffer,
so instead of having a lot of small memory accesses, which is not
really efficient, you'll get several bigger transfers. This is done
using a parameter called the burst size, which defines how many single
reads/writes it's allowed to do without the controller splitting the
transfer into smaller sub-transfers.
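
To make the interplay of transfer size, transfer width and burst size
concrete, here is a minimal user-space sketch. The helper ``burst_count``
and its parameters are hypothetical names chosen for illustration, not part
of any kernel API, and the model assumes a controller that fills each burst
before starting the next one.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of bursts needed to move `size` bytes when
 * every single access carries `width` bytes and a burst groups up to
 * `burst_len` accesses.
 */
size_t burst_count(size_t size, size_t width, size_t burst_len)
{
	size_t accesses;

	assert(width > 0 && burst_len > 0);

	/* Round up to whole accesses, then to whole bursts. */
	accesses = (size + width - 1) / width;
	return (accesses + burst_len - 1) / burst_len;
}
```

Under these assumptions, a 64-byte copy with a 4-byte transfer width and a
burst size of 8 needs 16 accesses grouped into 2 bursts, whereas byte-wide
single accesses would need 64 of them.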

Our theoretical DMA controller would then only be able to do transfers
that involve a single contiguous block of data. However, some of the
transfers we usually have are not, and want to copy data from
non-contiguous buffers to a contiguous buffer, which is called
scatter-gather.

DMAEngine, at least for mem2dev transfers, requires support for
scatter-gather. So we're left with two cases here: either we have a
quite simple DMA controller that doesn't support it, and we'll have to
implement it in software, or we have a more advanced DMA controller
that implements scatter-gather in hardware.

The latter are usually programmed using a collection of chunks to
transfer, and whenever the transfer is started, the controller will go
over that collection, doing whatever we programmed there.

This collection is usually either a table or a linked list. You will
then push either the address of the table and its number of elements,
or the first item of the list to one channel of the DMA controller,
and whenever a DRQ is asserted, it will go through the collection
to know where to fetch the data from.

Either way, the format of this collection is completely dependent on
your hardware. Each DMA controller will require a different structure,
but all of them will require, for every chunk, at least the source and
destination addresses, whether it should increment these addresses or
not and the three parameters we saw earlier: the burst size, the
transfer width and the transfer size.
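
The linked-list flavour of such a collection, and the software fallback for
controllers without scatter-gather, can be sketched in plain C. Everything
here (``struct sg_chunk``, ``sg_gather``) is a hypothetical, deliberately
reduced model: it keeps only the source address and the transfer size of
each chunk, while a real descriptor would also carry the destination,
increment flags, burst size and transfer width in whatever layout the
hardware dictates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk descriptor; real hardware dictates its own layout. */
struct sg_chunk {
	const void *src;        /* source address of this chunk */
	size_t len;             /* transfer size in bytes */
	struct sg_chunk *next;  /* next chunk, NULL at the end of the list */
};

/*
 * Software gather: copy every chunk of the list, in order, into one
 * contiguous destination. Returns the number of bytes copied and stops
 * early if the next chunk would not fit.
 */
size_t sg_gather(const struct sg_chunk *head, void *dst, size_t dst_len)
{
	unsigned char *out = dst;
	size_t done = 0;

	for (; head; head = head->next) {
		if (head->len > dst_len - done)
			break;
		memcpy(out + done, head->src, head->len);
		done += head->len;
	}
	assert(done <= dst_len);
	return done;
}
```

A controller with hardware scatter-gather would be handed the first item of
such a list and walk it on its own; the loop above is what a driver has to
emulate, one chunk at a time, when the hardware cannot.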

One last thing: slave devices usually won't issue DRQ by
default, and you have to enable this in your slave device driver first
whenever you want to use DMA.

These were just the general memory-to-memory (also called mem2mem) or
memory-to-device (mem2dev) kind of transfers. Most devices also
support other kinds of transfers or memory operations that dmaengine
supports, which are detailed later in this document.

DMA Support in Linux
====================

Historically, DMA controller drivers have been implemented using the
async TX API, to offload operations such as memory copy, XOR,
cryptography, etc., basically any memory to memory operation.

Over time, the need for memory to device transfers arose, and
dmaengine was extended. Nowadays, the async TX API is written as a
layer on top of dmaengine, and acts as a client. Still, dmaengine
accommodates that API in some cases, and made some design choices to
ensure that it stayed compatible.

For more information on the Async TX API, please look at the relevant
documentation file in Documentation/crypto/async-tx-api.rst.

DMAEngine APIs
==============

``struct dma_device`` Initialization
------------------------------------

Just like any other kernel framework, the whole DMAEngine registration
relies on the driver filling a structure and registering against the
framework. In our case, that structure is dma_device.

The first thing you need to do in your driver is to allocate this
structure. Any of the usual memory allocators will do, but you'll also
need to initialize a few fields in there:

- ``channels``: should be initialized as a list using the
  INIT_LIST_HEAD macro for example

- ``src_addr_widths``:
  should contain a bitmask of the supported source transfer widths

- ``dst_addr_widths``:
  should contain a bitmask of the supported destination transfer widths

- ``directions``:
  should contain a bitmask of the supported slave directions
  (i.e. excluding mem2mem transfers)

- ``residue_granularity``:
  granularity of the transfer residue reported to dma_set_residue.
  This can be either:

  - Descriptor:
    your device doesn't support any kind of residue
    reporting. The framework will only know that a particular
    transaction descriptor is done.

  - Segment:
    your device is able to report which chunks have been transferred

  - Burst:
    your device is able to report which bursts have been transferred

- ``dev``: should hold the pointer to the ``struct device`` associated
  to your current driver instance.
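
The "bitmask of supported widths" idea used by ``src_addr_widths`` and
``dst_addr_widths`` can be modelled outside the kernel as follows. The enum,
the ``WIDTH_BIT`` macro and ``width_supported`` are local stand-ins invented
for this sketch; it assumes one bit per width, indexed by the width in bytes,
and a real driver would use the kernel's own definitions instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for illustration; a driver would use the kernel's own. */
enum bus_width {
	WIDTH_1_BYTE = 1,
	WIDTH_2_BYTES = 2,
	WIDTH_4_BYTES = 4,
};

/* One bit per width, indexed by the width in bytes (an assumption). */
#define WIDTH_BIT(w) (UINT32_C(1) << (w))

/* Returns true if width `w` is advertised in `mask`. */
bool width_supported(uint32_t mask, enum bus_width w)
{
	assert(w < 32);
	return (mask & WIDTH_BIT(w)) != 0;
}
```

A controller that can do 1-byte and 4-byte accesses would advertise
``WIDTH_BIT(WIDTH_1_BYTE) | WIDTH_BIT(WIDTH_4_BYTES)``, and a lookup for
``WIDTH_2_BYTES`` in that mask would fail.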

Supported transaction types
---------------------------

The next thing you need is to set which transaction types your device
(and driver) supports.

Our ``dma_device`` structure has a field called cap_mask that holds the
various types of transaction supported, and you need to modify this
mask using the dma_cap_set function, with various flags depending on
transaction types you support as an argument.

All those capabilities are defined in the ``dma_transaction_type`` enum,
in ``include/linux/dmaengine.h``.

Currently, the types available are:

- DMA_MEMCPY

  - The device is able to do memory to memory copies

- DMA_XOR

  - The device is able to perform XOR operations on memory areas

  - Used to accelerate XOR intensive tasks, such as RAID5

- DMA_XOR_VAL

  - The device is able to perform parity check using the XOR
    algorithm against a memory buffer.

- DMA_PQ

  - The device is able to perform RAID6 P+Q computations, P being a
    simple XOR, and Q being a Reed-Solomon algorithm.

- DMA_PQ_VAL

  - The device is able to perform parity check using RAID6 P+Q
    algorithm against a memory buffer.

- DMA_INTERRUPT

  - The device is able to trigger a dummy transfer that will
    generate periodic interrupts

  - Used by the client drivers to register a callback that will be
    called on a regular basis through the DMA controller interrupt

- DMA_PRIVATE

  - The device only supports slave transfers, and as such isn't
    available for async transfers.

- DMA_ASYNC_TX

  - Must not be set by the device, and will be set by the framework
    if needed

  - TODO: What is it about?

- DMA_SLAVE

  - The device can handle device to memory transfers, including
    scatter-gather transfers.

  - While in the mem2mem case we were having two distinct types to
    deal with a single chunk to copy or a collection of them, here,
    we just have a single transaction type that is supposed to
    handle both.

  - If you want to transfer a single contiguous memory buffer,
    simply build a scatter list with only one item.

- DMA_CYCLIC

  - The device can handle cyclic transfers.

  - A cyclic transfer is a transfer where the chunk collection will
    loop over itself, with the last item pointing to the first.

  - It's usually used for audio transfers, where you want to operate
    on a single ring buffer that you will fill with your audio data.

- DMA_INTERLEAVE

  - The device supports interleaved transfer.

  - These transfers can transfer data from a non-contiguous buffer
    to a non-contiguous buffer, as opposed to DMA_SLAVE that can
    transfer data from a non-contiguous data set to a contiguous
    destination buffer.

  - It's usually used for 2d content transfers, in which case you
    want to transfer a portion of uncompressed data directly to the
    display to print it

These various types will also affect how the source and destination
addresses change over time.

Addresses pointing to RAM are typically incremented (or decremented)
after each transfer. In case of a ring buffer, they may loop
(DMA_CYCLIC). Addresses pointing to a device's register (e.g. a FIFO)
are typically fixed.
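
The three address behaviours described above (fixed, incrementing, looping)
can be written down as a small model. The ``addr_mode`` enum and the
``next_addr`` helper are hypothetical names for this sketch only; it assumes
32-bit bus addresses and ignores decrementing transfers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical addressing modes for one end of a transfer. */
enum addr_mode { ADDR_FIXED, ADDR_INCREMENT, ADDR_CYCLIC };

/*
 * Address used for the access that follows `cur`, stepping by `width`
 * bytes. `base` and `len` describe the ring buffer and only matter for
 * ADDR_CYCLIC.
 */
uint32_t next_addr(enum addr_mode mode, uint32_t base, uint32_t len,
		   uint32_t cur, uint32_t width)
{
	switch (mode) {
	case ADDR_FIXED:
		return cur;             /* device register, e.g. a FIFO */
	case ADDR_INCREMENT:
		return cur + width;     /* plain RAM buffer */
	case ADDR_CYCLIC:
		assert(len > 0 && cur >= base && cur - base < len);
		return base + (cur - base + width) % len; /* wrap around */
	}
	return cur;
}
```

With a 16-byte ring buffer at ``0x3000`` and 4-byte accesses, the access
after ``0x300c`` goes back to ``0x3000``, while a FIFO address stays where
it is.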

Per descriptor metadata support
-------------------------------
Some data movement architectures (DMA controller and peripherals) use metadata
associated with a transaction. The DMA controller's role is to transfer the
payload and the metadata alongside.
The metadata is not used by the DMA engine itself, but it contains
parameters, keys, vectors, etc. for the peripheral or from the peripheral.

The DMAengine framework provides generic ways to facilitate the metadata for
descriptors. Depending on the architecture the DMA driver can implement either
or both of the methods and it is up to the client driver to choose which one
to use.

- DESC_METADATA_CLIENT

  The metadata buffer is allocated/provided by the client driver and it is
  attached (via the dmaengine_desc_attach_metadata() helper) to the descriptor.

  From the DMA driver the following is expected for this mode:

  - DMA_MEM_TO_DEV / DMA_MEM_TO_MEM

    The data from the provided metadata buffer should be prepared for the DMA
    controller to be sent alongside the payload data, either by copying to a
    hardware descriptor, or to a highly coupled packet.

  - DMA_DEV_TO_MEM

    On transfer completion the DMA driver must copy the metadata to the client
    provided metadata buffer before notifying the client about the completion.
    After the transfer completion, DMA drivers must not touch the metadata
    buffer provided by the client.

- DESC_METADATA_ENGINE

  The metadata buffer is allocated/managed by the DMA driver. The client driver
  can ask for the pointer, maximum size and the currently used size of the
  metadata and can directly update or read it. dmaengine_desc_get_metadata_ptr()
  and dmaengine_desc_set_metadata_len() are provided as helper functions.

  From the DMA driver the following is expected for this mode:

  - get_metadata_ptr()

    Should return a pointer for the metadata buffer, the maximum size of the
    metadata buffer and the currently used / valid (if any) bytes in the buffer.

  - set_metadata_len()

    It is called by the clients after they have placed the metadata in the
    buffer to let the DMA driver know the number of valid bytes provided.

Note: since the client will ask for the metadata pointer in the completion
callback (in DMA_DEV_TO_MEM case) the DMA driver must ensure that the
descriptor is not freed up before the callback is called.

Device operations
-----------------

Our dma_device structure also requires a few function pointers in
order to implement the actual logic, now that we described what
operations we were able to perform.

The functions that we have to fill in there, and hence have to
implement, obviously depend on the transaction types you reported as
supported.

- ``device_alloc_chan_resources``

- ``device_free_chan_resources``

  - These functions will be called whenever a driver will call
    ``dma_request_channel`` or ``dma_release_channel`` for the first/last
    time on the channel associated to that driver.

  - They are in charge of allocating/freeing all the needed
    resources in order for that channel to be useful for your driver.

  - These functions can sleep.

- ``device_prep_dma_*``

  - These functions are matching the capabilities you registered
    previously.

  - These functions all take the buffer or the scatterlist relevant
    for the transfer being prepared, and should create a hardware
    descriptor or a list of hardware descriptors from it

  - These functions can be called from an interrupt context

  - Any allocation you might do should be using the GFP_NOWAIT
    flag, in order not to potentially sleep, but without depleting
    the emergency pool either.

  - Drivers should try to pre-allocate any memory they might need
    during the transfer setup at probe time to avoid putting too
    much pressure on the nowait allocator.

  - It should return a unique instance of the
    ``dma_async_tx_descriptor`` structure, that further represents this
    particular transfer.

  - This structure can be initialized using the function
    ``dma_async_tx_descriptor_init``.

  - You'll also need to set two fields in this structure:

    - flags:
      TODO: Can it be modified by the driver itself, or
      should it be always the flags passed in the arguments

    - tx_submit: A pointer to a function you have to implement,
      that is supposed to push the current transaction descriptor to a
      pending queue, waiting for issue_pending to be called.

  - In this structure the function pointer callback_result can be
    initialized in order for the submitter to be notified that a
    transaction has completed. In earlier code the function pointer
    callback was used. However, it does not provide any status for the
    transaction and will be deprecated. The result structure defined as
    ``dmaengine_result`` that is passed in to callback_result
    has two fields:

    - result: This provides the transfer result defined by
      ``dmaengine_tx_result``. Either success or some error condition.

    - residue: Provides the residue bytes of the transfer for those that
      support residue.

- ``device_issue_pending``

  - Takes the first transaction descriptor in the pending queue,
    and starts the transfer. Whenever that transfer is done, it
    should move to the next transaction in the list.

  - This function can be called in an interrupt context

- ``device_tx_status``

  - Should report the bytes left to go over on the given channel

  - Should only care about the transaction descriptor passed as
    argument, not the currently active one on a given channel

  - The tx_state argument might be NULL

  - Should use dma_set_residue to report it

  - In the case of a cyclic transfer, it should only take into
    account the current period.

  - This function can be called in an interrupt context.

- device_config

  - Reconfigures the channel with the configuration given as argument

  - This command should NOT perform synchronously, or on any
    currently queued transfers, but only on subsequent ones

  - In this case, the function will receive a ``dma_slave_config``
    structure pointer as an argument, that will detail which
    configuration to use.

  - Even though that structure contains a direction field, this
    field is deprecated in favor of the direction argument given to
    the prep_* functions

  - This call is mandatory for slave operations only. This should NOT be
    set or expected to be set for memcpy operations.
    If a driver supports both, it should use this call for slave
    operations only and not for memcpy ones.

- device_pause

  - Pauses a transfer on the channel

  - This command should operate synchronously on the channel,
    pausing right away the work of the given channel

- device_resume

  - Resumes a transfer on the channel

  - This command should operate synchronously on the channel,
    resuming right away the work of the given channel

- device_terminate_all

  - Aborts all the pending and ongoing transfers on the channel

  - For aborted transfers the complete callback should not be called

  - Can be called from atomic context or from within a complete
    callback of a descriptor. Must not sleep. Drivers must be able
    to handle this correctly.

  - Termination may be asynchronous. The driver does not have to
    wait until the currently active transfer has completely stopped.
    See device_synchronize.

- device_synchronize

  - Must synchronize the termination of a channel to the current
    context.

  - Must make sure that memory for previously submitted
    descriptors is no longer accessed by the DMA controller.

  - Must make sure that all complete callbacks for previously
    submitted descriptors have finished running and none are
    scheduled to run.

  - May sleep.
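
As an illustration of the ``device_tx_status`` rule that a cyclic transfer
only accounts for the current period, the residue arithmetic could look like
the sketch below. ``cyclic_residue`` and its arguments are hypothetical; it
assumes the driver can read how many bytes the hardware has moved since the
start of the ring buffer, which is entirely controller specific. In a driver,
the resulting value would then be reported through dma_set_residue.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: bytes left in the current period of a cyclic
 * transfer. `bytes_done` counts from the start of the ring buffer and may
 * span several periods. On an exact period boundary the next period has
 * just begun, so a full period is reported.
 */
size_t cyclic_residue(size_t period_len, size_t bytes_done)
{
	assert(period_len > 0);
	return period_len - (bytes_done % period_len);
}
```

With 256-byte periods, a position of 266 bytes into the buffer is 10 bytes
into the second period, so 246 bytes are left in that period.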


Misc notes
==========

(stuff that should be documented, but don't really know
where to put them)

``dma_run_dependencies``

- Should be called at the end of an async TX transfer, and can be
  ignored in the slave transfers case.

- Makes sure that dependent operations are run before marking it
  as complete.

dma_cookie_t

- it's a DMA transaction ID that will increment over time.

- Not really relevant any more since the introduction of ``virt-dma``
  that abstracts it away.
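
For readers who want a mental model of "a transaction ID that will increment
over time", here is a small stand-alone sketch. ``MIN_COOKIE`` and
``next_cookie`` are names made up for this example, and the wrap-around
policy (restart at the smallest valid ID rather than go negative) is an
assumption of the model, not a statement about what any given driver does.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_COOKIE 1	/* stand-in: smallest value treated as a valid ID */

/*
 * Model of a per-channel cookie counter: return the cookie that follows
 * `last`, wrapping back to MIN_COOKIE instead of overflowing.
 */
int32_t next_cookie(int32_t last)
{
	assert(last >= 0);
	return (last == INT32_MAX) ? MIN_COOKIE : last + 1;
}
```

Drivers built on ``virt-dma`` normally never write this by hand, which is
why the note above calls the cookie not really relevant any more.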

DMA_CTRL_ACK

- If clear, the descriptor cannot be reused by provider until the
  client acknowledges receipt, i.e. has had a chance to establish any
  dependency chains

- This can be acked by invoking async_tx_ack()

- If set, does not mean descriptor can be reused

DMA_CTRL_REUSE

- If set, the descriptor can be reused after being completed. It should
  not be freed by provider if this flag is set.

- The descriptor should be prepared for reuse by invoking
  ``dmaengine_desc_set_reuse()`` which will set DMA_CTRL_REUSE.

- ``dmaengine_desc_set_reuse()`` will succeed only when the channel supports
  reusable descriptors as exhibited by capabilities

- As a consequence, if a device driver wants to skip the
  ``dma_map_sg()`` and ``dma_unmap_sg()`` in between 2 transfers,
  because the DMA'd data wasn't used, it can resubmit the transfer right after
  its completion.

- Descriptor can be freed in a few ways

  - Clearing DMA_CTRL_REUSE by invoking
    ``dmaengine_desc_clear_reuse()`` and submitting for last txn

  - Explicitly invoking ``dmaengine_desc_free()``, this can succeed only
    when DMA_CTRL_REUSE is already set

  - Terminating the channel

- DMA_PREP_CMD

  - If set, the client driver tells DMA controller that passed data in DMA
    API is command data.

  - Interpretation of command data is DMA controller specific. It can be
    used for issuing commands to other peripherals/register reads/register
    writes for which the descriptor should be in different format from
    normal data descriptors.

General Design Notes
====================

Most of the DMAEngine drivers you'll see are based on a similar design
that handles the end of transfer interrupts in the handler, but defers
most work to a tasklet, including the start of a new transfer whenever
the previous transfer ended.

This is a rather inefficient design though, because the inter-transfer
latency will be not only the interrupt latency, but also the
scheduling latency of the tasklet, which will leave the channel idle
in between, which will slow down the global transfer rate.

You should avoid this kind of practice, and instead of electing a new
transfer in your tasklet, move that part to the interrupt handler in
order to have a shorter idle window (that we can't really avoid
anyway).
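
The recommended ordering can be shown with a toy model of a channel. None of
these types or functions exist in the kernel: ``struct xfer``,
``struct chan``, ``hw_start`` and ``chan_irq`` are hypothetical, locking is
left out, and ``hw_start`` merely stands in for programming the controller.
The point is only the order of operations inside the interrupt handler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified channel state; not the kernel's structures. */
struct xfer {
	int id;
	struct xfer *next;
};

struct chan {
	struct xfer *active;     /* transfer the hardware is running */
	struct xfer *issued;     /* queue of transfers ready to start */
	struct xfer *completed;  /* finished, callbacks still to run */
	int hw_starts;           /* how many times the hardware was started */
};

/* Stand-in for programming the controller registers. */
static void hw_start(struct chan *c, struct xfer *x)
{
	c->active = x;
	c->hw_starts++;
}

/*
 * End-of-transfer interrupt: start the next transfer right away, and only
 * queue the finished one so that deferred work can run its callback later.
 */
void chan_irq(struct chan *c)
{
	struct xfer *done = c->active;

	assert(done != NULL);
	c->active = NULL;

	if (c->issued) {
		struct xfer *next = c->issued;

		c->issued = next->next;
		next->next = NULL;
		hw_start(c, next);
	}

	done->next = c->completed;
	c->completed = done;
}
```

In this model the channel is idle only for the duration of the handler
itself; the deferred context is left with walking ``completed`` and invoking
client callbacks, which no longer delays the hardware.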

Glossary
========

- Burst: A number of consecutive read or write operations that
  can be queued to buffers before being flushed to memory.

- Chunk: A contiguous collection of bursts

- Transfer: A collection of chunks (be it contiguous or not)