Blame - Documentation/core-api/ioctl.rst - SHIFTPHONES/mainline/linux

Arnd Bergmann	8ce156d	2019-12-03 10:57:23 +0100	[diff] [blame]	1	======================
				2	ioctl based interfaces
				3	======================
				4
				5	ioctl() is the most common way for applications to interface
				6	with device drivers. It is flexible and easily extended by adding new
				7	commands and can be passed through character devices, block devices as
				8	well as sockets and other special file descriptors.
				9
				10	However, it is also very easy to get ioctl command definitions wrong,
				11	and hard to fix them later without breaking existing applications,
				12	so this documentation tries to help developers get it right.
				13
				14	Command number definitions
				15	==========================
				16
				17	The command number, or request number, is the second argument passed to
				18	the ioctl system call. While this can be any 32-bit number that uniquely
				19	identifies an action for a particular driver, there are a number of
				20	conventions around defining them.
				21
				22	``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
				23	ioctl commands that follow modern conventions: ``_IO``, ``_IOR``,
				24	``_IOW``, and ``_IOWR``. These should be used for all new commands,
				25	with the correct parameters:
				26
				27	_IO/_IOR/_IOW/_IOWR
				28	The macro name specifies how the argument will be used. It may be a
				29	pointer to data to be passed into the kernel (_IOW), out of the kernel
				30	(_IOR), or both (_IOWR). _IO can indicate either commands with no
				31	argument or those passing an integer value instead of a pointer.
				32	It is recommended to only use _IO for commands without arguments,
				33	and use pointers for passing data.
				34
				35	type
				36	An 8-bit number, often a character literal, specific to a subsystem
				37	or driver, and listed in :doc:`../userspace-api/ioctl/ioctl-number`.
				38
				39	nr
				40	An 8-bit number identifying the specific command, unique for a given
				41	value of 'type'.
				42
				43	data_type
				44	The name of the data type pointed to by the argument. The command number
				45	encodes the ``sizeof(data_type)`` value in a 13-bit or 14-bit integer,
				46	leading to a limit of 8191 bytes for the maximum size of the argument.
				47	Note: do not pass ``sizeof(data_type)`` into _IOR/_IOW/_IOWR, as that
				48	will lead to encoding sizeof(sizeof(data_type)), i.e. sizeof(size_t).
				49	_IO does not have a data_type parameter.
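
The parameters above can be illustrated with a small user space sketch. The
``foo`` driver, ``struct foo_config``, the ``'f'`` type value and all
``FOO_*`` names are hypothetical and exist only for this example; a real
driver has to pick a type value that is not already taken.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Hypothetical argument structure for an imaginary "foo" driver. */
struct foo_config {
	__u32 mode;
	__u32 flags;
	__u64 timeout_ns;
};

/* Hypothetical 'type' value, chosen only for illustration. */
#define FOO_IOC_MAGIC	'f'

#define FOO_RESET	_IO(FOO_IOC_MAGIC, 0)                        /* no argument */
#define FOO_GET_CONFIG	_IOR(FOO_IOC_MAGIC, 1, struct foo_config)   /* kernel to user */
#define FOO_SET_CONFIG	_IOW(FOO_IOC_MAGIC, 2, struct foo_config)   /* user to kernel */
#define FOO_XCHG_CONFIG	_IOWR(FOO_IOC_MAGIC, 3, struct foo_config)  /* both directions */

/* Wrong on purpose: this encodes sizeof(size_t), not the structure size. */
#define FOO_SET_CONFIG_WRONG \
	_IOW(FOO_IOC_MAGIC, 2, sizeof(struct foo_config))

/* The size field of a correct definition matches the argument type. */
static_assert(_IOC_SIZE(FOO_SET_CONFIG) == sizeof(struct foo_config),
	      "command must encode the size of its argument");
```

The ``_IOC_DIR()``, ``_IOC_TYPE()``, ``_IOC_NR()`` and ``_IOC_SIZE()`` macros
from the same header decode a command number again, which is a convenient way
to check a definition.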
				50
				51
				52	Interface versions
				53	==================
				54
				55	Some subsystems use version numbers in data structures to overload
				56	commands with different interpretations of the argument.
				57
				58	This is generally a bad idea, since changes to existing commands tend
				59	to break existing applications.
				60
				61	A better approach is to add a new ioctl command with a new number. The
				62	old command still needs to be implemented in the kernel for compatibility,
				63	but this can be a wrapper around the new implementation.
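
A minimal sketch of that approach, written as plain C so it can be compiled
in user space. The structures, command numbers and function names are all
hypothetical; a real handler would also copy the argument from user space
before calling these helpers.

```c
#include <errno.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Hypothetical original argument and command. */
struct foo_limit {
	__u32 max;
	__u32 reserved;
};
#define FOO_SET_LIMIT	_IOW('f', 4, struct foo_limit)

/* Hypothetical extended argument, added under a new command number. */
struct foo_limit2 {
	__u32 min;
	__u32 max;
	__u64 flags;
};
#define FOO_SET_LIMIT2	_IOW('f', 5, struct foo_limit2)

/* Stand-in for driver state. */
static struct foo_limit2 foo_current;

/* New implementation: all of the real logic lives here. */
static int foo_set_limit2(const struct foo_limit2 *arg)
{
	if (arg->min > arg->max)
		return -EINVAL;
	foo_current = *arg;
	return 0;
}

/* Old command: kept for compatibility as a thin wrapper. */
static int foo_set_limit(const struct foo_limit *old)
{
	struct foo_limit2 new_arg = {
		.min = 0,
		.max = old->max,
		.flags = 0,
	};

	return foo_set_limit2(&new_arg);
}
```

Existing applications keep calling ``FOO_SET_LIMIT`` with unchanged
semantics, while new applications can opt in to ``FOO_SET_LIMIT2``.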
				64
				65	Return code
				66	===========
				67
				68	ioctl commands can return negative error codes as documented in errno(3);
				69	these get turned into errno values in user space. On success, the return
				70	code should be zero. It is also possible but not recommended to return
				71	a positive 'long' value.
				72
				73	When the ioctl callback is called with an unknown command number, the
				74	handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in
				75	-ENOTTY being returned from the system call. Some subsystems return
				76	-ENOSYS or -EINVAL here for historic reasons, but this is wrong.
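
The convention for unknown commands can be modelled as follows. This is a
simplified user space sketch, not an actual file operation; the function name
``foo_ioctl_model`` and both command numbers are made up for the example.

```c
#include <errno.h>
#include <linux/ioctl.h>

/* Hypothetical commands of an imaginary "foo" driver. */
#define FOO_RESET	_IO('f', 0)
#define FOO_START	_IO('f', 6)

/* Model of an ioctl handler: zero on success, negative errno on failure. */
static long foo_ioctl_model(unsigned int cmd, unsigned long arg)
{
	(void)arg; /* neither command takes an argument */

	switch (cmd) {
	case FOO_RESET:
		return 0;
	case FOO_START:
		return 0;
	default:
		/* Unknown command: -ENOTTY, not -EINVAL or -ENOSYS. */
		return -ENOTTY;
	}
}
```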
				77
				78	Prior to Linux 5.5, compat_ioctl handlers were required to return
				79	-ENOIOCTLCMD in order to use the fallback conversion into native
				80	commands. As all subsystems are now responsible for handling compat
				81	mode themselves, this is no longer needed, but it may be important to
				82	consider when backporting bug fixes to older kernels.
				83
				84	Timestamps
				85	==========
				86
				87	Traditionally, timestamps and timeout values are passed as ``struct
				88	timespec`` or ``struct timeval``, but these are problematic because of
				89	incompatible definitions of these structures in user space after the
				90	move to 64-bit time_t.
				91
				92	The ``struct __kernel_timespec`` type can be used instead, either embedded
				93	in other data structures when separate second/nanosecond values are
				94	desired, or passed to user space directly. This is still not ideal though,
				95	as the structure matches neither the kernel's timespec64 nor the user
				96	space timespec exactly. The get_timespec64() and put_timespec64() helper
				97	functions can be used to ensure that the layout remains compatible with
				98	user space and the padding is treated correctly.
				99
				100	As it is cheap to convert seconds to nanoseconds, but the opposite
				101	requires an expensive 64-bit division, a plain __u64 nanosecond value
				102	can be simpler and more efficient.
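
The asymmetry between the two conversions can be seen in a short user space
sketch. The helper names are hypothetical and the code assumes non-negative
timestamps.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC	1000000000ULL

/* Seconds to nanoseconds: one multiplication and one addition. */
static uint64_t timespec_to_ns(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * NSEC_PER_SEC + (uint64_t)ts->tv_nsec;
}

/* Nanoseconds to seconds: needs a 64-bit division and remainder. */
static struct timespec ns_to_timespec(uint64_t ns)
{
	struct timespec ts;

	ts.tv_sec = (time_t)(ns / NSEC_PER_SEC);
	ts.tv_nsec = (long)(ns % NSEC_PER_SEC);
	return ts;
}
```

An interface that carries a single 64-bit nanosecond count leaves the division
to whichever side actually needs separate seconds.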
				103
				104	Timeout values and timestamps should ideally use CLOCK_MONOTONIC time,
				105	as returned by ktime_get_ns() or ktime_get_ts64(). Unlike
				106	CLOCK_REALTIME, this makes the timestamps immune from jumping backwards
				107	or forwards due to leap second adjustments and clock_settime() calls.
				108
				109	ktime_get_real_ns() can be used for CLOCK_REALTIME timestamps that
				110	need to be persistent across a reboot or between multiple machines.
				111
				112	32-bit compat mode
				113	==================
				114
				115	In order to support 32-bit user space running on a 64-bit machine, each
				116	subsystem or driver that implements an ioctl callback handler must also
				117	implement the corresponding compat_ioctl handler.
				118
				119	As long as all the rules for data structures are followed, this is as
				120	easy as setting the .compat_ioctl pointer to a helper function such as
				121	compat_ptr_ioctl() or blkdev_compat_ptr_ioctl().
				122
				123	compat_ptr()
				124	------------
				125
				126	On the s390 architecture, 31-bit user space has ambiguous representations
				127	for data pointers, with the upper bit being ignored. When running such
				128	a process in compat mode, the compat_ptr() helper must be used to
				129	clear the upper bit of a compat_uptr_t and turn it into a valid 64-bit
				130	pointer. On other architectures, this macro only performs a cast to a
				131	``void __user *`` pointer.
				132
				133	In a compat_ioctl() callback, the last argument is an unsigned long,
				134	which can be interpreted as either a pointer or a scalar depending on
				135	the command. If it is a scalar, then compat_ptr() must not be used, to
				136	ensure that the 64-bit kernel behaves the same way as a 32-bit kernel
				137	for arguments with the upper bit set.
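
The effect of the conversion can be modelled in user space. The functions
below are illustrative stand-ins with made-up names, not the kernel helpers
themselves; they return the resulting address as an integer so the difference
is easy to inspect.

```c
#include <stdint.h>

/* Stand-in for compat_uptr_t: a 32-bit user pointer value. */
typedef uint32_t compat_uptr_model_t;

/* s390 model: the upper bit of a 31-bit pointer is ignored, so clear it. */
static uint64_t compat_ptr_s390_model(compat_uptr_model_t uptr)
{
	return (uint64_t)(uptr & 0x7fffffffU);
}

/* Model for other architectures: a plain zero-extending cast. */
static uint64_t compat_ptr_generic_model(compat_uptr_model_t uptr)
{
	return (uint64_t)uptr;
}

/* A scalar argument is passed through untouched, upper bit included. */
static uint64_t compat_scalar_model(uint32_t value)
{
	return (uint64_t)value;
}
```

Applying the pointer conversion to a scalar such as ``0x80000001`` would
silently turn it into ``0x1`` in the s390 model, which is why the two cases
have to be told apart per command.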
				138
				139	The compat_ptr_ioctl() helper can be used in place of a custom
				140	compat_ioctl file operation for drivers that only take arguments that
				141	are pointers to compatible data structures.
				142
				143	Structure layout
				144	----------------
				145
				146	Compatible data structures have the same layout on all architectures,
				147	avoiding all problematic members:
				148
				149	* ``long`` and ``unsigned long`` are the size of a register, so
				150	they can be either 32-bit or 64-bit wide and cannot be used in portable
				151	data structures. Fixed-length replacements are ``__s32``, ``__u32``,
				152	``__s64`` and ``__u64``.
				153
				154	* Pointers have the same problem, in addition to requiring the
				155	use of compat_ptr(). The best workaround is to use ``__u64``
				156	in place of pointers, which requires a cast to ``uintptr_t`` in user
				157	space, and the use of u64_to_user_ptr() in the kernel to convert
				158	it back into a user pointer.
				159
				160	* On the x86-32 (i386) architecture, the alignment of 64-bit variables
				161	is only 32-bit, but they are naturally aligned on most other
				162	architectures including x86-64. This means a structure like::
				163
				164	    struct foo {
				165	        __u32 a;
				166	        __u64 b;
				167	        __u32 c;
				168	    };
				169
				170	has four bytes of padding between a and b on x86-64, plus another four
				171	bytes of padding at the end, but no padding on i386, and it needs a
				172	compat_ioctl conversion handler to translate between the two formats.
				173
				174	To avoid this problem, all structures should have their members
				175	naturally aligned, or explicit reserved fields added in place of the
				176	implicit padding. The ``pahole`` tool can be used for checking the
				177	alignment.
				178
				179	* On ARM OABI user space, structures are padded to multiples of 32-bit,
				180	making some structs incompatible with modern EABI kernels if they
				181	do not end on a 32-bit boundary.
				182
				183	* On the m68k architecture, struct members are not guaranteed to have an
				184	alignment greater than 16-bit, which is a problem when relying on
				185	implicit padding.
				186
				187	* Bitfields and enums generally work as one would expect them to,
				188	but some properties of them are implementation-defined, so it is better
				189	to avoid them completely in ioctl interfaces.
				190
				191	* ``char`` members can be either signed or unsigned, depending on
				192	the architecture, so the __u8 and __s8 types should be used for 8-bit
				193	integer values, though char arrays are clearer for fixed-length strings.
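
Several of the rules above can be combined in one sketch. The structure
``struct foo_xfer`` and the helper ``ptr_to_u64()`` are hypothetical; the
point is that explicit reserved fields and a ``__u64`` in place of a pointer
give the same layout to 32-bit and 64-bit user space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/types.h>

/* Hypothetical argument structure without any implicit padding. */
struct foo_xfer {
	__u32 a;
	__u32 reserved0;	/* explicit padding so that b is naturally aligned */
	__u64 b;
	__u32 c;
	__u32 reserved1;	/* explicit padding instead of tail padding */
	__u64 buf;		/* user pointer carried as a 64-bit integer */
};

/* The layout is identical whether 64-bit members align to 4 or 8 bytes. */
static_assert(offsetof(struct foo_xfer, b) == 8, "b must be at offset 8");
static_assert(offsetof(struct foo_xfer, buf) == 24, "buf must be at offset 24");
static_assert(sizeof(struct foo_xfer) == 32, "no hidden padding expected");

/* User space side: convert a pointer for storage in a __u64 member. */
static inline __u64 ptr_to_u64(const void *p)
{
	return (__u64)(uintptr_t)p;
}
```

In the kernel, the ``buf`` member would then be converted back with
u64_to_user_ptr(), and reserved members are normally required to be zero so
that they can be given a meaning later.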
				194
				195	Information leaks
				196	=================
				197
				198	Uninitialized data must not be copied back to user space, as this can
				199	cause an information leak, which can be used to defeat kernel address
				200	space layout randomization (KASLR), helping in an attack.
				201
				202	For this reason (and for compat support) it is best to avoid any
				203	implicit padding in data structures. Where there is implicit padding
				204	in an existing structure, kernel drivers must be careful to fully
				205	initialize an instance of the structure before copying it to user
				206	space. This is usually done by calling memset() before assigning to
				207	individual members.
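
The pattern looks roughly like this. ``struct foo_status`` and
``foo_fill_status()`` are hypothetical, and the sketch is plain user space C;
in a driver the filled structure would then be handed to copy_to_user().

```c
#include <string.h>
#include <linux/types.h>

/* Hypothetical reply structure with implicit padding after 'state'. */
struct foo_status {
	__u8 state;
	/* typically three bytes of implicit padding here */
	__u32 count;
};

/* Clear the whole object first, then assign the individual members. */
static void foo_fill_status(struct foo_status *st, __u8 state, __u32 count)
{
	memset(st, 0, sizeof(*st));
	st->state = state;
	st->count = count;
}
```

Initializing only the named members would leave whatever was previously on
the stack in the padding bytes.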
				208
				209	Subsystem abstractions
				210	======================
				211
				212	While some device drivers implement their own ioctl function, most
				213	subsystems implement the same command for multiple drivers. Ideally the
				214	subsystem has an .ioctl() handler that copies the arguments from and
				215	to user space, passing them into subsystem specific callback functions
				216	through normal kernel pointers.
				217
				218	This helps in various ways:
				219
				220	* Applications written for one driver are more likely to work for
				221	another one in the same subsystem if there are no subtle differences
				222	in the user space ABI.
				223
				224	* The complexity of user space access and data structure layout is handled
				225	in one place, reducing the potential for implementation bugs.
				226
				227	* It is more likely to be reviewed by experienced developers
				228	who can spot problems in the interface when the ioctl is shared
				229	between multiple drivers than when it is only used in a single driver.
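
The split between subsystem and driver can be sketched as below. Everything
here is hypothetical (``struct foo_info``, ``struct foo_driver_ops``, the
``bar`` driver), and memcpy() stands in for copy_to_user() so that the model
compiles in user space.

```c
#include <errno.h>
#include <string.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Hypothetical shared argument structure and command. */
struct foo_info {
	__u32 version;
	__u32 features;
};
#define FOO_GET_INFO	_IOR('f', 7, struct foo_info)

/* Per-driver callbacks only ever see normal kernel pointers. */
struct foo_driver_ops {
	int (*get_info)(struct foo_info *info);
};

/* Subsystem handler: the only place that touches the user buffer. */
static long foo_subsys_ioctl(const struct foo_driver_ops *ops,
			     unsigned int cmd, void *user_arg)
{
	struct foo_info info;
	int ret;

	switch (cmd) {
	case FOO_GET_INFO:
		if (!ops->get_info)
			return -ENOTTY;
		memset(&info, 0, sizeof(info));
		ret = ops->get_info(&info);
		if (ret)
			return ret;
		/* Stand-in for copy_to_user() in this user space model. */
		memcpy(user_arg, &info, sizeof(info));
		return 0;
	default:
		return -ENOTTY;
	}
}

/* A hypothetical driver just fills in the structure. */
static int bar_get_info(struct foo_info *info)
{
	info->version = 1;
	info->features = 0x3;
	return 0;
}

static const struct foo_driver_ops bar_ops = {
	.get_info = bar_get_info,
};
```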
				230
				231	Alternatives to ioctl
				232	=====================
				233
				234	There are many cases in which ioctl is not the best solution for a
				235	problem. Alternatives include:
				236
				237	* System calls are a better choice for a system-wide feature that
				238	is not tied to a physical device or constrained by the file system
				239	permissions of a character device node.
				240
				241	* netlink is the preferred way of configuring any network related
				242	objects through sockets.
				243
				244	* debugfs is used for ad-hoc interfaces for debugging functionality
				245	that does not need to be exposed as a stable interface to applications.
				246
				247	* sysfs is a good way to expose the state of an in-kernel object
				248	that is not tied to a file descriptor.
				249
				250	* configfs can be used for more complex configuration than sysfs.
				251
				252	* A custom file system can provide extra flexibility with a simple
				253	user interface but adds a lot of complexity to the implementation.