Blame - Documentation/admin-guide/reporting-issues.rst - SHIFTPHONES/mainline/linux

blob: d7ac13f789cce981eac2bb6b7d421be0fabdae94

Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	1	.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
				2	..
				3	If you want to distribute this text under CC-BY-4.0 only, please use 'The
				4	Linux kernel developers' for author attribution and link this as source:
				5	https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-issues.rst
				6	..
				7	Note: Only the content of this RST file as found in the Linux kernel sources
				8	is available under CC-BY-4.0, as versions of this text that were processed
				9	(for example by the kernel's build system) might contain content taken from
				10	files which use a more restrictive license.
				11
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	12
				13	Reporting issues
				14	++++++++++++++++
				15
				16
				17	The short guide (aka TL;DR)
				18	===========================
				19
Thorsten Leemhuis	4d2f46a	2021-03-30 16:13:06 +0200	[diff] [blame]	20	Are you facing a regression with vanilla kernels from the same stable or
				21	longterm series? One still supported? Then search the `LKML
				22	<https://lore.kernel.org/lkml/>`_ and the `Linux stable mailing list
				23	<https://lore.kernel.org/stable/>`_ archives for matching reports to join. If
				24	you don't find any, install `the latest release from that series
				25	<https://kernel.org/>`_. If it still shows the issue, report it to the stable
Thorsten Leemhuis	6161a4b	2021-04-09 13:47:24 +0200	[diff] [blame]	26	mailing list (stable@vger.kernel.org) and CC the regressions list
Thorsten Leemhuis	0043f0b	2021-04-15 12:29:14 +0200	[diff] [blame]	27	(regressions@lists.linux.dev); ideally also CC the maintainer and the mailing
				28	list for the subsystem in question.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	29
Thorsten Leemhuis	4d2f46a	2021-03-30 16:13:06 +0200	[diff] [blame]	30	In all other cases try your best guess which kernel part might be causing the
				31	issue. Check the :ref:`MAINTAINERS <maintainers>` file for how its developers
				32	expect to be told about problems, which most of the time will be by email with a
				33	mailing list in CC. Check the destination's archives for matching reports;
				34	search the `LKML <https://lore.kernel.org/lkml/>`_ and the web, too. If you
				35	don't find any to join, install `the latest mainline kernel
				36	<https://kernel.org/>`_. If the issue is present there, send a report.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	37
Thorsten Leemhuis	4d2f46a	2021-03-30 16:13:06 +0200	[diff] [blame]	38	The issue was fixed there, but you would like to see it resolved in a still
				39	supported stable or longterm series as well? Then install its latest release.
				40	If it shows the problem, search for the change that fixed it in mainline and
				41	check if backporting is in the works or was discarded; if it's neither, ask
				42	those who handled the change for it.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	43
Thorsten Leemhuis	4d2f46a	2021-03-30 16:13:06 +0200	[diff] [blame]	44	General remarks: When installing and testing a kernel as outlined above,
				45	ensure it's vanilla (IOW: not patched and not using add-on modules). Also make
				46	sure it's built and running in a healthy environment and not already tainted
				47	before the issue occurs.
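
To illustrate the taint check mentioned in the general remarks, here is a minimal Python sketch. It assumes a Linux system that exposes the taint state as a decimal bitmask in ``/proc/sys/kernel/tainted``; the helper names ``decode_taint`` and ``read_taint`` are made up for this example. Only a handful of well-known bit positions are named; all others are reported by number, so consult the kernel's tainted-kernels documentation for the full table.

```python
from pathlib import Path
from typing import List

# A few well-known taint bits; every other bit is reported by number only.
KNOWN_TAINT_BITS = {
    0: "proprietary module was loaded (P)",
    7: "kernel died recently, i.e. there was an Oops or BUG (D)",
    9: "kernel issued a warning (W)",
    12: "externally-built (out-of-tree) module was loaded (O)",
    13: "unsigned module was loaded (E)",
}


def decode_taint(value: int) -> List[str]:
    """Return one description per bit set in the taint bitmask."""
    reasons = []
    bit = 0
    while value >> bit:
        if value & (1 << bit):
            reasons.append(KNOWN_TAINT_BITS.get(bit, f"taint bit {bit} set"))
        bit += 1
    return reasons


def read_taint(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the current taint bitmask; 0 means the kernel is not tainted."""
    return int(Path(path).read_text().strip())


# Typical use on a running system:
#     for reason in decode_taint(read_taint()):
#         print(reason)
```

An empty result means the kernel was not tainted at the time of the check. Run it before reproducing the issue as well as afterwards.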
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	48
Thorsten Leemhuis	6161a4b	2021-04-09 13:47:24 +0200	[diff] [blame]	49	If you are facing multiple issues with the Linux kernel at once, report each
				50	separately. While writing your report, include all information relevant to the
				51	issue, like the kernel and the distro used. In case of a regression, CC the
Thorsten Leemhuis	0043f0b	2021-04-15 12:29:14 +0200	[diff] [blame]	52	regressions mailing list (regressions@lists.linux.dev) to your report. Also try
				53	to pin-point the culprit with a bisection; if you succeed, include its
				54	commit-id and CC everyone in the sign-off-by chain.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	55
Thorsten Leemhuis	4d2f46a	2021-03-30 16:13:06 +0200	[diff] [blame]	56	Once the report is out, answer any questions that come up and help where you
				57	can. That includes keeping the ball rolling by occasionally retesting with newer
				58	releases and sending a status update afterwards.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	59
				60	Step-by-step guide how to report issues to the kernel maintainers
				61	=================================================================
				62
				63	The above TL;DR outlines roughly how to report issues to the Linux kernel
				64	developers. It might be all that's needed for people already familiar with
				65	reporting issues to Free/Libre & Open Source Software (FLOSS) projects. For
				66	everyone else there is this section. It is more detailed and uses a
				67	step-by-step approach. It still tries to be brief for readability and leaves
				68	out a lot of details; those are described below the step-by-step guide in a
				69	reference section, which explains each of the steps in more detail.
				70
				71	Note: this section covers a few more aspects than the TL;DR and does things in
				72	a slightly different order. That's in your interest, to make sure you notice
				73	early if an issue that looks like a Linux kernel problem is actually caused by
				74	something else. These steps thus help to ensure the time you invest in this
				75	process won't feel wasted in the end:
				76
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	77	* Are you facing an issue with a Linux kernel a hardware or software vendor
				78	provided? Then in almost all cases you are better off if you stop reading
				79	this document and report the issue to your vendor instead, unless you are
				80	willing to install the latest Linux version yourself. Be aware the latter
				81	will often be needed anyway to hunt down and fix issues.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	82
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	83	* Perform a rough search for existing reports with your favorite internet
Thorsten Leemhuis	58c5394	2021-03-30 16:13:07 +0200	[diff] [blame]	84	search engine; additionally, check the archives of the `Linux Kernel Mailing
				85	List (LKML) <https://lore.kernel.org/lkml/>`_. If you find matching reports,
				86	join the discussion instead of sending a new one.
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	87
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	88	* See if the issue you are dealing with qualifies as regression, security
				89	issue, or a really severe problem: those are 'issues of high priority' that
				90	need special handling in some steps that are about to follow.
				91
Thorsten Leemhuis	4f08d7a	2021-03-19 20:27:47 +0100	[diff] [blame]	92	* Make sure it's not the kernel's surroundings that are causing the issue
				93	you face.
				94
				95	* Create a fresh backup and put system repair and restore tools at hand.
				96
				97	* Ensure your system does not enhance its kernels by building additional
				98	kernel modules on-the-fly, which solutions like DKMS might be doing locally
				99	without your knowledge.
				100
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	101	* Check if your kernel was 'tainted' when the issue occurred, as the event
				102	that made the kernel set this flag might be causing the issue you face.
				103
Thorsten Leemhuis	4f08d7a	2021-03-19 20:27:47 +0100	[diff] [blame]	104	* Write down coarsely how to reproduce the issue. If you deal with multiple
				105	issues at once, create separate notes for each of them and make sure they
				106	work independently on a freshly booted system. That's needed, as each issue
				107	needs to get reported to the kernel developers separately, unless they are
				108	strongly entangled.
				109
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	110	* If you are facing a regression within a stable or longterm version line
				111	(say something broke when updating from 5.10.4 to 5.10.5), scroll down to
				112	'Reporting regressions within a stable and longterm kernel line'.
				113
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	114	* Locate the driver or kernel subsystem that seems to be causing the issue.
				115	Find out how and where its developers expect reports. Note: most of the
				116	time this won't be bugzilla.kernel.org, as issues typically need to be sent
				117	by mail to a maintainer and a public mailing list.
				118
				119	* Search the archives of the bug tracker or mailing list in question
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	120	thoroughly for reports that might match your issue. If you find anything,
				121	join the discussion instead of sending a new report.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	122
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	123	After these preparations you'll now enter the main part:
				124
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	125	* Unless you are already running the latest 'mainline' Linux kernel, better
				126	go and install it for the reporting process. Testing and reporting with
				127	the latest 'stable' Linux can be an acceptable alternative in some
				128	situations; during the merge window that actually might be even the best
				129	approach, but in that development phase it can be an even better idea to
				130	suspend your efforts for a few days anyway. Whatever version you choose,
				131	ideally use a 'vanilla' build. Ignoring this advice will dramatically
				132	increase the risk your report will be rejected or ignored.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	133
				134	* Ensure the kernel you just installed does not 'taint' itself when
				135	running.
				136
				137	* Reproduce the issue with the kernel you just installed. If it doesn't show
Thorsten Leemhuis	613f969	2021-03-19 20:27:45 +0100	[diff] [blame]	138	up there, scroll down to the instructions for issues only happening with
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	139	stable and longterm kernels.
				140
				141	* Optimize your notes: try to find and write the most straightforward way to
				142	reproduce your issue. Make sure the end result has all the important
				143	details, and at the same time is easy to read and understand for others
				144	that hear about it for the first time. And if you learned something in this
				145	process, consider searching again for existing reports about the issue.
				146
Thorsten Leemhuis	315c4e4	2021-02-15 18:28:57 +0100	[diff] [blame]	147	* If your failure involves a 'panic', 'Oops', 'warning', or 'BUG', consider
				148	decoding the kernel log to find the line of code that triggered the error.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	149
				150	* If your problem is a regression, try to narrow down when the issue was
				151	introduced as much as possible.
				152
				153	* Start to compile the report by writing a detailed description about the
				154	issue. Always mention a few things: the latest kernel version you installed
				155	for reproducing, the Linux Distribution used, and your notes on how to
				156	reproduce the issue. Ideally, make the kernel's build configuration
				157	(.config) and the output from ``dmesg`` available somewhere on the net and
				158	link to it. Include or upload all other information that might be relevant,
				159	like the output/screenshot of an Oops or the output from ``lspci``. Once
				160	you wrote this main part, insert a normal length paragraph on top of it
				161	outlining the issue and the impact quickly. On top of this add one sentence
				162	that briefly describes the problem and gets people to read on. Now give the
				163	thing a descriptive title or subject that yet again is shorter. Then you're
				164	ready to send or file the report like the MAINTAINERS file told you, unless
				165	you are dealing with one of those 'issues of high priority': they need
				166	special care which is explained in 'Special handling for high priority
				167	issues' below.
				168
				169	* Wait for reactions and keep the thing rolling until you can accept the
				170	outcome in one way or the other. Thus react publicly and in a timely manner
				171	to any inquiries. Test proposed fixes. Do proactive testing: retest with at
				172	least every first release candidate (RC) of a new mainline version and
				173	report your results. Send friendly reminders if things stall. And try to
				174	help yourself, if you don't get any help or if it's unsatisfying.
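
The report layout described in the 'Start to compile the report' step (short subject, one teaser sentence, a summary paragraph, then the detailed description) can be sketched with Python's standard ``email`` package. This is only an illustration: the function name ``build_report`` and the ``example.org`` addresses are placeholders, and the real recipients must come from the MAINTAINERS file.

```python
from email.message import EmailMessage


def build_report(subject, teaser, summary, details, to_addr, cc_addrs):
    """Assemble a plain-text report: teaser first, then summary, then details."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Cc"] = ", ".join(cc_addrs)
    msg["Subject"] = subject
    # Shortest part on top, so readers can decide quickly whether to read on.
    msg.set_content("\n\n".join([teaser, summary, details]) + "\n")
    return msg


# Placeholder values; replace all of them with the facts of the actual issue.
report = build_report(
    subject="Short, descriptive title of the problem",
    teaser="One sentence that briefly describes the problem.",
    summary="A normal-length paragraph outlining the issue and its impact.",
    details=(
        "Kernel version used for reproducing: <version>\n"
        "Distribution: <distro>\n"
        "Steps to reproduce: <notes>\n"
        "Links to .config and dmesg output: <links>"
    ),
    to_addr="maintainer@example.org",           # hypothetical address
    cc_addrs=["subsystem-list@example.org"],    # hypothetical address
)
```

Calling ``print(report)`` shows the message for review before it is sent with a mail client of your choice. For a regression, the regressions list (regressions@lists.linux.dev) would be added to ``cc_addrs``.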
				175
				176
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	177	Reporting regressions within a stable and longterm kernel line
				178	--------------------------------------------------------------
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	179
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	180	This subsection is for you, if you followed above process and got sent here at
				181	the point about regression within a stable or longterm kernel version line. You
				182	face one of those if something breaks when updating from 5.10.4 to 5.10.5 (a
				183	switch from 5.9.15 to 5.10.5 does not qualify). The developers want to fix such
				184	regressions as quickly as possible, hence there is a streamlined process to
				185	report them:
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	186
				187	* Check if the kernel developers still maintain the Linux kernel version
Thorsten Leemhuis	58c5394	2021-03-30 16:13:07 +0200	[diff] [blame]	188	line you care about: go to the `front page of kernel.org
				189	<https://kernel.org/>`_ and make sure it mentions
				190	the latest release of the particular version line without an '[EOL]' tag.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	191
Thorsten Leemhuis	58c5394	2021-03-30 16:13:07 +0200	[diff] [blame]	192	* Check the archives of the `Linux stable mailing list
				193	<https://lore.kernel.org/stable/>`_ for existing reports.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	194
				195	* Install the latest release from the particular version line as a vanilla
				196	kernel. Ensure this kernel is not tainted and still shows the problem, as
Thorsten Leemhuis	58c5394	2021-03-30 16:13:07 +0200	[diff] [blame]	197	the issue might have already been fixed there. If you first noticed the
				198	problem with a vendor kernel, check that a vanilla build of the last version
Thorsten Leemhuis	6161a4b	2021-04-09 13:47:24 +0200	[diff] [blame]	199	known to work performs fine as well.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	200
Thorsten Leemhuis	58c5394	2021-03-30 16:13:07 +0200	[diff] [blame]	201	* Send a short problem report to the Linux stable mailing list
Thorsten Leemhuis	6161a4b	2021-04-09 13:47:24 +0200	[diff] [blame]	202	(stable@vger.kernel.org) and CC the Linux regressions mailing list
Thorsten Leemhuis	0043f0b	2021-04-15 12:29:14 +0200	[diff] [blame]	203	(regressions@lists.linux.dev); if you suspect the cause in a particular
				204	subsystem, CC its maintainer and its mailing list. Roughly describe the
				205	issue and ideally explain how to reproduce it. Mention the first version
				206	that shows the problem and the last version that's working fine. Then
				207	wait for further instructions.
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	208
				209	The reference section below explains each of these steps in more detail.
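
The rule stated at the top of this subsection (5.10.4 to 5.10.5 qualifies, while 5.9.15 to 5.10.5 does not) can be expressed as a small Python sketch. It assumes the current numbering scheme, where the first two components identify the series, and it does not handle suffixes such as ``-rc1``; the function names are made up for this example.

```python
def parse_version(version):
    """Turn a version string like '5.10.5' into a tuple like (5, 10, 5)."""
    return tuple(int(part) for part in version.split("."))


def is_stable_series_regression(last_good, first_bad):
    """True if both versions belong to the same series and first_bad is newer."""
    good = parse_version(last_good)
    bad = parse_version(first_bad)
    return good[:2] == bad[:2] and good < bad


print(is_stable_series_regression("5.10.4", "5.10.5"))  # True
print(is_stable_series_regression("5.9.15", "5.10.5"))  # False
```

If the check yields ``False`` because the two versions belong to different series, the regular step-by-step process above applies instead of this streamlined one.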
				210
				211
				212	Reporting issues only occurring in older kernel version lines
				213	-------------------------------------------------------------
				214
				215	This subsection is for you, if you tried the latest mainline kernel as outlined
				216	above, but failed to reproduce your issue there; at the same time you want to
Thorsten Leemhuis	58c5394	2021-03-30 16:13:07 +0200	[diff] [blame]	217	see the issue fixed in a still supported stable or longterm series or vendor
				218	kernels regularly rebased on those. If that's the case, follow these steps:
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	219
				220	* Prepare yourself for the possibility that going through the next few steps
				221	might not get the issue solved in older releases: the fix might be too big
				222	or risky to get backported there.
				223
				224	* Perform the first three steps in the section "Reporting regressions
				225	within a stable and longterm kernel line" above.
				226
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	227	* Search the Linux kernel version control system for the change that fixed
				228	the issue in mainline, as its commit message might tell you if the fix is
				229	scheduled for backporting already. If you don't find anything that way,
				230	search the appropriate mailing lists for posts that discuss such an issue
				231	or peer-review possible fixes; then check the discussions if the fix was
				232	deemed unsuitable for backporting. If backporting was not considered at
				233	all, join the newest discussion, asking if it's in the cards.
				234
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	235	* One of the former steps should lead to a solution. If that doesn't work
				236	out, ask the maintainers for the subsystem that seems to be causing the
				237	issue for advice; CC the mailing list for the particular subsystem as well
				238	as the stable mailing list.
				239
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	240	The reference section below explains each of these steps in more detail.
				241
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	242
				243	Reference section: Reporting issues to the kernel maintainers
				244	=============================================================
				245
				246	The detailed guides above outline all the major steps in brief fashion, which
				247	should be enough for most people. But sometimes there are situations where even
				248	experienced users might wonder how to actually do one of those steps. That's
				249	what this section is for, as it will provide a lot more details on each of the
				250	above steps. Consider this as reference documentation: it's possible to read it
				251	from top to bottom. But it's mainly meant to be skimmed and to serve as a place
				252	to look up details on how to actually perform those steps.
				253
				254	A few words of general advice before digging into the details:
				255
				256	* The Linux kernel developers are well aware this process is complicated and
				257	demands more than other FLOSS projects. We'd love to make it simpler. But
				258	that would require work in various places as well as some infrastructure,
				259	which would need constant maintenance; nobody has stepped up to do that
				260	work, so that's just how things are for now.
				261
				262	* A warranty or support contract with some vendor doesn't entitle you to
				263	request fixes from developers in the upstream Linux kernel community: such
				264	contracts are completely outside the scope of the Linux kernel, its
				265	development community, and this document. That's why you can't demand
				266	anything such a contract guarantees in this context, not even if the
				267	developer handling the issue works for the vendor in question. If you want
				268	to claim your rights, use the vendor's support channel instead. When doing
				269	so, you might want to mention you'd like to see the issue fixed in the
				270	upstream Linux kernel; motivate them by saying it's the only way to ensure
				271	the fix in the end will get incorporated in all Linux distributions.
				272
				273	* If you never reported an issue to a FLOSS project before you should consider
				274	reading `How to Report Bugs Effectively
				275	<https://www.chiark.greenend.org.uk/~sgtatham/bugs.html>`_, `How To Ask
				276	Questions The Smart Way
				277	<http://www.catb.org/esr/faqs/smart-questions.html>`_, and `How to ask good
				278	questions <https://jvns.ca/blog/good-questions/>`_.
				279
				280	With that off the table, find below the details on how to properly report
				281	issues to the Linux kernel developers.
				282
				283
				284	Make sure you're using the upstream Linux kernel
				285	------------------------------------------------
				286
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	287	*Are you facing an issue with a Linux kernel a hardware or software vendor
				288	provided? Then in almost all cases you are better off if you stop reading
				289	this document and report the issue to your vendor instead, unless you are
				290	willing to install the latest Linux version yourself. Be aware the latter
				291	will often be needed anyway to hunt down and fix issues.*
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	292
				293	Like most programmers, Linux kernel developers don't like to spend time dealing
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	294	with reports for issues that don't even happen with their current code. It's
				295	just a waste of everybody's time, especially yours. Unfortunately such situations
				296	easily happen when it comes to the kernel and often lead to frustration on both
				297	sides. That's because almost all Linux-based kernels pre-installed on devices
				298	(Computers, Laptops, Smartphones, Routers, …) and most shipped by Linux
				299	distributors are quite distant from the official Linux kernel as distributed by
				300	kernel.org: these kernels from these vendors are often ancient from the point of
				301	Linux development or heavily modified, often both.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	302
Thorsten Leemhuis	58c5394	2021-03-30 16:13:07 +0200	[diff] [blame]	303	Most of these vendor kernels are quite unsuitable for reporting issues to the
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	304	Linux kernel developers: an issue you face with one of them might have been
				305	fixed by the Linux kernel developers months or years ago already; additionally,
				306	the modifications and enhancements by the vendor might be causing the issue you
				307	face, even if they look small or totally unrelated. That's why you should report
				308	issues with these kernels to the vendor. Its developers should look into the
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	309	report and, in case it turns out to be an upstream issue, fix it directly
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	310	upstream or forward the report there. In practice that often does not work out
				311	or might not be what you want. You thus might want to consider circumventing the
				312	vendor by installing the very latest Linux kernel core yourself. If that's an
				313	option for you move ahead in this process, as a later step in this guide will
				314	explain how to do that once it rules out other potential causes for your issue.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	315
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	316	Note, the previous paragraph is starting with the word 'most', as sometimes
				317	developers in fact are willing to handle reports about issues occurring with
				318	vendor kernels. Whether they do in the end highly depends on the developers and the
				319	issue in question. Your chances are quite good if the distributor applied only
				320	small modifications to a kernel based on a recent Linux version; that for
				321	example often holds true for the mainline kernels shipped by Debian GNU/Linux
				322	Sid or Fedora Rawhide. Some developers will also accept reports about issues
				323	with kernels from distributions shipping the latest stable kernel, as long as
				324	it's only slightly modified; that for example is often the case for Arch Linux,
				325	regular Fedora releases, and openSUSE Tumbleweed. But keep in mind, you'd better
				326	use a mainline Linux and avoid using a stable kernel for this
				327	process, as outlined in the section 'Install a fresh kernel for testing' in more
				328	detail.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	329
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	330	Obviously you are free to ignore all this advice and report problems with an old
				331	or heavily modified vendor kernel to the upstream Linux developers. But note,
				332	those often get rejected or ignored, so consider yourself warned. But it's still
				333	better than not reporting the issue at all: sometimes such reports directly or
				334	indirectly will help to get the issue fixed over time.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	335
Thorsten Leemhuis	9bc4430	2021-03-19 20:27:48 +0100	[diff] [blame]	336
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	337	Search for existing reports, first run
				338	--------------------------------------
Thorsten Leemhuis	9bc4430	2021-03-19 20:27:48 +0100	[diff] [blame]	339
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	340	*Perform a rough search for existing reports with your favorite internet
				341	search engine; additionally, check the archives of the Linux Kernel Mailing
				342	List (LKML). If you find matching reports, join the discussion instead of
				343	sending a new one.*
Thorsten Leemhuis	9bc4430	2021-03-19 20:27:48 +0100	[diff] [blame]	344
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	345	Reporting an issue that someone else already brought forward is often a waste of
				346	time for everyone involved, especially you as the reporter. So it's in your own
				347	interest to thoroughly check if somebody reported the issue already. At this
				348	step of the process it's okay to just perform a rough search: a later step will
				349	tell you to perform a more detailed search once you know where your issue needs
				350	to be reported to. Nevertheless, do not hurry with this step of the reporting
				351	process, as it can save you time and trouble.
Thorsten Leemhuis	9bc4430	2021-03-19 20:27:48 +0100	[diff] [blame]	352
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	353	Simply search the internet with your favorite search engine first. Afterwards,
				354	search the `Linux Kernel Mailing List (LKML) archives
				355	<https://lore.kernel.org/lkml/>`_.
Thorsten Leemhuis	9bc4430	2021-03-19 20:27:48 +0100	[diff] [blame]	356
				357	If you get flooded with results consider telling your search engine to limit the
				358	search timeframe to the past month or year. And wherever you search, make sure
				359	to use good search terms; vary them a few times, too. While doing so try to
				360	look at the issue from the perspective of someone else: that will help you to
				361	come up with other words to use as search terms. Also make sure not to use too
				362	many search terms at once. Remember to search with and without information like
				363	the name of the kernel driver or the name of the affected hardware component.
				364	But its exact brand name (say 'ASUS Red Devil Radeon RX 5700 XT Gaming OC')
				365	often is not very helpful, as it is too specific. Instead try search terms like
				366	the model line (Radeon 5700 or Radeon 5000) and the code name of the main chip
				367	('Navi' or 'Navi10') with and without its manufacturer ('AMD').
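
As a rough illustration of varying search terms with and without the manufacturer, the following Python sketch builds a list of queries and turns each into a search link for the LKML archives. The function names are made up for this example, and the ``q`` query parameter is an assumption about how the archive's search form works; entering the terms by hand works just as well.

```python
from urllib.parse import urlencode


def search_variants(model_lines, code_names, manufacturer):
    """Return every term once without and once with the manufacturer."""
    variants = []
    for term in list(model_lines) + list(code_names):
        variants.append(term)
        variants.append(f"{manufacturer} {term}")
    return variants


def lore_search_url(query, base="https://lore.kernel.org/lkml/"):
    """Build a search link; the 'q' parameter is an assumption, see above."""
    return base + "?" + urlencode({"q": query})


for query in search_variants(["Radeon 5700", "Radeon 5000"],
                             ["Navi", "Navi10"], "AMD"):
    print(lore_search_url(query))
```

Combine such terms with a few words describing the symptom, and drop terms again if a query returns nothing.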
				368
				369	In case you find an existing report about your issue, join the discussion, as
				370	you might be able to provide valuable additional information. That can be
				371	important even when a fix is prepared or in its final stages already, as
				372	developers might look for people that can provide additional information or
				373	test a proposed fix. Jump to the section 'Duties after the report went out' for
				374	details on how to get properly involved.
				375
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	376	Note, searching `bugzilla.kernel.org <https://bugzilla.kernel.org/>`_ might also
				377	be a good idea, as that might provide valuable insights or turn up matching
				378	reports. If you find the latter, just keep in mind: most subsystems expect
				379	reports in different places, as described below in the section "Check where you
				380	need to report your issue". The developers that should take care of the issue
				381	thus might not even be aware of the bugzilla ticket. Hence, check the ticket if
				382	the issue already got reported as outlined in this document and if not consider
				383	doing so.
				384
Thorsten Leemhuis	9bc4430	2021-03-19 20:27:48 +0100	[diff] [blame]	385
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	386	Issue of high priority?
				387	-----------------------
				388
				389	*See if the issue you are dealing with qualifies as regression, security
				390	issue, or a really severe problem: those are 'issues of high priority' that
				391	need special handling in some steps that are about to follow.*
				392
				393	Linus Torvalds and the leading Linux kernel developers want to see some issues
				394	fixed as soon as possible, hence there are 'issues of high priority' that get
				395	handled slightly differently in the reporting process. Three types of cases
				396	qualify: regressions, security issues, and really severe problems.
				397
				398	You deal with a 'regression' if something that worked with an older version of
				399	the Linux kernel does not work with a newer one or somehow works worse with it.
				400	It thus is a regression when a WiFi driver that did a fine job with Linux 5.7
				401	somehow misbehaves with 5.8 or doesn't work at all. It's also a regression if
				402	an application shows erratic behavior with a newer kernel, which might happen
				403	due to incompatible changes in the interface between the kernel and the
				404	userland (like procfs and sysfs). Significantly reduced performance or
				405	increased power consumption also qualify as regression. But keep in mind: the
				406	new kernel needs to be built with a configuration that is similar to the one
				407	from the old kernel (see below how to achieve that). That's because the kernel
				408	developers sometimes can not avoid incompatibilities when implementing new
				409	features; but to avoid regressions such features have to be enabled explicitly
				410	during build time configuration.
				411
				412	What qualifies as security issue is left to your judgment. Consider reading
				413	'Documentation/admin-guide/security-bugs.rst' before proceeding, as it
				414	provides additional details how to best handle security issues.
				415
				416	An issue is a 'really severe problem' when something totally unacceptably bad
				417	happens. That's for example the case when a Linux kernel corrupts the data it's
				418	handling or damages hardware it's running on. You're also dealing with a severe
				419	issue when the kernel suddenly stops working with an error message ('kernel
				420	panic') or without any farewell note at all. Note: do not confuse a 'panic' (a
				421	fatal error where the kernel stops itself) with an 'Oops' (a recoverable error),
				422	as the kernel remains running after the latter.
				423
				424
Thorsten Leemhuis	4f08d7a	2021-03-19 20:27:47 +0100	[diff] [blame]	425	Ensure a healthy environment
				426	----------------------------
				427
				428	*Make sure it's not the kernel's surroundings that are causing the issue
				429	you face.*
				430
				431	Problems that look a lot like a kernel issue are sometimes caused by the build
				432	or runtime environment. It's hard to rule out that problem completely, but you
				433	should minimize it:
				434
				435	* Use proven tools when building your kernel, as bugs in the compiler or the
				436	binutils can cause the resulting kernel to misbehave.
				437
				438	* Ensure your computer components run within their design specifications;
				439	that's especially important for the main processor, the main memory, and the
				440	motherboard. Therefore, stop undervolting or overclocking when facing a
				441	potential kernel issue.
				442
				443	* Try to make sure it's not faulty hardware that is causing your issue. Bad
				444	main memory for example can result in a multitude of issues that will
				445	manifest themselves in problems looking like kernel issues.
				446
				447	* If you're dealing with a filesystem issue, you might want to check the file
				448	system in question with ``fsck``, as it might be damaged in a way that leads
				449	to unexpected kernel behavior.
				450
				451	* When dealing with a regression, make sure it's not something else that
				452	changed in parallel to updating the kernel. The problem for example might be
				453	caused by other software that was updated at the same time. It can also
				454	happen that a hardware component coincidentally just broke when you rebooted
				455	into a new kernel for the first time. Updating the system's BIOS or changing
				456	something in the BIOS Setup can also lead to problems that look a lot
				457	like a kernel regression.
				458
				459
				460	Prepare for emergencies
				461	-----------------------
				462
				463	Create a fresh backup and put system repair and restore tools at hand.
				464
				465	Reminder: you are dealing with computers, which sometimes do unexpected things,
				466	especially if you fiddle with crucial parts like the kernel of their operating
				467	system. That's what you are about to do in this process. Thus, make sure to
				468	create a fresh backup; also ensure you have all tools at hand to repair or
				469	reinstall the operating system as well as everything you need to restore the
				470	backup.
				471
				472
				473	Make sure your kernel doesn't get enhanced
				474	------------------------------------------
				475
				476	*Ensure your system does not enhance its kernels by building additional
				477	kernel modules on-the-fly, which solutions like DKMS might be doing locally
				478	without your knowledge.*
				479
				480	The risk your issue report gets ignored or rejected dramatically increases if
				481	your kernel gets enhanced in any way. That's why you should remove or disable
				482	mechanisms like akmods and DKMS: those build add-on kernel modules
				483	automatically, for example when you install a new Linux kernel or boot it for
				484	the first time. Also remove any modules they might have installed. Then reboot
				485	before proceeding.
				486
				487	Note, you might not be aware that your system is using one of these solutions:
				488	they often get set up silently when you install Nvidia's proprietary graphics
				489	driver, VirtualBox, or other software that requires some support from a
				490	module not part of the Linux kernel. That's why you might need to uninstall the
				491	packages with such software to get rid of any 3rd party kernel module.
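
If you are unsure whether such modules are currently loaded, the kernel can
help: it exposes the taint flags contributed by each loaded module in
``/sys/module/<name>/taint`` (the taint flag itself is explained in the next
step). The following is a minimal sketch and not an official tool; the function
name ``list_tainting_modules`` and its optional directory argument are made up
for this illustration:

```shell
# List loaded modules that carry taint flags of their own.
# $1: directory to scan; defaults to /sys/module. The parameter is a
#     hypothetical convenience so the function can be tried on a test tree.
list_tainting_modules() {
    dir=${1:-/sys/module}
    for f in "$dir"/*/taint; do
        # Skip entries without a readable taint file (or an unmatched glob).
        [ -r "$f" ] || continue
        flags=$(cat "$f")
        if [ -n "$flags" ]; then
            mod=${f%/taint}
            echo "${mod##*/}: $flags"
        fi
    done
}

# Usage on a running system:
#   list_tainting_modules
```

An empty result suggests no loaded module taints the kernel. In the output an
``O`` marks an out-of-tree module and a ``P`` a proprietary one.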
				492
				493
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	494	Check 'taint' flag
				495	------------------
				496
				497	*Check if your kernel was 'tainted' when the issue occurred, as the event
				498	that made the kernel set this flag might be causing the issue you face.*
				499
				500	The kernel marks itself with a 'taint' flag when something happens that might
				501	lead to follow-up errors that look totally unrelated. The issue you face might
				502	be such an error if your kernel is tainted. That's why it's in your interest to
				503	rule this out early before investing more time into this process. This is the
				504	only reason why this step is here, as this process later will tell you to
				505	install the latest mainline kernel; you will need to check the taint flag again
				506	then, as that's when it matters because it's the kernel the report will focus
				507	on.
				508
				509	On a running system it is easy to check if the kernel tainted itself: if ``cat
				510	/proc/sys/kernel/tainted`` returns '0' then the kernel is not tainted and
				511	everything is fine. Checking that file is impossible in some situations; that's
				512	why the kernel also mentions the taint status when it reports an internal
				513	problem (a 'kernel bug'), a recoverable error (a 'kernel Oops') or a
				514	non-recoverable error before halting operation (a 'kernel panic'). Look near
				515	the top of the error messages printed when one of these occurs and search for a
				516	line starting with 'CPU:'. It should end with 'Not tainted' if the kernel was
				517	not tainted when it noticed the problem; it was tainted if you see 'Tainted:'
				518	followed by a few spaces and some letters.
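
A non-zero value in that file is a bit field. As a rough sketch of how to read
it, the hypothetical helper ``decode_taint`` below prints which bits are set. It
only names the four bits that correspond to the causes discussed in this
section; for everything else it points to the taint documentation, which
remains the authoritative reference:

```shell
# Print which taint bits are set in a value read from
# /proc/sys/kernel/tainted. Only a few bits are named here on purpose.
decode_taint() {
    value=$1
    if [ "$value" -eq 0 ]; then
        echo "not tainted"
        return 0
    fi
    bit=0
    while [ "$value" -gt 0 ]; do
        if [ $((value & 1)) -eq 1 ]; then
            case $bit in
                0)  echo "bit 0: proprietary module was loaded" ;;
                7)  echo "bit 7: kernel Oops occurred" ;;
                10) echo "bit 10: staging driver was loaded" ;;
                12) echo "bit 12: out-of-tree module was loaded" ;;
                *)  echo "bit $bit: see tainted-kernels.rst" ;;
            esac
        fi
        value=$((value >> 1))
        bit=$((bit + 1))
    done
}

# Usage on a running system:
#   decode_taint "$(cat /proc/sys/kernel/tainted)"
```

A value of 4097, for example, has bits 0 and 12 set, which is what you would
typically see after loading a proprietary out-of-tree module.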
				519
				520	If your kernel is tainted, study 'Documentation/admin-guide/tainted-kernels.rst'
				521	to find out why. Try to eliminate the reason. Often it's caused by one of these
				522	three things:
				523
				524	1. A recoverable error (a 'kernel Oops') occurred and the kernel tainted
				525	itself, as the kernel knows it might misbehave in strange ways after that
				526	point. In that case check your kernel or system log and look for a section
				527	that starts with this::
				528
				529	Oops: 0000 [#1] SMP
				530
				531	That's the first Oops since boot-up, as the '#1' between the brackets shows.
				532	Every Oops and any other problem that happens after that point might be a
				533	follow-up problem to that first Oops, even if both look totally unrelated.
				534	Rule this out by getting rid of the cause for the first Oops and reproducing
				535	the issue afterwards. Sometimes simply restarting will be enough, sometimes
				536	a change to the configuration followed by a reboot can eliminate the Oops.
				537	But don't invest too much time into this at this point of the process, as
				538	the cause for the Oops might already be fixed in the newer Linux kernel
				539	version you are going to install later in this process.
				540
				541	2. Your system uses software that installs its own kernel modules, for
				542	example Nvidia's proprietary graphics driver or VirtualBox. The kernel
				543	taints itself when it loads such a module from external sources (even if
				544	they are Open Source): they sometimes cause errors in unrelated kernel
				545	areas and thus might be causing the issue you face. You therefore have to
				546	prevent those modules from loading when you want to report an issue to the
				547	Linux kernel developers. Most of the time the easiest way to do that is:
				548	temporarily uninstall such software including any modules they might have
				549	installed. Afterwards reboot.
				550
				551	3. The kernel also taints itself when it's loading a module that resides in
				552	the staging tree of the Linux kernel source. That's a special area for
				553	code (mostly drivers) that does not yet fulfill the normal Linux kernel
				554	quality standards. When you report an issue with such a module it's
				555	obviously okay if the kernel is tainted; just make sure the module in
				556	question is the only reason for the taint. If the issue happens in an
				557	unrelated area reboot and temporarily block the module from being loaded
				558	by specifying ``foo.blacklist=1`` as kernel parameter (replace 'foo' with
				559	the name of the module in question).
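
For the first cause in the list above, finding the initial Oops in a long log
can be tedious by hand. The following sketch prints only the first matching
line together with its line number; ``first_oops`` is a name invented for this
example, and how you obtain the log depends on your distribution:

```shell
# Print the first line containing an Oops header from a kernel log read on
# standard input, prefixed with its line number in that log.
first_oops() {
    awk '/Oops: / { print NR ": " $0; exit }'
}

# Usage on a running system (whichever applies to your setup):
#   dmesg | first_oops
#   journalctl -k | first_oops
```

If nothing is printed, the log you fed in contains no Oops header; the taint
then has a different cause.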
				560
				561
Thorsten Leemhuis	4f08d7a	2021-03-19 20:27:47 +0100	[diff] [blame]	562	Document how to reproduce issue
				563	-------------------------------
				564
				565	*Write down coarsely how to reproduce the issue. If you deal with multiple
				566	issues at once, create separate notes for each of them and make sure they
				567	work independently on a freshly booted system. That's needed, as each issue
				568	needs to get reported to the kernel developers separately, unless they are
				569	strongly entangled.*
				570
				571	If you deal with multiple issues at once, you'll have to report each of them
				572	separately, as they might be handled by different developers. Describing
				573	various issues in one report also makes it quite difficult for others to tear
				574	it apart. Hence, only combine issues in one report if they are very strongly
				575	entangled.
				576
				577	Additionally, during the reporting process you will have to test if the issue
				578	happens with other kernel versions. Therefore, it will make your work easier if
				579	you know exactly how to reproduce an issue quickly on a freshly booted system.
				580
				581	Note: it's often fruitless to report issues that only happened once, as they
				582	might be caused by a bit flip due to cosmic radiation. That's why you should
				583	try to rule that out by reproducing the issue before going further. Feel free
				584	to ignore this advice if you are experienced enough to tell a one-time error
				585	due to faulty hardware apart from a kernel issue that rarely happens and thus
				586	is hard to reproduce.
				587
				588
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	589	Regression in stable or longterm kernel?
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	590	----------------------------------------
				591
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	592	*If you are facing a regression within a stable or longterm version line
				593	(say something broke when updating from 5.10.4 to 5.10.5), scroll down to
				594	'Dealing with regressions within a stable and longterm kernel line'.*
				595
				596	Regressions within a stable or longterm kernel version line are something the
				597	Linux developers want to fix badly, as such issues are even more unwanted than
				598	regressions in the main development branch, as they can quickly affect a lot of
				599	people. The developers thus want to learn about such issues as quickly as
				600	possible, hence there is a streamlined process to report them. Note,
				601	regressions with a newer kernel version line (say something broke when switching
				602	from 5.9.15 to 5.10.5) do not qualify.
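
Whether two versions belong to the same version line can be read off their
first two components. The sketch below illustrates that rule; it assumes
version strings of the form 'x.y.z' as used in the examples above, and
``same_series`` is a hypothetical helper name:

```shell
# Succeed if two kernel versions share the same stable/longterm series,
# i.e. their first two dot-separated components are identical.
same_series() {
    a=$(echo "$1" | cut -d. -f1-2)
    b=$(echo "$2" | cut -d. -f1-2)
    [ "$a" = "$b" ]
}

# Usage:
#   same_series 5.10.4 5.10.5 && echo "regression within one series"
```

With the examples from the text, 5.10.4 and 5.10.5 share the series 5.10, while
5.9.15 and 5.10.5 do not.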
				603
				604
				605	Check where you need to report your issue
				606	-----------------------------------------
				607
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	608	*Locate the driver or kernel subsystem that seems to be causing the issue.
				609	Find out how and where its developers expect reports. Note: most of the
				610	time this won't be bugzilla.kernel.org, as issues typically need to be sent
				611	by mail to a maintainer and a public mailing list.*
				612
				613	It's crucial to send your report to the right people, as the Linux kernel is a
				614	big project and most of its developers are only familiar with a small subset of
				615	it. Quite a few programmers only care for just one driver, for example one
				616	for a WiFi chip; its developer likely will have little or no knowledge
				617	about the internals of remote or unrelated "subsystems", like the TCP
				618	stack, the PCIe/PCI subsystem, memory management or file systems.
				619
				620	Problem is: the Linux kernel lacks a central bug tracker where you can simply
				621	file your issue and make it reach the developers that need to know about it.
				622	That's why you have to find the right place and way to report issues yourself.
				623	You can do that with the help of a script (see below), but it mainly targets
				624	kernel developers and experts. For everybody else the MAINTAINERS file is the
				625	better place.
				626
				627	How to read the MAINTAINERS file
				628	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				629	To illustrate how to use the :ref:`MAINTAINERS <maintainers>` file, let's assume
				630	the WiFi in your laptop suddenly misbehaves after updating the kernel. In that
				631	case it's likely an issue in the WiFi driver. Obviously it could also be some
				632	code it builds upon, but unless you suspect something like that stick to the
				633	driver. If it's really something else, the driver's developers will get the
				634	right people involved.
				635
				636	Sadly, there is no way to check which code is driving a particular hardware
				637	component that is both universal and easy.
				638
				639	In case of a problem with the WiFi driver you for example might want to look at
				640	the output of ``lspci -k``, as it lists devices on the PCI/PCIe bus and the
				641	kernel modules driving them::
				642
				643	[user@something ~]$ lspci -k
				644	[...]
				645	3a:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
				646	Subsystem: Bigfoot Networks, Inc. Device 1535
				647	Kernel driver in use: ath10k_pci
				648	Kernel modules: ath10k_pci
				649	[...]
				650
				651	But this approach won't work if your WiFi chip is connected over USB or some
				652	other internal bus. In those cases you might want to check your WiFi manager or
				653	the output of ``ip link``. Look for the name of the problematic network
				654	interface, which might be something like 'wlp58s0'. This name can be used like
				655	this to find the module driving it::
				656
				657	[user@something ~]$ realpath --relative-to=/sys/module/ /sys/class/net/wlp58s0/device/driver/module
				658	ath10k_pci
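
If you do not know the interface name yet, the same sysfs lookup can be done
for all network interfaces at once. This is a minimal sketch under the
assumption that ``readlink -f`` is available (as on most Linux distributions);
the function name and its optional sysfs root argument are hypothetical:

```shell
# Print "<interface>: <module>" for every network interface that has a
# driver module link in sysfs.
# $1: sysfs root; defaults to /sys. Only meant for experiments on a copy.
net_driver_modules() {
    root=${1:-/sys}
    for dev in "$root"/class/net/*; do
        link="$dev/device/driver/module"
        # Virtual interfaces such as 'lo' have no such link; skip them.
        [ -e "$link" ] || continue
        target=$(readlink -f "$link")
        echo "${dev##*/}: ${target##*/}"
    done
}

# Usage on a running system:
#   net_driver_modules
```

Interfaces without a backing driver module are omitted from the output.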
				659
				660	In case tricks like these don't bring you any further, try to search the
				661	internet on how to narrow down the driver or subsystem in question. And if you
				662	are unsure which it is: just try your best guess, somebody will help you if you
				663	guessed poorly.
				664
				665	Once you know the driver or subsystem, you want to search for it in the
				666	MAINTAINERS file. In the case of 'ath10k_pci' you won't find anything, as the
				667	name is too specific. Sometimes you will need to search on the net for help;
				668	but before doing so, try a somewhat shortened or modified name when searching the
				669	MAINTAINERS file, as then you might find something like this::
				670
				671	QUALCOMM ATHEROS ATH10K WIRELESS DRIVER
				672	Mail: A. Some Human <shuman@example.com>
				673	Mailing list: ath10k@lists.infradead.org
				674	Status: Supported
				675	Web-page: https://wireless.wiki.kernel.org/en/users/Drivers/ath10k
				676	SCM: git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
				677	Files: drivers/net/wireless/ath/ath10k/
				678
				679	Note: the line descriptions will be abbreviations if you read the plain
				680	MAINTAINERS file found in the root of the Linux source tree. 'Mail:' for
				681	example will be 'M:', 'Mailing list:' will be 'L:', and 'Status:' will be 'S:'.
				682	A section near the top of the file explains these and other abbreviations.
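
When working with the plain file, a search that prints whole sections instead
of single lines can save some scrolling. The sketch below relies on the
sections being separated by blank lines and uses awk's paragraph mode; the
helper name ``maintainers_section`` is an assumption made for this example:

```shell
# Print every MAINTAINERS section that mentions a keyword, ignoring case.
# $1: keyword to look for
# $2: path to the MAINTAINERS file (defaults to ./MAINTAINERS)
maintainers_section() {
    awk -v kw="$1" '
        BEGIN { RS = ""; ORS = "\n\n"; kw = toupper(kw) }
        index(toupper($0), kw) { print }
    ' "${2:-MAINTAINERS}"
}

# Usage from the root of the Linux source tree:
#   maintainers_section ath10k
```

A short keyword like 'ath10k' matches here where the full module name
'ath10k_pci' would not.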
				683
				684	First look at the line 'Status'. Ideally it should be 'Supported' or
				685	'Maintained'. If it states 'Obsolete' then you are using some outdated approach
				686	that was replaced by a newer solution you need to switch to. Sometimes the code
				687	only has someone who provides 'Odd Fixes' when feeling motivated. And with
				688	'Orphan' you are totally out of luck, as nobody takes care of the code anymore.
				689	That only leaves these options: arrange yourself to live with the issue, fix it
				690	yourself, or find a programmer somewhere willing to fix it.
				691
				692	After checking the status, look for a line starting with 'bugs:': it will tell
				693	you where to find a subsystem specific bug tracker to file your issue. The
				694	example above does not have such a line. That is the case for most sections, as
				695	Linux kernel development is completely driven by mail. Very few subsystems use
				696	a bug tracker, and only some of those rely on bugzilla.kernel.org.
				697
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	698	In this and many other cases you thus have to look for lines starting with
				699	'Mail:' instead. Those mention the name and the email addresses for the
				700	maintainers of the particular code. Also look for a line starting with 'Mailing
				701	list:', which tells you the public mailing list where the code is developed.
				702	Your report later needs to go by mail to those addresses. Additionally, for all
				703	issue reports sent by email, make sure to add the Linux Kernel Mailing List
				704	(LKML) <linux-kernel@vger.kernel.org> to CC. Don't omit either of the mailing
				705	lists when sending your issue report by mail later! Maintainers are busy people
				706	and might leave some work for other developers on the subsystem specific list;
				707	and LKML is important to have one place where all issue reports can be found.
				708
				709
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	710	Finding the maintainers with the help of a script
				711	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				712
				713	For people that have the Linux sources at hand there is a second option to find
				714	the proper place to report: the script 'scripts/get_maintainer.pl' which tries
				715	to find all people to contact. It queries the MAINTAINERS file and needs to be
				716	called with a path to the source code in question. For drivers compiled as
				717	modules it often can be found with a command like this::
				718
				719	$ modinfo ath10k_pci | grep filename | sed 's!/lib/modules/.*/kernel/!!; s!filename:!!; s!\.ko$\|\.xz$!!'
				720	drivers/net/wireless/ath/ath10k/ath10k_pci.ko
				721
				722	Pass parts of this to the script::
				723
				724	$ ./scripts/get_maintainer.pl -f drivers/net/wireless/ath/ath10k*
				725	Some Human <shuman@example.com> (supporter:QUALCOMM ATHEROS ATH10K WIRELESS DRIVER)
				726	Another S. Human <asomehuman@example.com> (maintainer:NETWORKING DRIVERS)
				727	ath10k@lists.infradead.org (open list:QUALCOMM ATHEROS ATH10K WIRELESS DRIVER)
				728	linux-wireless@vger.kernel.org (open list:NETWORKING DRIVERS (WIRELESS))
				729	netdev@vger.kernel.org (open list:NETWORKING DRIVERS)
				730	linux-kernel@vger.kernel.org (open list)
				731
				732	Don't send your report to all of them. Send it to the maintainers, which the
				733	script calls "supporter:"; additionally CC the most specific mailing list for
				734	the code as well as the Linux Kernel Mailing List (LKML). In this case you thus
				735	would need to send the report to 'Some Human <shuman@example.com>' with
				736	'ath10k@lists.infradead.org' and 'linux-kernel@vger.kernel.org' in CC.
				737
				738	Note: in case you cloned the Linux sources with git you might want to call
				739	``get_maintainer.pl`` a second time with ``--git``. The script then will look
				740	at the commit history to find which people recently worked on the code in
				741	question, as they might be able to help. But use these results with care, as it
				742	can easily send you in a wrong direction. That for example happens quickly in
				743	areas rarely changed (like old or unmaintained drivers): sometimes such code is
				744	modified during tree-wide cleanups by developers that do not care about the
				745	particular driver at all.
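
As a variation on the ``modinfo`` pipeline shown earlier in this subsection,
the path conversion can also be done with plain shell parameter expansion. The
following is only a sketch: ``module_source_path`` is a hypothetical name, and
the list of compression suffixes is an assumption about what your distribution
might use:

```shell
# Turn a module file path (as printed by `modinfo -n <module>`) into a path
# relative to the kernel source tree, dropping a compression suffix if any.
module_source_path() {
    path=$1
    # Strip everything up to and including ".../lib/modules/<version>/kernel/".
    path=${path#*/lib/modules/*/kernel/}
    case $path in
        *.ko.xz|*.ko.gz|*.ko.zst) path=${path%.*} ;;
    esac
    echo "$path"
}

# Usage on a running system:
#   module_source_path "$(modinfo -n ath10k_pci)"
```

The result still ends in '.ko', so pass only the directory part of it to
``get_maintainer.pl``, as the example above does.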
				746
				747
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	748	Search for existing reports, second run
				749	---------------------------------------
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	750
				751	*Search the archives of the bug tracker or mailing list in question
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	752	thoroughly for reports that might match your issue. If you find anything,
				753	join the discussion instead of sending a new report.*
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	754
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	755	As mentioned earlier already: reporting an issue that someone else already
				756	brought forward is often a waste of time for everyone involved, especially you
				757	as the reporter. That's why you should search for existing reports again, now
				758	that you know where they need to be reported to. If it's a mailing list, you will
				759	often find its archives on `lore.kernel.org <https://lore.kernel.org/>`_.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	760
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	761	But some lists are hosted in different places. That for example is the case for
				762	the ath10k WiFi driver used as example in the previous step. But you'll often
				763	find the archives for these lists easily on the net. Searching for 'archive
				764	ath10k@lists.infradead.org' for example will lead you to the `Info page for the
				765	ath10k mailing list <https://lists.infradead.org/mailman/listinfo/ath10k>`_,
				766	which at the top links to its
				767	`list archives <https://lists.infradead.org/pipermail/ath10k/>`_. Sadly this and
				768	quite a few other lists miss a way to search the archives. In those cases use a
				769	regular internet search engine and add something like
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	770	'site:lists.infradead.org/pipermail/ath10k/' to your search terms, which limits
				771	the results to the archives at that URL.
				772
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	773	It's also wise to check the internet, LKML and maybe bugzilla.kernel.org again
Thorsten Leemhuis	0043f0b	2021-04-15 12:29:14 +0200	[diff] [blame]	774	at this point. If your report needs to be filed in a bug tracker, you may want
				775	to check the mailing list archives for the subsystem as well, as someone might
				776	have reported it only there.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	777
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	778	For details on how to search and what to do if you find matching reports see
				779	"Search for existing reports, first run" above.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	780
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	781	Do not hurry with this step of the reporting process: spending 30 to 60 minutes
				782	or even more time can save you and others quite a lot of time and trouble.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	783
				784
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	785	Install a fresh kernel for testing
				786	----------------------------------
				787
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	788	*Unless you are already running the latest 'mainline' Linux kernel, better
				789	go and install it for the reporting process. Testing and reporting with
				790	the latest 'stable' Linux can be an acceptable alternative in some
				791	situations; during the merge window that actually might be even the best
				792	approach, but in that development phase it can be an even better idea to
				793	suspend your efforts for a few days anyway. Whatever version you choose,
				794	ideally use a 'vanilla' build. Ignoring this advice will dramatically
				795	increase the risk your report will be rejected or ignored.*
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	796
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	797	As mentioned in the detailed explanation for the first step already: Like most
				798	programmers, Linux kernel developers don't like to spend time dealing with
				799	reports for issues that don't even happen with the current code. It's just a
				800	waste of everybody's time, especially yours. That's why it's in everybody's
				801	interest that you confirm the issue still exists with the latest upstream code
				802	before reporting it. You are free to ignore this advice, but as outlined
				803	earlier: doing so dramatically increases the risk that your issue report might
				804	get rejected or simply ignored.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	805
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	806	In the scope of the kernel "latest upstream" normally means:
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	807
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	808	* Install a mainline kernel; the latest stable kernel can be an option, but
				809	most of the time is better avoided. Longterm kernels (sometimes called 'LTS
				810	kernels') are unsuitable at this point of the process. The next subsection
				811	explains all of this in more detail.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	812
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	813	* The subsection after that describes ways to obtain and install such a kernel.
				814	It also outlines that using a pre-compiled kernel is fine, but it had better
				815	be vanilla, which means: it was built using Linux sources taken straight `from
				816	kernel.org <https://kernel.org/>`_ and not modified or enhanced in any way.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	817
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	818	Choosing the right version for testing
				819	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				820
				821	Head over to `kernel.org <https://kernel.org/>`_ to find out which version you
				822	want to use for testing. Ignore the big yellow button that says 'Latest release'
				823	and look a little lower at the table. At its top you'll see a line starting with
				824	mainline, which most of the time will point to a pre-release with a version
				825	number like '5.8-rc2'. If that's the case, you'll want to use this mainline
				826	kernel for testing, as that's where all fixes have to be applied first. Do not let
				827	that 'rc' scare you, these 'development kernels' are pretty reliable — and you
				828	made a backup, as you were instructed above, didn't you?
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	829
Thorsten Leemhuis	58c5394	2021-03-30 16:13:07 +0200	[diff] [blame]	830	In about two out of every nine to ten weeks, mainline might point you to a
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	831	proper release with a version number like '5.7'. If that happens, consider
				832	suspending the reporting process until the first pre-release of the next
				833	version (5.8-rc1) shows up on kernel.org. That's because the Linux development
				834	cycle then is in its two-week long 'merge window'. The bulk of the changes and
				835	all intrusive ones get merged for the next release during this time. It's a bit
				836	more risky to use mainline during this period. Kernel developers are also often
				837	quite busy then and might have no spare time to deal with issue reports. It's
				838	also quite possible that one of the many changes applied during the merge
				839	window fixes the issue you face; that's why you soon would have to retest with
				840	a newer kernel version anyway, as outlined below in the section 'Duties after
				841	the report went out'.
				842
				843	That's why it might make sense to wait till the merge window is over. But don't
				844	do that if you're dealing with something that shouldn't wait. In that case
				845	consider obtaining the latest mainline kernel via git (see below) or use the
				846	latest stable version offered on kernel.org. Using that is also acceptable in
				847	case mainline for some reason does currently not work for you. And in general:
				848	using it for reproducing the issue is also better than not reporting the issue
				849	at all.
				850
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	851	Better avoid using the latest stable kernel outside merge windows, as all fixes
				852	must be applied to mainline first. That's why checking the latest mainline
				853	kernel is so important: any issue you want to see fixed in older version lines
				854	needs to be fixed in mainline first before it can get backported, which can
				855	take a few days or weeks. Another reason: the fix you hope for might be too
				856	hard or risky for backporting; reporting the issue again hence is unlikely to
				857	change anything.
				858
				859	These aspects are also why longterm kernels (sometimes called "LTS kernels")
				860	are unsuitable for this part of the reporting process: they are too distant from
				861	the current code. Hence go and test mainline first and follow the process
				862	further: if the issue doesn't occur with mainline it will guide you how to get
				863	it fixed in older version lines, if that's in the cards for the fix in question.
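
The rule of thumb from this subsection can be condensed into a few lines. The
sketch below merely restates the text and makes no attempt to query kernel.org;
you pass it the version shown in the 'mainline' line of the table there, and
``mainline_advice`` is a name made up for this illustration:

```shell
# Map the version shown in kernel.org's 'mainline' line to the advice
# given in this subsection.
mainline_advice() {
    case $1 in
        *-rc*) echo "pre-release: test with this mainline kernel" ;;
        *)     echo "proper release: merge window is likely open" ;;
    esac
}

# Usage:
#   mainline_advice 5.8-rc2
#   mainline_advice 5.7
```

In the second case consider waiting for the next '-rc1', unless your issue
should not wait.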
				864
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	865	How to obtain a fresh Linux kernel
				866	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				867
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	868	Using a pre-compiled kernel: This is often the quickest, easiest, and safest
				869	way for testing, especially if you are unfamiliar with the Linux kernel. The
				870	problem: most of those shipped by distributors or add-on repositories are built
				871	from modified Linux sources. They are thus not vanilla and therefore often
				872	unsuitable for testing and issue reporting: the changes might cause the issue
				873	you face or influence it somehow.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	874
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	875	But you are in luck if you are using a popular Linux distribution: for quite a
				876	few of them you'll find repositories on the net that contain packages with the
				877	latest mainline or stable Linux built as vanilla kernel. It's totally okay to
				878	use these, just make sure from the repository's description they are vanilla or
				879	at least close to it. Additionally ensure the packages contain the latest
				880	versions as offered on kernel.org. The packages are likely unsuitable if they
				881	are older than a week, as new mainline and stable kernels typically get released
				882	at least once a week.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	883
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	884	Please note that you might need to build your own kernel manually later: that's
				885	sometimes needed for debugging or testing fixes, as described later in this
				886	document. Also be aware that pre-compiled kernels might lack debug symbols that
				887	are needed to decode messages the kernel prints when a panic, Oops, warning, or
				888	BUG occurs; if you plan to decode those, you might be better off compiling a
				889	kernel yourself (see the end of this subsection and the section titled 'Decode
				890	failure messages' for details).
				891
				892	Using git: Developers and experienced Linux users familiar with git are
				893	often best served by obtaining the latest Linux kernel sources straight from the
				894	`official development repository on kernel.org
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	895	<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/>`_.
				896	Those are likely a bit ahead of the latest mainline pre-release. Don't worry
				897	about it: they are as reliable as a proper pre-release, unless the kernel's
				898	development cycle is currently in the middle of a merge window. But even then
				899	they are quite reliable.
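
For those going the git route, fetching and later refreshing the sources could
look roughly like the following sketch. The function name ``fetch_mainline``
and the ``GIT`` override are made up for this illustration; the URL is assumed
to be the clone address of the repository linked above (its web address without
the trailing ``/tree/``).

```shell
# Clone address of the official development repository (assumed, see above).
MAINLINE_URL=https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

# Clone the mainline sources into a directory, or update an existing clone.
# GIT is a hypothetical override that allows a dry run (GIT=echo).
fetch_mainline() {
    dest=${1:-linux}
    if [ -d "$dest/.git" ]; then
        # Existing clone: only fast-forward, never create merge commits.
        ${GIT:-git} -C "$dest" pull --ff-only
    else
        ${GIT:-git} clone "$MAINLINE_URL" "$dest"
    fi
}

# Example: fetch_mainline ~/linux
```

Running it with ``GIT=echo`` only prints the git command that would be
executed, which is handy for checking the target directory first.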
				900
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	901	Conventional: People unfamiliar with git are often best served by
				902	downloading the sources as tarball from `kernel.org <https://kernel.org/>`_.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	903
Thorsten Leemhuis	2dfa9eb	2021-03-19 20:27:46 +0100	[diff] [blame]	904	How to actually build a kernel is not described here, as many websites explain
the necessary steps already. If you are new to it, consider following one of
those how-tos that suggest using ``make localmodconfig``, as that tries to
pick up the configuration of your current kernel and then tries to adjust it
somewhat for your system. That does not make the resulting kernel any better,
but quicker to compile.
				910
Thorsten Leemhuis	315c4e4	2021-02-15 18:28:57 +0100	[diff] [blame]	911	Note: If you are dealing with a panic, Oops, warning, or BUG from the kernel,
				912	please try to enable CONFIG_KALLSYMS when configuring your kernel.
Also enable CONFIG_DEBUG_KERNEL and CONFIG_DEBUG_INFO; the
latter is the relevant one of those two, but can only be reached if you enable
the former. Be aware that CONFIG_DEBUG_INFO increases the storage space required
to build a kernel by quite a bit. But that's worth it, as these options will
later allow you to pinpoint the exact line of code that triggers your issue. The
				918	section 'Decode failure messages' below explains this in more detail.
				919
				920	But keep in mind: Always keep a record of the issue encountered in case it is
				921	hard to reproduce. Sending an undecoded report is better than not reporting
				922	the issue at all.
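
Whether the options named in the note above ended up in a configuration can be
checked with a few lines of shell. This is only a sketch: the function name
``check_debug_options`` is hypothetical, and the option names are simply the
ones mentioned in this text; other kernel versions might organize their debug
options differently.

```shell
# Report which of the debug-related options are enabled in a kernel
# configuration file (defaults to .config in the current directory).
# Returns non-zero if at least one of them is not set to 'y'.
check_debug_options() {
    config=${1:-.config}
    missing=0
    for opt in CONFIG_KALLSYMS CONFIG_DEBUG_KERNEL CONFIG_DEBUG_INFO; do
        # Match the exact option only, e.g. not CONFIG_KALLSYMS_ALL.
        if grep -q "^${opt}=y\$" "$config"; then
            echo "$opt: enabled"
        else
            echo "$opt: NOT enabled"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_debug_options ~/linux/.config
```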
				923
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	924
				925	Check 'taint' flag
				926	------------------
				927
				928	*Ensure the kernel you just installed does not 'taint' itself when
				929	running.*
				930
				931	As outlined above in more detail already: the kernel sets a 'taint' flag when
				932	something happens that can lead to follow-up errors that look totally
				933	unrelated. That's why you need to check if the kernel you just installed does
not set this flag. And if it does, you in almost all cases need to
eliminate the reason for it before reporting issues that occur with it. See
the section above for details on how to do that.
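
A quick check could look like the sketch below. It assumes the flag is exposed
through ``/proc/sys/kernel/tainted``, where a value of ``0`` means the kernel
is not tainted; the function names are made up for this illustration.

```shell
# Print the kernel's taint value; the path can be overridden for testing.
taint_value() {
    cat "${1:-/proc/sys/kernel/tainted}"
}

# Succeed only if the value is 0, meaning the kernel is not tainted.
is_untainted() {
    [ "$(taint_value "${1:-}")" = "0" ]
}

# Example:
#   if is_untainted; then echo "not tainted"; else echo "tainted: $(taint_value)"; fi
```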
				937
				938
				939	Reproduce issue with the fresh kernel
				940	-------------------------------------
				941
				942	*Reproduce the issue with the kernel you just installed. If it doesn't show
Thorsten Leemhuis	613f969	2021-03-19 20:27:45 +0100	[diff] [blame]	943	up there, scroll down to the instructions for issues only happening with
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	944	stable and longterm kernels.*
				945
				946	Check if the issue occurs with the fresh Linux kernel version you just
				947	installed. If it was fixed there already, consider sticking with this version
				948	line and abandoning your plan to report the issue. But keep in mind that other
users might still be plagued by it, as long as it's not fixed in either the stable
or longterm versions from kernel.org (and thus vendor kernels derived from
				951	those). If you prefer to use one of those or just want to help their users,
				952	head over to the section "Details about reporting issues only occurring in
				953	older kernel version lines" below.
				954
				955
				956	Optimize description to reproduce issue
				957	---------------------------------------
				958
				959	*Optimize your notes: try to find and write the most straightforward way to
				960	reproduce your issue. Make sure the end result has all the important
				961	details, and at the same time is easy to read and understand for others
				962	that hear about it for the first time. And if you learned something in this
				963	process, consider searching again for existing reports about the issue.*
				964
				965	An unnecessarily complex report will make it hard for others to understand your
report. Thus try to find a reproducer that's straightforward to describe and
				967	thus easy to understand in written form. Include all important details, but at
				968	the same time try to keep it as short as possible.
				969
In this and the previous steps you have likely learned a thing or two about the
issue you face. Use this knowledge and search again for existing reports
you can join instead.
				973
				974
				975	Decode failure messages
				976	-----------------------
				977
Thorsten Leemhuis	315c4e4	2021-02-15 18:28:57 +0100	[diff] [blame]	978	*If your failure involves a 'panic', 'Oops', 'warning', or 'BUG', consider
				979	decoding the kernel log to find the line of code that triggered the error.*
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	980
Thorsten Leemhuis	315c4e4	2021-02-15 18:28:57 +0100	[diff] [blame]	981	When the kernel detects an internal problem, it will log some information about
				982	the executed code. This makes it possible to pinpoint the exact line in the
				983	source code that triggered the issue and shows how it was called. But that only
				984	works if you enabled CONFIG_DEBUG_INFO and CONFIG_KALLSYMS when configuring
your kernel. If you did so, consider decoding the information from the
kernel's log. That will make it a lot easier to understand what led to the
				987	'panic', 'Oops', 'warning', or 'BUG', which increases the chances that someone
				988	can provide a fix.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	989
Thorsten Leemhuis	315c4e4	2021-02-15 18:28:57 +0100	[diff] [blame]	990	Decoding can be done with a script you find in the Linux source tree. If you
				991	are running a kernel you compiled yourself earlier, call it like this::
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	992
  [user@something ~]$ sudo dmesg | ./linux-5.10.5/scripts/decode_stacktrace.sh ./linux-5.10.5/vmlinux
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	994
Thorsten Leemhuis	315c4e4	2021-02-15 18:28:57 +0100	[diff] [blame]	995	If you are running a packaged vanilla kernel, you will likely have to install
				996	the corresponding packages with debug symbols. Then call the script (which you
				997	might need to get from the Linux sources if your distro does not package it)
				998	like this::
Thorsten Leemhuis	e223a70	2020-12-09 06:19:14 +0100	[diff] [blame]	999
  [user@something ~]$ sudo dmesg | ./linux-5.10.5/scripts/decode_stacktrace.sh \
    /usr/lib/debug/lib/modules/5.10.10-4.1.x86_64/vmlinux /usr/src/kernels/5.10.10-4.1.x86_64/
				1002
				1003	The script will work on log lines like the following, which show the address of
				1004	the code the kernel was executing when the error occurred::
				1005
				1006	[ 68.387301] RIP: 0010:test_module_init+0x5/0xffa [test_module]
				1007
				1008	Once decoded, these lines will look like this::
				1009
				1010	[ 68.387301] RIP: 0010:test_module_init (/home/username/linux-5.10.5/test-module/test-module.c:16) test_module
				1011
				1012	In this case the executed code was built from the file
'~/linux-5.10.5/test-module/test-module.c' and the error was triggered by the
instructions found in line '16'.
				1015
				1016	The script will similarly decode the addresses mentioned in the section
				1017	starting with 'Call trace', which show the path to the function where the
				1018	problem occurred. Additionally, the script will show the assembler output for
				1019	the code section the kernel was executing.
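
Before running the script it can help to check whether a saved log contains
anything it could decode at all. The following is a rough sketch; the function
name and the regular expression are assumptions made for this illustration,
modelled on the ``symbol+offset/size`` notation in the example line above.

```shell
# Succeed if the given log file contains at least one address in the
# 'symbol+0xoffset/0xsize' notation, as in the RIP line shown above.
has_decodable_lines() {
    grep -Eq '[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+' "$1"
}

# Example:
#   sudo dmesg > dmesg.txt
#   has_decodable_lines dmesg.txt && echo "worth decoding"
```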
				1020
				1021	Note, if you can't get this to work, simply skip this step and mention the
				1022	reason for it in the report. If you're lucky, it might not be needed. And if it
				1023	is, someone might help you to get things going. Also be aware this is just one
				1024	of several ways to decode kernel stack traces. Sometimes different steps will
be required to retrieve the relevant details. Don't worry about that: if that's
needed in your case, developers will tell you what to do.
Thorsten Leemhuis	e223a70	2020-12-09 06:19:14 +0100	[diff] [blame]	1027
				1028
				1029	Special care for regressions
				1030	----------------------------
				1031
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	1032	*If your problem is a regression, try to narrow down when the issue was
				1033	introduced as much as possible.*
				1034
Linux lead developer Linus Torvalds insists that the Linux kernel never
worsens; that's why he deems regressions unacceptable and wants to see them
fixed quickly. That's why changes that introduced a regression are often
				1038	promptly reverted if the issue they cause can't get solved quickly any other
				1039	way. Reporting a regression is thus a bit like playing a kind of trump card to
				1040	get something quickly fixed. But for that to happen the change that's causing
				1041	the regression needs to be known. Normally it's up to the reporter to track
				1042	down the culprit, as maintainers often won't have the time or setup at hand to
				1043	reproduce it themselves.
				1044
				1045	To find the change there is a process called 'bisection' which the document
				1046	'Documentation/admin-guide/bug-bisect.rst' describes in detail. That process
				1047	will often require you to build about ten to twenty kernel images, trying to
				1048	reproduce the issue with each of them before building the next. Yes, that takes
				1049	some time, but don't worry, it works a lot quicker than most people assume.
				1050	Thanks to a 'binary search' this will lead you to the one commit in the source
				1051	code management system that's causing the regression. Once you find it, search
				1052	the net for the subject of the change, its commit id and the shortened commit id
				1053	(the first 12 characters of the commit id). This will lead you to existing
				1054	reports about it, if there are any.
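
Why so few builds are enough follows from the binary search: every step roughly
halves the number of commits that remain suspect. The sketch below illustrates
that arithmetic and the shortening of a commit id; both function names are made
up, and the commit count in the example is an arbitrary illustrative figure.

```shell
# Number of halving steps needed until one commit out of N remains,
# i.e. the base-2 logarithm of N, rounded up.
bisect_steps() {
    commits=$1
    steps=0
    while [ "$commits" -gt 1 ]; do
        commits=$(( (commits + 1) / 2 ))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# Shorten a full commit id to its first 12 characters.
short_commit_id() {
    printf '%s\n' "$1" | cut -c1-12
}

# Example: with about 15000 commits between two releases
#   bisect_steps 15000
```

An actual bisection might need a step more or less than this estimate, but the
order of magnitude explains why ten to twenty builds usually suffice.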
				1055
				1056	Note, a bisection needs a bit of know-how, which not everyone has, and quite a
bit of effort, which not everyone is willing to invest. Nevertheless, performing
a bisection yourself is highly recommended. If you really can't or
don't want to go down that route, at least find out which mainline kernel
introduced the regression. If something for example breaks when switching from
5.5.15 to 5.8.4, then try at least all the mainline releases in that area (5.6,
5.7 and 5.8) to check when it first showed up. Unless you're trying to find a
regression in a stable or longterm kernel, avoid testing versions whose number
has three sections (5.6.12, 5.7.8), as that makes the outcome hard to
interpret, which might render your testing useless. Once you have found the major
version which introduced the regression, feel free to move on in the reporting
process. But keep in mind: it depends on the issue at hand if the developers
will be able to help without knowing the culprit. Sometimes they might
recognize from the report what went wrong and can fix it; other times they will
				1070	be unable to help unless you perform a bisection.
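
The distinction between version numbers with two and three sections can be
expressed as a small helper, for example to sanity-check a list of versions
before testing them. This is merely a sketch; the function name and the labels
it prints are invented for this illustration.

```shell
# Classify a kernel version string by the number of its sections.
version_kind() {
    case ${1:-} in
        *-rc[0-9]*)    echo prerelease ;;  # e.g. 5.8-rc1
        ''|*[!0-9.]*)  echo other ;;       # empty or unexpected characters
        *.*.*)         echo stable ;;      # three sections, e.g. 5.6.12
        *.*)           echo mainline ;;    # two sections, e.g. 5.7
        *)             echo other ;;
    esac
}

# Example: keep only the mainline releases from a list of candidates.
#   for v in 5.6 5.6.12 5.7 5.7.8 5.8; do
#       [ "$(version_kind "$v")" = mainline ] && echo "$v"
#   done
```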
				1071
				1072	When dealing with regressions make sure the issue you face is really caused by
				1073	the kernel and not by something else, as outlined above already.
				1074
				1075	In the whole process keep in mind: an issue only qualifies as regression if the
				1076	older and the newer kernel got built with a similar configuration. The best way
to achieve this: copy the configuration file (``.config``) from the old working
				1078	kernel freshly to each newer kernel version you try. Afterwards run ``make
Ismael Luceno	0e5e0a5	2021-03-31 18:35:41 +0200	[diff] [blame]	1079	olddefconfig`` to adjust it for the needs of the new version.
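
Those two steps could be wrapped up as in the following sketch. The function
name and the ``KERNEL_MAKE`` override are hypothetical and only exist to make
the example easy to try out; the paths in the usage line are placeholders.

```shell
# Copy the configuration of the old, working kernel into a newer source
# tree and let the build system adjust it for that version.
# KERNEL_MAKE is a hypothetical override for the make command.
carry_over_config() {
    old_config=$1
    new_tree=$2
    cp -- "$old_config" "$new_tree/.config" || return 1
    ${KERNEL_MAKE:-make} -C "$new_tree" olddefconfig
}

# Example: carry_over_config ~/linux-5.7/.config ~/linux-5.8
```

Always starting from the same old configuration file, rather than from the
result of the previous run, keeps the configurations as similar as possible.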
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	1080
				1081
				1082	Write and send the report
				1083	-------------------------
				1084
				1085	*Start to compile the report by writing a detailed description about the
				1086	issue. Always mention a few things: the latest kernel version you installed
				1087	for reproducing, the Linux Distribution used, and your notes on how to
				1088	reproduce the issue. Ideally, make the kernel's build configuration
				1089	(.config) and the output from ``dmesg`` available somewhere on the net and
				1090	link to it. Include or upload all other information that might be relevant,
				1091	like the output/screenshot of an Oops or the output from ``lspci``. Once
				1092	you wrote this main part, insert a normal length paragraph on top of it
				1093	outlining the issue and the impact quickly. On top of this add one sentence
				1094	that briefly describes the problem and gets people to read on. Now give the
				1095	thing a descriptive title or subject that yet again is shorter. Then you're
				1096	ready to send or file the report like the MAINTAINERS file told you, unless
				1097	you are dealing with one of those 'issues of high priority': they need
				1098	special care which is explained in 'Special handling for high priority
				1099	issues' below.*
				1100
				1101	Now that you have prepared everything it's time to write your report. How to do
				1102	that is partly explained by the three documents linked to in the preface above.
				1103	That's why this text will only mention a few of the essentials as well as
				1104	things specific to the Linux kernel.
				1105
				1106	There is one thing that fits both categories: the most crucial parts of your
				1107	report are the title/subject, the first sentence, and the first paragraph.
				1108	Developers often get quite a lot of mail. They thus often just take a few
				1109	seconds to skim a mail before deciding to move on or look closer. Thus: the
				1110	better the top section of your report, the higher are the chances that someone
				1111	will look into it and help you. And that is why you should ignore them for now
				1112	and write the detailed report first. ;-)
				1113
				1114	Things each report should mention
				1115	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				1116
				1117	Describe in detail how your issue happens with the fresh vanilla kernel you
				1118	installed. Try to include the step-by-step instructions you wrote and optimized
				1119	earlier that outline how you and ideally others can reproduce the issue; in
				1120	those rare cases where that's impossible try to describe what you did to
				1121	trigger it.
				1122
				1123	Also include all the relevant information others might need to understand the
				1124	issue and its environment. What's actually needed depends a lot on the issue,
				1125	but there are some things you should include always:
				1126
				1127	* the output from ``cat /proc/version``, which contains the Linux kernel
				1128	version number and the compiler it was built with.
				1129
* the Linux distribution the machine is running (``hostnamectl | grep
  "Operating System"``)
				1132
				1133	* the architecture of the CPU and the operating system (``uname -mi``)
				1134
				1135	* if you are dealing with a regression and performed a bisection, mention the
				1136	subject and the commit-id of the change that is causing it.
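
The first three items can be gathered in one go, for instance with a sketch like
the one below. The function name, the section headers and the fallbacks for
missing tools are additions made for this illustration; the commands themselves
are the ones listed above.

```shell
# Print the basic facts every report should mention.
collect_basics() {
    echo "== /proc/version =="
    cat /proc/version 2>/dev/null || echo "(not available)"

    echo "== distribution =="
    if command -v hostnamectl >/dev/null 2>&1; then
        hostnamectl 2>/dev/null | grep "Operating System" || echo "(not reported)"
    else
        echo "(hostnamectl not installed)"
    fi

    echo "== architecture =="
    # Fall back to plain -m where the -i option is not supported.
    uname -mi 2>/dev/null || uname -m
}

# Example: collect_basics > basics.txt
```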
				1137
				1138	In a lot of cases it's also wise to make two more things available to those
				1139	that read your report:
				1140
				1141	* the configuration used for building your Linux kernel (the '.config' file)
				1142
				1143	* the kernel's messages that you get from ``dmesg`` written to a file. Make
				1144	sure that it starts with a line like 'Linux version 5.8-1
				1145	(foobar@example.com) (gcc (GCC) 10.2.1, GNU ld version 2.34) #1 SMP Mon Aug
  3 14:54:37 UTC 2020'. If it's missing, then important messages from the first
				1147	boot phase already got discarded. In this case instead consider using
				1148	``journalctl -b 0 -k``; alternatively you can also reboot, reproduce the
				1149	issue and call ``dmesg`` right afterwards.
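
Whether a saved log still contains the start of the boot messages can be
checked with a sketch like the following. The function name is hypothetical,
and looking at the first five lines instead of only the very first one is an
assumption made here to tolerate a few preceding lines.

```shell
# Succeed if the 'Linux version' line shows up near the top of a saved log.
dmesg_is_complete() {
    head -n 5 "$1" | grep -q 'Linux version'
}

# Example:
#   sudo dmesg > dmesg.txt
#   dmesg_is_complete dmesg.txt || journalctl -b 0 -k > dmesg.txt
```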
				1150
				1151	These two files are big, that's why it's a bad idea to put them directly into
				1152	your report. If you are filing the issue in a bug tracker then attach them to
				1153	the ticket. If you report the issue by mail do not attach them, as that makes
				1154	the mail too large; instead do one of these things:
				1155
				1156	* Upload the files somewhere public (your website, a public file paste
				1157	service, a ticket created just for this purpose on `bugzilla.kernel.org
				1158	<https://bugzilla.kernel.org/>`_, ...) and include a link to them in your
				1159	report. Ideally use something where the files stay available for years, as
				1160	they could be useful to someone many years from now; this for example can
				1161	happen if five or ten years from now a developer works on some code that was
				1162	changed just to fix your issue.
				1163
				1164	* Put the files aside and mention you will send them later in individual
				1165	replies to your own mail. Just remember to actually do that once the report
				1166	went out. ;-)
				1167
				1168	Things that might be wise to provide
				1169	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				1170
				1171	Depending on the issue you might need to add more background data. Here are a
few suggestions for what is often good to provide:
				1173
* If you are dealing with a 'warning', an 'Oops' or a 'panic' from the kernel,
				1175	include it. If you can't copy'n'paste it, try to capture a netconsole trace
				1176	or at least take a picture of the screen.
				1177
				1178	* If the issue might be related to your computer hardware, mention what kind
				1179	of system you use. If you for example have problems with your graphics card,
  mention its manufacturer, the card's model, and what chip it uses. If it's a
				1181	laptop mention its name, but try to make sure it's meaningful. 'Dell XPS 13'
				1182	for example is not, because it might be the one from 2012; that one looks
				1183	not that different from the one sold today, but apart from that the two have
  nothing in common. Hence, in such cases add the exact model number, which
  for example is '9380' or '7390' for XPS 13 models introduced during 2019.
				1186	Names like 'Lenovo Thinkpad T590' are also somewhat ambiguous: there are
				1187	variants of this laptop with and without a dedicated graphics chip, so try
				1188	to find the exact model name or specify the main components.
				1189
				1190	* Mention the relevant software in use. If you have problems with loading
				1191	modules, you want to mention the versions of kmod, systemd, and udev in use.
				1192	If one of the DRM drivers misbehaves, you want to state the versions of
				1193	libdrm and Mesa; also specify your Wayland compositor or the X-Server and
				1194	its driver. If you have a filesystem issue, mention the version of
				1195	corresponding filesystem utilities (e2fsprogs, btrfs-progs, xfsprogs, ...).
				1196
				1197	* Gather additional information from the kernel that might be of interest. The
				1198	output from ``lspci -nn`` will for example help others to identify what
				1199	hardware you use. If you have a problem with hardware you even might want to
  make the output from ``sudo lspci -vvv`` available, as that provides
  insights into how the components were configured. For some issues it might be
  good to include the contents of files like ``/proc/cpuinfo``,
  ``/proc/ioports``, ``/proc/iomem``, ``/proc/modules``, or
  ``/proc/scsi/scsi``. Some subsystems also offer tools to collect relevant
				1205	information. One such tool is ``alsa-info.sh`` `which the audio/sound
				1206	subsystem developers provide <https://www.alsa-project.org/wiki/AlsaInfo>`_.
				1207
Those examples should give you some ideas of what data might be wise to
attach, but you have to think for yourself about what will be helpful for others to know.
				1210	Don't worry too much about forgetting something, as developers will ask for
				1211	additional details they need. But making everything important available from
				1212	the start increases the chance someone will take a closer look.
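
Some of the data mentioned in this subsection can be collected into a directory
with a few lines of shell. The sketch below is an illustration only: the
function name, the default directory and the file naming scheme are invented,
and which files are actually worth including depends on the issue.

```shell
# Save the output of 'lspci -nn' and a few /proc files into a directory.
gather_extra_info() {
    out=${1:-report-data}
    mkdir -p "$out" || return 1

    if command -v lspci >/dev/null 2>&1; then
        lspci -nn > "$out/lspci-nn.txt" 2>&1
    fi

    for f in /proc/cpuinfo /proc/ioports /proc/iomem /proc/modules /proc/scsi/scsi; do
        if [ -r "$f" ]; then
            # /proc/scsi/scsi becomes proc_scsi_scsi.txt
            name=$(printf '%s' "$f" | sed 's|^/||; s|/|_|g')
            cat "$f" > "$out/$name.txt" 2>/dev/null
        fi
    done

    # List what was collected.
    ls "$out"
}

# Example: gather_extra_info ~/report-data
```

Look through the resulting files before publishing them, in case they contain
something you would rather keep private.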
				1213
				1214
				1215	The important part: the head of your report
				1216	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				1217
				1218	Now that you have the detailed part of the report prepared let's get to the
				1219	most important section: the first few sentences. Thus go to the top, add
				1220	something like 'The detailed description:' before the part you just wrote and
				1221	insert two newlines at the top. Now write one normal length paragraph that
				1222	describes the issue roughly. Leave out all boring details and focus on the
				1223	crucial parts readers need to know to understand what this is all about; if you
				1224	think this bug affects a lot of users, mention this to get people interested.
				1225
				1226	Once you did that insert two more lines at the top and write a one sentence
				1227	summary that explains quickly what the report is about. After that you have to
				1228	get even more abstract and write an even shorter subject/title for the report.
				1229
Now that you have written this part take some time to optimize it, as it is the
most important part of your report: a lot of people will only read this before
				1232	they decide if reading the rest is time well spent.
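
The resulting layout can be pictured as a skeleton like the one printed by the
sketch below. The function name and the placeholder wording in square brackets
are made up; only the order of the parts and the 'The detailed description:'
line follow the text above.

```shell
# Print an empty report skeleton: subject, one-sentence summary, short
# paragraph, then the detailed part written first.
report_skeleton() {
    cat <<'EOF'
Subject: [short, descriptive title]

[One sentence that briefly describes the problem.]

[One normal-length paragraph outlining the issue and its impact.]

The detailed description:

[Steps to reproduce, kernel version, distribution, links to .config and dmesg.]
EOF
}

# Example: report_skeleton > report.txt
```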
				1233
				1234	Now send or file the report like the :ref:`MAINTAINERS <maintainers>` file told
				1235	you, unless it's one of those 'issues of high priority' outlined earlier: in
				1236	that case please read the next subsection first before sending the report on
				1237	its way.
				1238
				1239	Special handling for high priority issues
				1240	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				1241
				1242	Reports for high priority issues need special handling.
				1243
Severe issues: make sure the subject or ticket title as well as the first
paragraph makes the severity obvious.
				1246
Thorsten Leemhuis	6161a4b	2021-04-09 13:47:24 +0200	[diff] [blame]	1247	Regressions: make the report's subject start with '[REGRESSION]'.
				1248
				1249	In case you performed a successful bisection, use the title of the change that
				1250	introduced the regression as the second part of your subject. Make the report
Mauro Carvalho Chehab	349660e	2021-06-16 08:55:07 +0200	[diff] [blame]	1251	also mention the commit id of the culprit. In case of an unsuccessful bisection,
Thorsten Leemhuis	6161a4b	2021-04-09 13:47:24 +0200	[diff] [blame]	1252	make your report mention the latest tested version that's working fine (say 5.7)
				1253	and the oldest where the issue occurs (say 5.8-rc1).
				1254
				1255	When sending the report by mail, CC the Linux regressions mailing list
				1256	(regressions@lists.linux.dev). In case the report needs to be filed to some web
Thorsten Leemhuis	0043f0b	2021-04-15 12:29:14 +0200	[diff] [blame]	1257	tracker, proceed to do so. Once filed, forward the report by mail to the
				1258	regressions list; CC the maintainer and the mailing list for the subsystem in
				1259	question. Make sure to inline the forwarded report, hence do not attach it.
				1260	Also add a short note at the top where you mention the URL to the ticket.
Thorsten Leemhuis	6161a4b	2021-04-09 13:47:24 +0200	[diff] [blame]	1261
				1262	When mailing or forwarding the report, in case of a successful bisection add the
				1263	author of the culprit to the recipients; also CC everyone in the signed-off-by
				1264	chain, which you find at the end of its commit message.
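
For a regression with a successful bisection, the subject and the additional
recipients can be derived from the culprit's commit, roughly as sketched below.
The function names are hypothetical; the ``git log`` invocations in the
comments assume you run them inside the bisected source tree.

```shell
# Build the subject: '[REGRESSION]' followed by the title of the culprit.
regression_subject() {
    printf '[REGRESSION] %s\n' "$1"
}

# Read a commit message on standard input and print everyone from its
# Signed-off-by chain, each name or address once.
signoff_addresses() {
    sed -n 's/^[[:space:]]*[Ss]igned-off-by:[[:space:]]*//p' | sort -u
}

# Example, with <commit-id> being the culprit found by the bisection:
#   git log -1 --format='%an <%ae>' <commit-id>          # author
#   git log -1 --format=%B <commit-id> | signoff_addresses
#   regression_subject "$(git log -1 --format=%s <commit-id>)"
```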
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	1265
Security issues: for these issues you will have to evaluate if a
				1267	short-term risk to other users would arise if details were publicly disclosed.
				1268	If that's not the case simply proceed with reporting the issue as described.
				1269	For issues that bear such a risk you will need to adjust the reporting process
				1270	slightly:
				1271
				1272	* If the MAINTAINERS file instructed you to report the issue by mail, do not
				1273	CC any public mailing lists.
				1274
				1275	* If you were supposed to file the issue in a bug tracker make sure to mark
				1276	the ticket as 'private' or 'security issue'. If the bug tracker does not
				1277	offer a way to keep reports private, forget about it and send your report as
				1278	a private mail to the maintainers instead.
				1279
				1280	In both cases make sure to also mail your report to the addresses the
				1281	MAINTAINERS file lists in the section 'security contact'. Ideally directly CC
				1282	them when sending the report by mail. If you filed it in a bug tracker, forward
				1283	the report's text to these addresses; but on top of it put a small note where
				1284	you mention that you filed it with a link to the ticket.
				1285
				1286	See 'Documentation/admin-guide/security-bugs.rst' for more information.
				1287
				1288
				1289	Duties after the report went out
				1290	--------------------------------
				1291
				1292	*Wait for reactions and keep the thing rolling until you can accept the
				1293	outcome in one way or the other. Thus react publicly and in a timely manner
				1294	to any inquiries. Test proposed fixes. Do proactive testing: retest with at
				1295	least every first release candidate (RC) of a new mainline version and
				1296	report your results. Send friendly reminders if things stall. And try to
				1297	help yourself, if you don't get any help or if it's unsatisfying.*
				1298
				1299	If your report was good and you are really lucky then one of the developers
				1300	might immediately spot what's causing the issue; they then might write a patch
				1301	to fix it, test it, and send it straight for integration in mainline while
				1302	tagging it for later backport to stable and longterm kernels that need it. Then
				1303	all you need to do is reply with a 'Thank you very much' and switch to a version
				1304	with the fix once it gets released.
				1305
				1306	But this ideal scenario rarely happens. That's why the job is only starting
once you got the report out. What you'll have to do depends on the situation,
				1308	but often it will be the things listed below. But before digging into the
				1309	details, here are a few important things you need to keep in mind for this part
				1310	of the process.
				1311
				1312
				1313	General advice for further interactions
				1314	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				1315
				1316	Always reply in public: When you filed the issue in a bug tracker, always
				1317	reply there and do not contact any of the developers privately about it. For
				1318	mailed reports always use the 'Reply-all' function when replying to any mails
				1319	you receive. That includes mails with any additional data you might want to add
to your report: go to your mail application's 'Sent' folder and use 'reply-all'
				1321	on your mail with the report. This approach will make sure the public mailing
				1322	list(s) and everyone else that gets involved over time stays in the loop; it
also keeps the mail thread intact, which among other things is really important for
				1324	mailing lists to group all related mails together.
				1325
				1326	There are just two situations where a comment in a bug tracker or a 'Reply-all'
				1327	is unsuitable:
				1328
				1329	* Someone tells you to send something privately.
				1330
				1331	* You were told to send something, but noticed it contains sensitive
				1332	information that needs to be kept private. In that case it's okay to send it
				1333	in private to the developer that asked for it. But note in the ticket or a
				1334	mail that you did that, so everyone else knows you honored the request.
				1335
				1336	Do research before asking for clarifications or help: In this part of the
				1337	process someone might tell you to do something that requires a skill you might
				1338	not have mastered yet. For example, you might be asked to use some test tools
				1339	you never have heard of yet; or you might be asked to apply a patch to the
				1340	Linux kernel sources to test if it helps. In some cases it will be fine sending
				1341	a reply asking for instructions how to do that. But before going that route try
to find the answer on your own by searching the internet; alternatively
Thorsten Leemhuis	613f969	2021-03-19 20:27:45 +0100	[diff] [blame]	1343	consider asking in other places for advice. For example ask a friend or post
about it to a chatroom or forum you normally hang out in.
				1345
				1346	Be patient: If you are really lucky you might get a reply to your report
				1347	within a few hours. But most of the time it will take longer, as maintainers
				1348	are scattered around the globe and thus might be in a different time zone – one
				1349	where they already enjoy their night away from keyboard.
				1350
				1351	In general, kernel developers will take one to five business days to respond to
				1352	reports. Sometimes it will take longer, as they might be busy with the merge
				1353	windows, other work, visiting developer conferences, or simply enjoying a long
				1354	summer holiday.
				1355
				1356	The 'issues of high priority' (see above for an explanation) are an exception
				1357	here: maintainers should address them as soon as possible; that's why you
				1358	should wait a week at maximum (or just two days if it's something urgent)
				1359	before sending a friendly reminder.
				1360
				1361	Sometimes the maintainer might not be responding in a timely manner; other
				1362	times there might be disagreements, for example if an issue qualifies as
				1363	regression or not. In such cases raise your concerns on the mailing list and
				1364	ask others for public or private replies how to move on. If that fails, it
				1365	might be appropriate to get a higher authority involved. In case of a WiFi
				1366	driver that would be the wireless maintainers; if there are no higher level
				1367	maintainers or all else fails, it might be one of those rare situations where
				1368	it's okay to get Linus Torvalds involved.
				1369
				1370	Proactive testing: Every time the first pre-release (the 'rc1') of a new
				1371	mainline kernel version gets released, go and check if the issue is fixed there
				1372	or if anything of importance changed. Mention the outcome in the ticket or in a
				1373	mail you sent as reply to your report (make sure it has all those in the CC
				1374	that up to that point participated in the discussion). This will show your
				1375	commitment and that you are willing to help. It also tells developers if the
				1376	issue persists and makes sure they do not forget about it. A few other
				1377	occasional retests (for example with rc3, rc5 and the final) are also a good
				1378	idea, but only report your results if something relevant changed or if you are
				1379	writing something anyway.
				1380
				1381	With all these general things off the table let's get into the details of how
				1382	to help to get issues resolved once they were reported.
				1383
				1384	Inquiries and testing requests
				1385	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				1386
				1387	Here are your duties in case you got replies to your report:
				1388
				1389	Check who you deal with: Most of the time it will be the maintainer or a
				1390	developer of the particular code area that will respond to your report. But as
				1391	issues are normally reported in public it could be anyone that's replying —
				1392	including people that want to help, but in the end might guide you totally off
				1393	track with their questions or requests. That rarely happens, but it's one of
				1394	many reasons why it's wise to quickly run an internet search to see who you're
				1395	interacting with. By doing this you also learn whether your report was heard by
				1396	the right people, as a reminder to the maintainer (see below) might be in order
				1397	later if discussion fades out without leading to a satisfying solution for the
				1398	issue.
				1399
				1400	Inquiries for data: Often you will be asked to test something or provide
				1401	additional details. Try to provide the requested information soon, as you have
				1402	the attention of someone that might help and risk losing it the longer you
				1403	wait; that outcome is even likely if you do not provide the information within
				1404	a few business days.
				1405
				1406	Requests for testing: When you are asked to test a diagnostic patch or a
				1407	possible fix, try to test it in a timely manner, too. But do it properly and make
				1408	sure to not rush it: mixing things up can happen easily and can lead to a lot
				1409	of confusion for everyone involved. A common mistake for example is thinking a
				1410	proposed patch with a fix was applied, but in fact wasn't. Things like that
				1411	happen even to experienced testers occasionally, but they most of the time will
				1412	notice when the kernel with the fix behaves just as one without it.
				1413
				1414	What to do when nothing of substance happens
				1415	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				1416
				1417	Some reports will not get any reaction from the responsible Linux kernel
				1418	developers; or a discussion around the issue evolved, but faded out with
				1419	nothing of substance coming out of it.
				1420
				1421	In these cases wait two (better: three) weeks before sending a friendly
				1422	reminder: maybe the maintainer was just away from keyboard for a while when
				1423	your report arrived or had something more important to take care of. When
				1424	writing the reminder, kindly ask if anything else from your side is needed to
				1425	get the ball rolling somehow. If the report got out by mail, do that in the
				1426	first lines of a mail that is a reply to your initial mail (see above) which
				1427	includes a full quote of the original report below: that's one of those few
				1428	situations where such a 'TOFU' (Text Over, Fullquote Under) is the right
				1429	approach, as then all the recipients will have the details at hand immediately
				1430	in the proper order.
				1431
				1432	After the reminder wait three more weeks for replies. If you still don't get a
				1433	proper reaction, you first should reconsider your approach. Did you maybe try
				1434	to reach out to the wrong people? Was the report maybe offensive or so
				1435	confusing that people decided to completely stay away from it? The best way to
				1436	rule out such factors: show the report to one or two people familiar with FLOSS
				1437	issue reporting and ask for their opinion. Also ask them for their advice on how
				1438	to move forward. That might mean: prepare a better report and make those people
				1439	review it before you send it out. Such an approach is totally fine; just
				1440	mention that this is the second and improved report on the issue and include a
				1441	link to the first report.
				1442
				1443	If the report was proper you can send a second reminder; in it ask for advice
				1444	why the report did not get any replies. A good moment for this second reminder
				1445	mail is shortly after the first pre-release (the 'rc1') of a new Linux kernel
				1446	version got published, as you should retest and provide a status update at that
				1447	point anyway (see above).
				1448
				1449	If the second reminder again results in no reaction within a week, try to
				1450	contact a higher-level maintainer asking for advice: even busy maintainers by
				1451	then should at least have sent some kind of acknowledgment.
				1452
				1453	Remember to prepare yourself for a disappointment: maintainers ideally should
				1454	react somehow to every issue report, but they are only obliged to fix those
				1455	'issues of high priority' outlined earlier. So don't be too devastated if you
				1456	get a reply along the lines of 'thanks for the report, I have more important
				1457	issues to deal with currently and won't have time to look into this for the
				1458	foreseeable future'.
				1459
				1460	It's also possible that after some discussion in the bug tracker or on a list
				1461	nothing happens anymore and reminders don't help to motivate anyone to work out
				1462	a fix. Such situations can be devastating, but are in the cards when it
				1463	comes to Linux kernel development. This and several other reasons for not
				1464	getting help are explained in 'Why some issues won't get any reaction or remain
				1465	unfixed after being reported' near the end of this document.
				1466
				1467	Don't get devastated if you don't find any help or if the issue in the end does
				1468	not get solved: the Linux kernel is FLOSS and thus you can still help yourself.
				1469	You for example could try to find others that are affected and team up with
				1470	them to get the issue resolved. Such a team could prepare a fresh report
				1471	together that mentions how many you are and why this is something that in your
				1472	opinion should get fixed. Maybe together you can also narrow down the root cause
				1473	or the change that introduced a regression, which often makes developing a fix
				1474	easier. And with a bit of luck there might be someone in the team that knows a
				1475	bit about programming and might be able to write a fix.
				1476
				1477
Thorsten Leemhuis	58c5394	2021-03-30 16:13:07 +0200	[diff] [blame]	1478	Reference for "Reporting regressions within a stable and longterm kernel line"
				1479	------------------------------------------------------------------------------
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	1480
Thorsten Leemhuis	58c5394	2021-03-30 16:13:07 +0200	[diff] [blame]	1481	This subsection provides details for the steps you need to perform if you face
				1482	a regression within a stable and longterm kernel line.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	1483
				1484	Make sure the particular version line still gets support
				1485	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				1486
				1487	*Check if the kernel developers still maintain the Linux kernel version
				1488	line you care about: go to the front page of kernel.org and make sure it
				1489	mentions the latest release of the particular version line without an
				1490	'[EOL]' tag.*
				1491
				1492	Most kernel version lines only get supported for about three months, as
				1493	maintaining them longer is quite a lot of work. Hence, only one per year is
				1494	chosen and gets supported for at least two years (often six). That's why you
				1495	need to check if the kernel developers still support the version line you care
				1496	for.
				1497
Thorsten Leemhuis	58c5394	2021-03-30 16:13:07 +0200	[diff] [blame]	1498	Note, if kernel.org lists two stable version lines on the front page, you
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	1499	should consider switching to the newer one and forget about the older one:
				1500	support for it is likely to be abandoned soon. Then it will get an "end-of-life"
				1501	(EOL) stamp. Version lines that reached that point still get mentioned on the
				1502	kernel.org front page for a week or two, but are unsuitable for testing and
				1503	reporting.
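
The check described above can be sketched as a small shell helper. This is
only an illustration under stated assumptions: the function name
``series_status`` is hypothetical, and so is the input format (one release per
line, with ``[EOL]`` appended to abandoned ones), which you would have to copy
by hand from the release table on the kernel.org front page.

```shell
# Classify a version line based on release-table text read from stdin.
# Assumed input (hypothetical format): one release per line, for example
# "5.10.9" or "5.9.16 [EOL]".
series_status() {
    series=$1
    # Escape the dots so that "5.1" does not match "5.10.9".
    pattern="^$(printf '%s' "$series" | sed 's/\./\\./g')\\."
    line=$(grep -E "$pattern" | head -n 1)
    if [ -z "$line" ]; then
        echo "not listed"
    elif printf '%s\n' "$line" | grep -q 'EOL'; then
        echo "EOL"
    else
        echo "supported"
    fi
}

# Made-up sample data, not the real state of kernel.org:
printf '5.10.9\n5.9.16 [EOL]\n' | series_status 5.10
printf '5.10.9\n5.9.16 [EOL]\n' | series_status 5.9
```

A version line that is reported as "EOL" or "not listed" is unsuitable for
testing and reporting.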
				1504
				1505	Search stable mailing list
				1506	~~~~~~~~~~~~~~~~~~~~~~~~~~
				1507
				1508	*Check the archives of the Linux stable mailing list for existing reports.*
				1509
				1510	Maybe the issue you face is already known and was fixed or is about to be. Hence,
				1511	`search the archives of the Linux stable mailing list
				1512	<https://lore.kernel.org/stable/>`_ for reports about an issue like yours. If
				1513	you find any matches, consider joining the discussion, unless the fix is
				1514	already finished and scheduled to get applied soon.
				1515
				1516	Reproduce issue with the newest release
				1517	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				1518
				1519	*Install the latest release from the particular version line as a vanilla
				1520	kernel. Ensure this kernel is not tainted and still shows the problem, as
Thorsten Leemhuis	58c5394	2021-03-30 16:13:07 +0200	[diff] [blame]	1521	the issue might have already been fixed there. If you first noticed the
				1522	problem with a vendor kernel, check that a vanilla build of the last version
				1523	known to work performs fine as well.*
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	1524
				1525	Before investing any more time in this process you want to check if the issue
				1526	was already fixed in the latest release of the version line you're interested in.
				1527	This kernel needs to be vanilla and shouldn't be tainted before the issue
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	1528	happens, as already outlined in detail above in the section "Install a fresh
				1529	kernel for testing".
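
A minimal sketch of the taint check, assuming you run it on the freshly booted
test kernel before the issue occurs. The kernel exposes its taint state in
``/proc/sys/kernel/tainted``, where 0 means 'not tainted'; the helper name
``taint_status`` is made up for this example.

```shell
# Interpret a taint value as found in /proc/sys/kernel/tainted.
# 0 means the kernel is not tainted; any other value means that at least
# one taint flag is set.
taint_status() {
    if [ "$1" -eq 0 ]; then
        echo "not tainted"
    else
        echo "tainted (value $1)"
    fi
}

# On the test system you would call it like this:
#   taint_status "$(cat /proc/sys/kernel/tainted)"
# Two sample values for illustration:
taint_status 0
taint_status 4097
```

If the result is anything but "not tainted", find and eliminate the cause
before you continue testing.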
				1530
Thorsten Leemhuis	58c5394	2021-03-30 16:13:07 +0200	[diff] [blame]	1531	Did you first notice the regression with a vendor kernel? Then changes the
				1532	vendor applied might be interfering. You need to rule that out by performing
				1533	a recheck. Say something broke when you updated from 5.10.4-vendor.42 to
				1534	5.10.5-vendor.43. Then after testing the latest 5.10 release as outlined in
				1535	the previous paragraph check if a vanilla build of Linux 5.10.4 works fine as
				1536	well. If things are broken there, the issue does not qualify as an upstream
				1537	regression and you need to switch back to the main step-by-step guide to report
				1538	the issue.
				1539
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	1540	Report the regression
				1541	~~~~~~~~~~~~~~~~~~~~~
				1542
Thorsten Leemhuis	58c5394	2021-03-30 16:13:07 +0200	[diff] [blame]	1543	*Send a short problem report to the Linux stable mailing list
Thorsten Leemhuis	6161a4b	2021-04-09 13:47:24 +0200	[diff] [blame]	1544	(stable@vger.kernel.org) and CC the Linux regressions mailing list
Thorsten Leemhuis	0043f0b	2021-04-15 12:29:14 +0200	[diff] [blame]	1545	(regressions@lists.linux.dev); if you suspect the cause in a particular
				1546	subsystem, CC its maintainer and its mailing list. Roughly describe the
				1547	issue and ideally explain how to reproduce it. Mention the first version
				1548	that shows the problem and the last version that's working fine. Then
				1549	wait for further instructions.*
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	1550
				1551	When reporting a regression that happens within a stable or longterm kernel
				1552	line (say when updating from 5.10.4 to 5.10.5) a brief report is enough for
Thorsten Leemhuis	0043f0b	2021-04-15 12:29:14 +0200	[diff] [blame]	1553	the start to get the issue reported quickly. Hence a rough description to the
				1554	stable and regressions mailing list is all it takes; but in case you suspect
				1555	the cause in a particular subsystem, CC its maintainers and its mailing list
				1556	as well, because that will speed things up.
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	1557
Thorsten Leemhuis	0043f0b	2021-04-15 12:29:14 +0200	[diff] [blame]	1558	And note, it helps developers a great deal if you can specify the exact version
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	1559	that introduced the problem. Hence if possible within a reasonable time frame,
				1560	try to find that version using vanilla kernels. Let's assume something broke when
				1561	your distributor released an update from Linux kernel 5.10.5 to 5.10.8. Then as
				1562	instructed above go and check the latest kernel from that version line, say
				1563	5.10.9. If it shows the problem, try a vanilla 5.10.5 to ensure that no patches
				1564	the distributor applied interfere. If the issue doesn't manifest itself there,
				1565	try 5.10.7 and then (depending on the outcome) 5.10.8 or 5.10.6 to find the
				1566	first version where things broke. Mention it in the report and state that 5.10.9
				1567	is still broken.
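
The order of tests in this example follows a simple halving scheme, which can
be sketched as below. The helper ``next_candidate`` is hypothetical and only
does the arithmetic on the patch level (the last number of the version); you
still have to build and test each suggested release yourself.

```shell
# Suggest the next patch level to test between a known good and a known
# bad release. Arguments: patch level of the last good release and patch
# level of the first known bad one (for 5.10.5 and 5.10.9: 5 and 9).
next_candidate() {
    good=$1
    bad=$2
    if [ $((bad - good)) -le 1 ]; then
        # Nothing left in between: the bad release is the first broken one.
        echo "first bad: $bad"
    else
        echo "test: $(( (good + bad) / 2 ))"
    fi
}

next_candidate 5 9   # 5.10.5 works, 5.10.9 is broken
next_candidate 5 7   # assuming 5.10.7 turned out to be broken
next_candidate 5 6   # assuming 5.10.6 is broken as well
```

With the assumed outcomes the three calls print ``test: 7``, ``test: 6`` and
``first bad: 6``; had 5.10.7 worked, the second call would have been
``next_candidate 7 9`` and suggested 5.10.8 instead.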
				1568
				1569	What the previous paragraph outlines is basically a rough manual 'bisection'.
				1570	Once your report is out you might get asked to do a proper one, as it allows
				1571	pinpointing the exact change that causes the issue (which then can easily get
				1572	reverted to fix the issue quickly). Hence consider doing a proper bisection
				1573	right away if time permits. See the section 'Special care for regressions' and
				1574	the document 'Documentation/admin-guide/bug-bisect.rst' for details on how to
Thorsten Leemhuis	0043f0b	2021-04-15 12:29:14 +0200	[diff] [blame]	1575	perform one. In case of a successful bisection add the author of the culprit to
				1576	the recipients; also CC everyone in the signed-off-by chain, which you find at
				1577	the end of its commit message.
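
Collecting the addresses from the signed-off-by chain can be sketched like
this. The helper name ``signoff_addresses`` and the sample commit message are
made up for illustration; in a local clone you would feed it the real message
of the culprit, for example with ``git log -1 --format=%B <commit>``.

```shell
# Print the mail addresses found in the Signed-off-by lines of a commit
# message read from stdin, one address per line.
signoff_addresses() {
    sed -n 's/^[[:space:]]*Signed-off-by:.*<\([^>]*\)>.*$/\1/p'
}

# Usage in a local clone (the commit id is a placeholder):
#   git log -1 --format=%B <commit> | signoff_addresses
# Made-up sample message for illustration:
printf '%s\n' \
    'subsys: fix foo' \
    '' \
    'Signed-off-by: Jane Doe <jane@example.org>' \
    'Signed-off-by: John Roe <john@example.org>' | signoff_addresses
```

The author of the change can be shown with ``git log -1 --format='%an <%ae>'
<commit>``; add that address to the recipients as well.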
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	1578
				1579
Thorsten Leemhuis	58c5394	2021-03-30 16:13:07 +0200	[diff] [blame]	1580	Reference for "Reporting issues only occurring in older kernel version lines"
				1581	-----------------------------------------------------------------------------
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	1582
Thorsten Leemhuis	58c5394	2021-03-30 16:13:07 +0200	[diff] [blame]	1583	This section provides details for the steps you need to take if you could not
Thorsten Leemhuis	4b9d49d	2021-03-19 20:27:49 +0100	[diff] [blame]	1584	reproduce your issue with a mainline kernel, but want to see it fixed in older
				1585	version lines (aka stable and longterm kernels).
				1586
				1587	Some fixes are too complex
				1588	~~~~~~~~~~~~~~~~~~~~~~~~~~
				1589
				1590	*Prepare yourself for the possibility that going through the next few steps
				1591	might not get the issue solved in older releases: the fix might be too big
				1592	or risky to get backported there.*
				1593
				1594	Even small and seemingly obvious code-changes sometimes introduce new and
				1595	totally unexpected problems. The maintainers of the stable and longterm kernels
				1596	are very aware of that and thus only apply changes to these kernels that are
				1597	within rules outlined in 'Documentation/process/stable-kernel-rules.rst'.
				1598
				1599	Complex or risky changes for example do not qualify and thus only get applied
				1600	to mainline. Other fixes are easy to get backported to the newest stable and
				1601	longterm kernels, but too risky to integrate into older ones. So be aware the
				1602	fix you are hoping for might be one of those that won't be backported to the
				1603	version line you care about. In that case you'll have no other choice than to
				1604	live with the issue or switch to a newer Linux version, unless you want to
				1605	patch the fix into your kernels yourself.
				1606
				1607	Common preparations
				1608	~~~~~~~~~~~~~~~~~~~
				1609
				1610	*Perform the first three steps in the section "Reporting regressions within a
				1611	stable and longterm kernel line" above.*
				1612
				1613	You need to carry out a few steps already described in another section of this
				1614	guide. Those steps will let you:
				1615
				1616	* Check if the kernel developers still maintain the Linux kernel version line
				1617	you care about.
				1618
				1619	* Search the Linux stable mailing list for existing reports.
				1620
				1621	* Check with the latest release.
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	1622
Thorsten Leemhuis	9bc4430	2021-03-19 20:27:48 +0100	[diff] [blame]	1623
				1624	Check code history and search for existing discussions
				1625	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				1626
				1627	*Search the Linux kernel version control system for the change that fixed
				1628	the issue in mainline, as its commit message might tell you if the fix is
				1629	scheduled for backporting already. If you don't find anything that way,
				1630	search the appropriate mailing lists for posts that discuss such an issue
				1631	or peer-review possible fixes; then check the discussions if the fix was
				1632	deemed unsuitable for backporting. If backporting was not considered at
				1633	all, join the newest discussion, asking if it's in the cards.*
				1634
				1635	In a lot of cases the issue you deal with will have happened with mainline, but
				1636	got fixed there. The commit that fixed it would need to get backported as well
				1637	to get the issue solved. That's why you want to search for it or any
				1638	discussions about it.
				1639
				1640	* First try to find the fix in the Git repository that holds the Linux kernel
				1641	sources. You can do this with the web interfaces `on kernel.org
				1642	<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/>`_
				1643	or its mirror `on GitHub <https://github.com/torvalds/linux>`_; if you have
				1644	a local clone you alternatively can search on the command line with ``git
				1645	log --grep=<pattern>``.
				1646
				1647	If you find the fix, look if the commit message near the end contains a
				1648	'stable tag' that looks like this:
				1649
				1650	Cc: <stable@vger.kernel.org> # 5.4+
				1651
				1652	If that's the case the developer marked the fix safe for backporting to version
				1653	line 5.4 and later. Most of the time it's getting applied there within two
				1654	weeks, but sometimes it takes a bit longer.
				1655
				1656	* If the commit doesn't tell you anything or if you can't find the fix, look
				1657	again for discussions about the issue. Search the net with your favorite
				1658	internet search engine as well as the archives for the `Linux kernel
				1659	developers mailing list <https://lore.kernel.org/lkml/>`_. Also read the
				1660	section `Locate kernel area that causes the issue` above and follow the
				1661	instructions to find the subsystem in question: its bug tracker or mailing
				1662	list archive might have the answer you are looking for.
				1663
				1664	* If you see a proposed fix, search for it in the version control system as
				1665	outlined above, as the commit might tell you if a backport can be expected.
				1666
				1667	* Check the discussions for any indicators the fix might be too risky to get
				1668	backported to the version line you care about. If that's the case you have
				1669	to live with the issue or switch to the kernel version line where the fix
				1670	got applied.
				1671
				1672	* If the fix doesn't contain a stable tag and backporting was not discussed,
				1673	join the discussion: mention the version where you face the issue and that
				1674	you would like to see it fixed, if suitable.
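
The check for a stable tag described in the list above can be sketched as
follows. The helper name ``stable_tags`` and the sample commit message are
made up for illustration; in a local clone you would first locate the fix with
``git log --grep=<pattern>`` and then pipe its commit message into the helper.

```shell
# Print the stable tag lines of a commit message read from stdin.
# Prints nothing if the commit message carries no such tag.
stable_tags() {
    grep -i '^[[:space:]]*Cc:.*stable@vger\.kernel\.org'
}

# Usage in a local clone (the commit id is a placeholder):
#   git log -1 --format=%B <commit> | stable_tags
# Made-up sample message for illustration:
printf '%s\n' \
    'subsys: fix foo' \
    '' \
    'Cc: <stable@vger.kernel.org> # 5.4+' \
    'Signed-off-by: Jane Doe <jane@example.org>' | stable_tags
```

No output means the commit carries no stable tag; in that case continue with
the search for discussions as outlined above.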
				1675
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	1676
				1677	Ask for advice
				1678	~~~~~~~~~~~~~~
				1679
				1680	*One of the former steps should lead to a solution. If that doesn't work
				1681	out, ask the maintainers for the subsystem that seems to be causing the
				1682	issue for advice; CC the mailing list for the particular subsystem as well
				1683	as the stable mailing list.*
				1684
				1685	If the previous three steps didn't get you closer to a solution there is only
				1686	one option left: ask for advice. Do that in a mail you send to the maintainers
				1687	for the subsystem where the issue seems to have its roots; CC the mailing list
Thorsten Leemhuis	58c5394	2021-03-30 16:13:07 +0200	[diff] [blame]	1688	for the subsystem as well as the stable mailing list (stable@vger.kernel.org).
Thorsten Leemhuis	3e544d7	2020-12-04 07:43:49 +0100	[diff] [blame]	1689
				1690
				1691	Why some issues won't get any reaction or remain unfixed after being reported
				1692	=============================================================================
				1693
				1694	When reporting a problem to the Linux developers, be aware only 'issues of high
				1695	priority' (regressions, security issues, severe problems) are definitely going
				1696	to get resolved. The maintainers or if all else fails Linus Torvalds himself
				1697	will make sure of that. They and the other kernel developers will fix a lot of
				1698	other issues as well. But be aware that sometimes they can't or won't help; and
				1699	sometimes there isn't even anyone to send a report to.
				1700
				1701	This is best explained with kernel developers that contribute to the Linux
				1702	kernel in their spare time. Quite a few of the drivers in the kernel were
				1703	written by such programmers, often because they simply wanted to make their
				1704	hardware usable on their favorite operating system.
				1705
				1706	These programmers most of the time will happily fix problems other people
				1707	report. But nobody can force them to do so, as they are contributing voluntarily.
				1708
				1709	Then there are situations where such developers really want to fix an issue,
				1710	but can't: sometimes they lack hardware programming documentation to do so.
				1711	This often happens when the publicly available docs are superficial or the
				1712	driver was written with the help of reverse engineering.
				1713
				1714	Sooner or later spare time developers will also stop caring for the driver.
				1715	Maybe their test hardware broke, got replaced by something more fancy, or is so
				1716	old that it's something you don't find much outside of computer museums
				1717	anymore. Sometimes developers stop caring for their code and Linux altogether, as
				1718	something different in their life became way more important. In some cases
				1719	nobody is willing to take over the job as maintainer – and nobody can be forced
				1720	to, as contributing to the Linux kernel is done on a voluntary basis. Abandoned
				1721	drivers nevertheless remain in the kernel: they are still useful for people and
				1722	removing them would be a regression.
				1723
				1724	The situation is not that different with developers that are paid for their
				1725	work on the Linux kernel. Those contribute most changes these days. But their
				1726	employers sooner or later also stop caring for their code or make the
				1727	programmers focus on other things. Hardware vendors for example earn their money
				1728	mainly by selling new hardware; quite a few of them hence are not investing
				1729	much time and energy in maintaining a Linux kernel driver for something they
				1730	stopped selling years ago. Enterprise Linux distributors often care for a
				1731	longer time period, but in new versions often leave support for old and rare
				1732	hardware aside to limit the scope. Often spare time contributors take over once
				1733	a company orphans some code, but as mentioned above: sooner or later they will
				1734	leave the code behind, too.
				1735
				1736	Priorities are another reason why some issues are not fixed, as maintainers
				1737	quite often are forced to set those, as time to work on Linux is limited.
				1738	That's true for spare time or the time employers grant their developers to
				1739	spend on maintenance work on the upstream kernel. Sometimes maintainers also
				1740	get overwhelmed with reports, even if a driver is working nearly perfectly. To
				1741	not get completely stuck, the programmer thus might have no other choice than
				1742	to prioritize issue reports and reject some of them.
				1743
				1744	But don't worry too much about all of this: a lot of drivers have active
				1745	maintainers who are quite interested in fixing as many issues as possible.
				1746
				1747
				1748	Closing words
				1749	=============
				1750
				1751	Compared with other Free/Libre & Open Source Software it's hard to report
				1752	issues to the Linux kernel developers: the length and complexity of this
				1753	document and the implications between the lines illustrate that. But that's how
				1754	it is for now. The main author of this text hopes documenting the state of the
				1755	art will lay some groundwork to improve the situation over time.
Thorsten Leemhuis	d2ce285	2021-03-30 16:13:04 +0200	[diff] [blame]	1756
				1757
				1758	..
				1759	This text is maintained by Thorsten Leemhuis <linux@leemhuis.info>. If you
				1760	spot a typo or small mistake, feel free to let him know directly and he'll
				1761	fix it. You are free to do the same in a mostly informal way if you want
				1762	to contribute changes to the text, but for copyright reasons please CC
				1763	linux-doc@vger.kernel.org and "sign-off" your contribution as
				1764	Documentation/process/submitting-patches.rst outlines in the section "Sign
				1765	your work - the Developer's Certificate of Origin".