.. SPDX-License-Identifier: GPL-2.0

==================
PCI Error Recovery
==================


:Authors: - Linas Vepstas <linasvepstas@gmail.com>
          - Richard Lary <rlary@us.ibm.com>
          - Mike Mason <mmlnx@us.ibm.com>


Many PCI bus controllers are able to detect a variety of hardware
PCI errors on the bus, such as parity errors on the data and address
buses, as well as SERR and PERR errors.  Some of the more advanced
chipsets are able to deal with these errors; these include PCI-E chipsets,
and the PCI-host bridges found on IBM Power4, Power5 and Power6-based
pSeries boxes. A typical action taken is to disconnect the affected device,
halting all I/O to it.  The goal of a disconnection is to avoid system
corruption; for example, to halt system memory corruption due to DMAs
to "wild" addresses. Typically, a reconnection mechanism is also
offered, so that the affected PCI device(s) are reset and put back
into working condition. The reset phase requires coordination
between the affected device drivers and the PCI controller chip.
This document describes a generic API for notifying device drivers
of a bus disconnection, and then performing error recovery.
This API is implemented in the 2.6.16 and later kernels.

Reporting and recovery are performed in several steps. First, when
a PCI hardware error has resulted in a bus disconnect, that event
is reported as soon as possible to all affected device drivers,
including multiple instances of a device driver on multi-function
cards. This allows device drivers to avoid deadlocking in spinloops,
waiting for some I/O-space register to change, when it never will.
It also gives the drivers a chance to defer incoming I/O as
needed.

Next, recovery is performed in several stages. Most of the complexity
is forced by the need to handle multi-function devices, that is,
devices that have multiple device drivers associated with them.
In the first stage, each driver is allowed to indicate what type
of reset it desires, the choices being a simple re-enabling of I/O
or requesting a slot reset.

If any driver requests a slot reset, that is what will be done.

After a reset and/or a re-enabling of I/O, all drivers are
again notified, so that they may then perform any device setup/config
that may be required.  After these have all completed, a final
"resume normal operations" event is sent out.

The biggest reason for choosing a kernel-based implementation rather
than a user-space implementation was the need to deal with bus
disconnects of PCI devices attached to storage media, and, in particular,
disconnects from devices holding the root file system.  If the root
file system is disconnected, a user-space mechanism would have to go
through a large number of contortions to complete recovery. Almost all
of the current Linux file systems are not tolerant of disconnection
from/reconnection to their underlying block device. By contrast,
bus errors are easy to manage in the device driver. Indeed, most
device drivers already handle very similar recovery procedures;
for example, the SCSI-generic layer already provides significant
mechanisms for dealing with SCSI bus errors and SCSI bus resets.


Detailed Design
===============

The design and implementation details below are based on a chain of
public email discussions with Ben Herrenschmidt, circa 5 April 2005.

The error recovery API support is exposed to the driver in the form of
a structure of function pointers pointed to by a new field in struct
pci_driver. A driver that fails to provide the structure is "non-aware",
and the actual recovery steps taken are platform dependent.  The
arch/powerpc implementation will simulate a PCI hotplug remove/add.

This structure has the form::

	struct pci_error_handlers
	{
		int (*error_detected)(struct pci_dev *dev, pci_channel_state_t);
		int (*mmio_enabled)(struct pci_dev *dev);
		int (*slot_reset)(struct pci_dev *dev);
		void (*resume)(struct pci_dev *dev);
	};

The possible channel states are::

	typedef enum {
		pci_channel_io_normal,  /* I/O channel is in normal state */
		pci_channel_io_frozen,  /* I/O to channel is blocked */
		pci_channel_io_perm_failure, /* PCI card is dead */
	} pci_channel_state_t;

Possible return values are::

	enum pci_ers_result {
		PCI_ERS_RESULT_NONE,        /* no result/none/not supported in device driver */
		PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
		PCI_ERS_RESULT_NEED_RESET,  /* Device driver wants slot to be reset. */
		PCI_ERS_RESULT_DISCONNECT,  /* Device has completely failed, is unrecoverable */
		PCI_ERS_RESULT_RECOVERED,   /* Device driver is fully recovered and operational */
	};

A driver does not have to implement all of these callbacks; however,
if it implements any, it must implement error_detected(). If a callback
is not implemented, the corresponding feature is considered unsupported.
For example, if mmio_enabled() and resume() aren't there, then it
is assumed that the driver is not doing any direct recovery and requires
a slot reset.  Typically a driver will want to be notified of a slot
reset, and so will implement slot_reset().

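As a rough illustration of the rules above, the following user-space sketch
wires up a handler table for a hypothetical driver called "foo". The type
definitions are local stand-ins that mirror the declarations shown earlier so
that the example compiles outside the kernel; the ``foo_*`` names and the
``handlers_valid()`` helper are assumptions made for illustration and are not
part of the kernel API.

```c
#include <stddef.h>

/* Local stand-ins for the kernel types, so the sketch builds in user space. */
struct pci_dev { int id; };

typedef enum {
	pci_channel_io_normal,
	pci_channel_io_frozen,
	pci_channel_io_perm_failure,
} pci_channel_state_t;

enum pci_ers_result {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
};

struct pci_error_handlers {
	int (*error_detected)(struct pci_dev *dev, pci_channel_state_t state);
	int (*mmio_enabled)(struct pci_dev *dev);
	int (*slot_reset)(struct pci_dev *dev);
	void (*resume)(struct pci_dev *dev);
};

/* Hypothetical driver "foo": no direct recovery, always asks for a reset. */
static int foo_error_detected(struct pci_dev *dev, pci_channel_state_t state)
{
	(void)dev;
	if (state == pci_channel_io_perm_failure)
		return PCI_ERS_RESULT_DISCONNECT;
	return PCI_ERS_RESULT_NEED_RESET;
}

/* Called after the platform has reset the slot. */
static int foo_slot_reset(struct pci_dev *dev)
{
	(void)dev;
	return PCI_ERS_RESULT_RECOVERED;
}

static void foo_resume(struct pci_dev *dev)
{
	(void)dev;
}

static const struct pci_error_handlers foo_err_handlers = {
	.error_detected	= foo_error_detected,
	.slot_reset	= foo_slot_reset,
	.resume		= foo_resume,
	/* .mmio_enabled is left NULL: that feature is treated as unsupported */
};

/* A handler table is only usable if error_detected() is present. */
static int handlers_valid(const struct pci_error_handlers *h)
{
	return h != NULL && h->error_detected != NULL;
}
```

In a real driver the table would be referenced from the driver's struct
pci_driver rather than checked by a helper like this one.
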
The actual steps taken by a platform to recover from a PCI error
event will be platform-dependent, but will follow the general
sequence described below.

STEP 0: Error Event
-------------------
A PCI bus error is detected by the PCI hardware.  On powerpc, the slot
is isolated, in that all I/O is blocked: all reads return 0xffffffff,
all writes are ignored.


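Because reads from an isolated slot return all-ones, a driver polling loop
can use that value as a hint to stop spinning. The sketch below is a
user-space illustration only: ``reg_read_fn``, ``FOO_READY_BIT``,
``foo_wait_ready()`` and the ``read_*`` helpers are hypothetical names, and a
real driver would read the register with its normal MMIO accessor. It also
assumes that all-ones is never a valid value for the polled register, which
has to be checked per device.

```c
#include <stddef.h>
#include <stdint.h>

#define ALL_ONES	0xffffffffu
#define FOO_READY_BIT	0x1u	/* hypothetical "ready" status bit */

/* Hypothetical register accessor; stands in for an MMIO read. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/*
 * Poll a status register a bounded number of times.
 * Returns 0 when the ready bit is seen, -1 when the read returns
 * all-ones (the slot is probably isolated), -2 when the poll budget
 * runs out.
 */
static int foo_wait_ready(reg_read_fn read_status, void *ctx, int max_polls)
{
	int i;

	for (i = 0; i < max_polls; i++) {
		uint32_t v = read_status(ctx);

		if (v == ALL_ONES)
			return -1;
		if (v & FOO_READY_BIT)
			return 0;
	}
	return -2;
}

/* Simulated registers used only to exercise the sketch. */
static uint32_t read_isolated(void *ctx) { (void)ctx; return ALL_ONES; }
static uint32_t read_ready(void *ctx)    { (void)ctx; return FOO_READY_BIT; }
static uint32_t read_busy(void *ctx)     { (void)ctx; return 0; }
```

Bounding the loop matters even without the all-ones check, since a spinloop
that never gives up is what the early notification is meant to avoid.
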
STEP 1: Notification
--------------------
Platform calls the error_detected() callback on every instance of
every driver affected by the error.

At this point, the device might not be accessible anymore, depending on
the platform (the slot will be isolated on powerpc). The driver may
already have "noticed" the error because of a failing I/O, but this
is the proper "synchronization point", that is, it gives the driver
a chance to clean up, waiting for pending work (timers and the like)
to complete; it can take semaphores, schedule, and so on: everything but
touch the device. Within this function and after it returns, the driver
shouldn't do any new IOs. Called in task context. This is sort of a
"quiesce" point. See note about interrupts at the end of this doc.

All drivers participating in this system must implement this call.
The driver must return one of the following result codes:

  - PCI_ERS_RESULT_CAN_RECOVER
      Driver returns this if it thinks it might be able to recover
      the HW by just banging IOs or if it wants to be given
      a chance to extract some diagnostic information (see
      mmio_enabled, below).
  - PCI_ERS_RESULT_NEED_RESET
      Driver returns this if it can't recover without a
      slot reset.
  - PCI_ERS_RESULT_DISCONNECT
      Driver returns this if it doesn't want to recover at all.

The next step taken will depend on the result codes returned by the
drivers.

If all drivers on the segment/slot return PCI_ERS_RESULT_CAN_RECOVER,
then the platform should re-enable IOs on the slot (or do nothing in
particular, if the platform doesn't isolate slots), and recovery
proceeds to STEP 2 (MMIO Enabled).

If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET),
then recovery proceeds to STEP 4 (Slot Reset).

If the platform is unable to recover the slot, the next step
is STEP 6 (Permanent Failure).

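The decision rules above can be sketched as a small function that folds the
drivers' result codes into the next step. This is an illustration, not the
platform code: ``enum recovery_step`` and ``step_after_detect()`` are
hypothetical names, the enum is a local copy of the result codes, and the
handling of PCI_ERS_RESULT_DISCONNECT and PCI_ERS_RESULT_NONE votes is an
assumption of this sketch, since the text above leaves that to platform
policy.

```c
/* Local copy of the result codes so the sketch builds in user space. */
enum pci_ers_result {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
};

/* Hypothetical names for the steps that can follow STEP 1. */
enum recovery_step {
	STEP_MMIO_ENABLED = 2,
	STEP_SLOT_RESET   = 4,
	STEP_PERM_FAILURE = 6,
};

/*
 * votes:     result codes returned by error_detected(), one per driver
 * n:         number of entries in votes
 * can_reset: nonzero if the platform is able to reset the slot
 */
static enum recovery_step
step_after_detect(const enum pci_ers_result *votes, int n, int can_reset)
{
	int all_can_recover = 1;
	int i;

	for (i = 0; i < n; i++) {
		/* Assumption: one DISCONNECT vote fails the whole slot. */
		if (votes[i] == PCI_ERS_RESULT_DISCONNECT)
			return STEP_PERM_FAILURE;
		/* Assumption: anything but CAN_RECOVER forces a reset. */
		if (votes[i] != PCI_ERS_RESULT_CAN_RECOVER)
			all_can_recover = 0;
	}

	if (all_can_recover)
		return STEP_MMIO_ENABLED;

	return can_reset ? STEP_SLOT_RESET : STEP_PERM_FAILURE;
}
```
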
.. note::

   The current powerpc implementation assumes that a device driver will
   *not* schedule or semaphore in this routine; the current powerpc
   implementation uses one kernel thread to notify all devices;
   thus, if one device sleeps/schedules, all devices are affected.
   Doing better requires complex multi-threaded logic in the error
   recovery implementation (e.g. waiting for all notification threads
   to "join" before proceeding with recovery.)  This seems excessively
   complex and not worth implementing.

   The current powerpc implementation doesn't much care if the device
   attempts I/O at this point, or not.  I/Os will fail, returning
   a value of 0xff on read, and writes will be dropped. If more than
   EEH_MAX_FAILS I/Os are attempted to a frozen adapter, EEH
   assumes that the device driver has gone into an infinite loop
   and prints an error to syslog.  A reboot is then required to
   get the device working again.

STEP 2: MMIO Enabled
--------------------
The platform re-enables MMIO to the device (but typically not the
DMA), and then calls the mmio_enabled() callback on all affected
device drivers.

This is the "early recovery" call. IOs are allowed again, but DMA is
not, with some restrictions. This is NOT a callback for the driver to
start operations again, only to peek/poke at the device, extract diagnostic
information, if any, and eventually do things like trigger a device local
reset or some such, but not restart operations. This callback is made if
all drivers on a segment agree that they can try to recover and if no automatic
link reset was performed by the HW. If the platform can't just re-enable IOs
without a slot reset or a link reset, it will not call this callback, and
instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset).

.. note::

   The following is proposed; no platform implements this yet:
   Proposal: All I/Os should be done _synchronously_ from within
   this callback, errors triggered by them will be returned via
   the normal pci_check_whatever() API, no new error_detected()
   callback will be issued due to an error happening here. However,
   such an error might cause IOs to be re-blocked for the whole
   segment, and thus invalidate the recovery that other devices
   on the same segment might have done, forcing the whole segment
   into one of the next states, that is, link reset or slot reset.

The driver should return one of the following result codes:

  - PCI_ERS_RESULT_RECOVERED
      Driver returns this if it thinks the device is fully
      functional and thinks it is ready to start
      normal driver operations again. There is no
      guarantee that the driver will actually be
      allowed to proceed, as another driver on the
      same segment might have failed and thus triggered a
      slot reset on platforms that support it.

  - PCI_ERS_RESULT_NEED_RESET
      Driver returns this if it thinks the device is not
      recoverable in its current state and it needs a slot
      reset to proceed.

  - PCI_ERS_RESULT_DISCONNECT
      Same meaning as in STEP 1. Total failure: no recovery is
      possible even after a reset, and the driver is dead.
      (To be defined more precisely)

The next step taken depends on the results returned by the drivers.
If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
proceeds to either STEP 3 (Link Reset) or to STEP 5 (Resume Operations).

If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform
proceeds to STEP 4 (Slot Reset).

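A driver's mmio_enabled() callback might look roughly like the following
sketch, which only reads a diagnostic register and then decides between
PCI_ERS_RESULT_RECOVERED and PCI_ERS_RESULT_NEED_RESET. Everything device
specific here is hypothetical: ``struct foo_dev``, the ``FOO_ERR_STATUS``
offset, the ``FOO_ERR_FATAL`` bit and the ``read_*`` helpers are invented for
illustration, and the callback takes the driver's own structure instead of a
struct pci_dev so that the example builds in user space.

```c
#include <stdint.h>

/* Local copy of the result codes so the sketch builds in user space. */
enum pci_ers_result {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
};

#define FOO_ERR_STATUS	0x10u		/* hypothetical register offset */
#define FOO_ERR_FATAL	0x80000000u	/* hypothetical "fatal error" bit */

/* Hypothetical per-device state; read_reg stands in for an MMIO read. */
struct foo_dev {
	uint32_t (*read_reg)(struct foo_dev *fd, unsigned int off);
	uint32_t last_err_status;	/* saved for later diagnostics */
};

/* Peek at the device; do not restart normal operations from here. */
static int foo_mmio_enabled(struct foo_dev *fd)
{
	uint32_t st = fd->read_reg(fd, FOO_ERR_STATUS);

	fd->last_err_status = st;

	if (st == 0xffffffffu)		/* MMIO still not answering */
		return PCI_ERS_RESULT_NEED_RESET;
	if (st & FOO_ERR_FATAL)		/* device reports a fatal error */
		return PCI_ERS_RESULT_NEED_RESET;

	return PCI_ERS_RESULT_RECOVERED;
}

/* Simulated registers used only to exercise the sketch. */
static uint32_t read_clean(struct foo_dev *fd, unsigned int off)
{
	(void)fd; (void)off;
	return 0;
}

static uint32_t read_fatal(struct foo_dev *fd, unsigned int off)
{
	(void)fd; (void)off;
	return FOO_ERR_FATAL | 0x4u;
}

static uint32_t read_dead(struct foo_dev *fd, unsigned int off)
{
	(void)fd; (void)off;
	return 0xffffffffu;
}
```
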
STEP 3: Link Reset
------------------
The platform resets the link.  This is a PCI-Express specific step
and is done whenever a fatal error has been detected that can be
"solved" by resetting the link.

STEP 4: Slot Reset
------------------

In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
platform will perform a slot reset on the requesting PCI device(s).
The actual steps taken by a platform to perform a slot reset
will be platform-dependent. Upon completion of slot reset, the
platform will call the device slot_reset() callback.

Powerpc platforms implement two levels of slot reset:
soft reset (default) and fundamental (optional) reset.

Powerpc soft reset consists of asserting the adapter #RST line and then
restoring the PCI BARs and PCI configuration header to a state
that is equivalent to what it would be after a fresh system
power-on followed by power-on BIOS/system firmware initialization.
Soft reset is also known as hot-reset.

Powerpc fundamental reset is supported by PCI Express cards only
and results in the device's state machines, hardware logic, port states and
configuration registers being initialized to their default conditions.

For most PCI devices, a soft reset will be sufficient for recovery.
Optional fundamental reset is provided to support a limited number
of PCI Express devices for which a soft reset is not sufficient
for recovery.

If the platform supports PCI hotplug, then the reset might be
performed by toggling the slot electrical power off/on.

It is important for the platform to restore the PCI config space
to the "fresh poweron" state, rather than the "last state". After
a slot reset, the device driver will almost always use its standard
device initialization routines, and an unusual config space setup
may result in hung devices, kernel panics, or silent data corruption.

This call gives drivers the chance to re-initialize the hardware
(re-download firmware, etc.).  At this point, the driver may assume
that the card is in a fresh state and is fully functional. The slot
is unfrozen and the driver has full access to PCI config space,
memory mapped I/O space and DMA. Interrupts (Legacy, MSI, or MSI-X)
will also be available.

Drivers should not restart normal I/O processing operations
at this point.  If all device drivers report success on this
callback, the platform will call resume() to complete the sequence,
and let the driver restart normal I/O processing.

A driver can still return a critical failure for this function if
it can't get the device operational after reset.  If the platform
previously tried a soft reset, it might now try a hard reset (power
cycle) and then call slot_reset() again.  If the device still can't
be recovered, there is nothing more that can be done;  the platform
will typically report a "permanent failure" in such a case.  The
device will be considered "dead" in this case.

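The escalation just described (soft reset, then a hard reset, then giving up)
can be sketched from the platform's point of view as below. This is a
simplified user-space model, not the powerpc code: ``struct slot_ops``,
``recover_by_reset()`` and the ``fake_*`` helpers are hypothetical, and the
sketch assumes the platform always escalates after a failed slot_reset(),
which a given platform may choose not to do.

```c
#include <stddef.h>

/* Local copy of the result codes so the sketch builds in user space. */
enum pci_ers_result {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
};

enum reset_kind { RESET_SOFT, RESET_HARD };

/* Hypothetical platform hooks for one slot. */
struct slot_ops {
	void (*do_reset)(void *slot, enum reset_kind kind);
	int (*slot_reset)(void *slot);	/* the driver's slot_reset() */
};

/* Returns 5 to continue with STEP 5, or 6 for STEP 6. */
static int recover_by_reset(const struct slot_ops *ops, void *slot)
{
	static const enum reset_kind order[] = { RESET_SOFT, RESET_HARD };
	size_t i;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		ops->do_reset(slot, order[i]);
		if (ops->slot_reset(slot) == PCI_ERS_RESULT_RECOVERED)
			return 5;
	}
	return 6;
}

/* Simulated slot used only to exercise the sketch. */
struct fake_slot {
	int needs_hard;		/* recovers only after a hard reset */
	int dead;		/* never recovers */
	enum reset_kind last;	/* most recent reset applied */
	int resets;		/* number of resets performed */
};

static void fake_do_reset(void *p, enum reset_kind kind)
{
	struct fake_slot *s = p;

	s->last = kind;
	s->resets++;
}

static int fake_slot_reset(void *p)
{
	struct fake_slot *s = p;

	if (s->dead || (s->needs_hard && s->last != RESET_HARD))
		return PCI_ERS_RESULT_DISCONNECT;
	return PCI_ERS_RESULT_RECOVERED;
}

static const struct slot_ops fake_ops = { fake_do_reset, fake_slot_reset };
```
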
Drivers for multi-function cards will need to coordinate among
themselves as to which driver instance will perform any "one-shot"
or global device initialization. For example, the Symbios sym53c8xx_2
driver performs device init only from PCI function 0::

	+       if (PCI_FUNC(pdev->devfn) == 0)
	+               sym_reset_scsi_bus(np, 0);

Result codes:
	- PCI_ERS_RESULT_DISCONNECT
	  Same meaning as in the earlier steps.

Drivers for PCI Express cards that require a fundamental reset must
set the needs_freset bit in the pci_dev structure in their probe function.
For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
PCI card types::

	+	/* Set EEH reset type to fundamental if required by hba  */
	+	if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
	+		pdev->needs_freset = 1;
	+

Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
Failure).

.. note::

   The current powerpc implementation does not try a power-cycle
   reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
   However, it probably should.


STEP 5: Resume Operations
-------------------------
The platform will call the resume() callback on all affected device
drivers if all drivers on the segment have returned
PCI_ERS_RESULT_RECOVERED from one of the 3 previous callbacks.
The goal of this callback is to tell the driver to restart activity,
that everything is back and running. This callback does not return
a result code.

At this point, if a new error happens, the platform will restart
a new error recovery sequence.

STEP 6: Permanent Failure
-------------------------
A "permanent failure" has occurred, and the platform cannot recover
the device.  The platform will call error_detected() with a
pci_channel_state_t value of pci_channel_io_perm_failure.

The device driver should, at this point, assume the worst. It should
cancel all pending I/O, refuse all new I/O, returning -EIO to
higher layers. The device driver should then clean up all of its
memory and remove itself from kernel operations, much as it would
during system shutdown.

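The driver-side behaviour described above (cancel pending I/O, refuse new I/O
with -EIO) can be sketched as follows. This is a user-space model with
hypothetical names: ``struct foo_dev``, ``struct foo_req``, ``foo_submit()``
and ``foo_fail_all()`` stand in for whatever request tracking a real driver
has, the enums are local copies of the kernel declarations, and returning
PCI_ERS_RESULT_NEED_RESET for the non-permanent case is just a placeholder
choice.

```c
#include <errno.h>
#include <stddef.h>

/* Local copies of the kernel declarations, for a user-space build. */
typedef enum {
	pci_channel_io_normal,
	pci_channel_io_frozen,
	pci_channel_io_perm_failure,
} pci_channel_state_t;

enum pci_ers_result {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
};

/* Hypothetical request and device bookkeeping. */
struct foo_req {
	int status;		/* 0 or a negative errno */
	int done;		/* nonzero once completed */
	struct foo_req *next;
};

struct foo_dev {
	int dead;		/* set on permanent failure */
	struct foo_req *pending;
};

/* Queue a request, or refuse it if the device is dead. */
static int foo_submit(struct foo_dev *fd, struct foo_req *r)
{
	if (fd->dead)
		return -EIO;
	r->status = 0;
	r->done = 0;
	r->next = fd->pending;
	fd->pending = r;
	return 0;
}

/* Complete every pending request with -EIO and mark the device dead. */
static void foo_fail_all(struct foo_dev *fd)
{
	fd->dead = 1;
	while (fd->pending) {
		struct foo_req *r = fd->pending;

		fd->pending = r->next;
		r->next = NULL;
		r->status = -EIO;
		r->done = 1;
	}
}

static int foo_error_detected(struct foo_dev *fd, pci_channel_state_t state)
{
	if (state == pci_channel_io_perm_failure) {
		foo_fail_all(fd);
		return PCI_ERS_RESULT_DISCONNECT;
	}
	return PCI_ERS_RESULT_NEED_RESET;	/* placeholder policy */
}
```
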
The platform will typically notify the system operator of the
permanent failure in some way.  If the device is hotplug-capable,
the operator will probably want to remove and replace the device.
Note, however, not all failures are truly "permanent". Some are
caused by over-heating, some by a poorly seated card. Many
PCI error events are caused by software bugs, e.g. DMAs to
wild addresses or bogus split transactions due to programming
errors. See the discussion in Documentation/powerpc/eeh-pci-error-recovery.rst
for additional detail on real-life experience of the causes of
software errors.


Conclusion; General Remarks
---------------------------
The way the callbacks are called is platform policy. A platform with
no slot reset capability may want to just "ignore" drivers that can't
recover (disconnect them) and try to let other cards on the same segment
recover. Keep in mind that in most real life cases, though, there will
be only one driver per segment.

Now, a note about interrupts. If you get an interrupt and your
device is dead or has been isolated, there is a problem :)
The current policy is to turn this into a platform policy.
That is, the recovery API only requires that:

 - There is no guarantee that interrupt delivery can proceed from any
   device on the segment starting from the error detection and until the
   slot_reset callback is called, at which point interrupts are expected
   to be fully operational.

 - There is no guarantee that interrupt delivery is stopped, that is,
   a driver that gets an interrupt after detecting an error, or that detects
   an error within the interrupt handler such that it prevents proper
   ack'ing of the interrupt (and thus removal of the source) should just
   return IRQ_NONE (that is, report the interrupt as not handled). It's up
   to the platform to deal with that
   condition, typically by masking the IRQ source during the duration of
   the error handling. It is expected that the platform "knows" which
   interrupts are routed to error-management capable slots and can deal
   with temporarily disabling that IRQ number during error processing (this
   isn't terribly complex). That means some IRQ latency for other devices
   sharing the interrupt, but there is simply no other way. High end
   platforms aren't supposed to share interrupts between many devices
   anyway :)

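The interrupt rule can be illustrated with a handler that declines the
interrupt whenever the device is in recovery or its status register cannot be
read. This is a user-space sketch: ``irqreturn_t`` is a local stand-in that
reuses the kernel's IRQ_NONE and IRQ_HANDLED names, and ``struct foo_dev``,
its ``in_recovery`` flag and the ``irq_*`` helpers are hypothetical.

```c
#include <stdint.h>

/* Local stand-in that reuses the kernel's constant names. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

/* Hypothetical per-device state. */
struct foo_dev {
	int in_recovery;	/* set in error_detected(), cleared in resume() */
	uint32_t (*read_irq_status)(struct foo_dev *fd);
	unsigned int handled;	/* count of interrupts serviced */
};

static irqreturn_t foo_irq(struct foo_dev *fd)
{
	uint32_t st;

	/* Do not touch the device while recovery is in progress. */
	if (fd->in_recovery)
		return IRQ_NONE;

	st = fd->read_irq_status(fd);

	/* All-ones: the slot is probably isolated, so we cannot ack. */
	if (st == 0xffffffffu)
		return IRQ_NONE;

	/* No cause bits set: the interrupt is not ours. */
	if (st == 0)
		return IRQ_NONE;

	fd->handled++;
	return IRQ_HANDLED;
}

/* Simulated status registers used only to exercise the sketch. */
static uint32_t irq_pending(struct foo_dev *fd)  { (void)fd; return 0x1u; }
static uint32_t irq_isolated(struct foo_dev *fd) { (void)fd; return 0xffffffffu; }
```
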
.. note::

   Implementation details for the powerpc platform are discussed in
   the file Documentation/powerpc/eeh-pci-error-recovery.rst

   As of this writing, there is a growing list of device drivers with
   patches implementing error recovery. Not all of these patches are in
   mainline yet. These may be used as "examples":

   - drivers/scsi/ipr
   - drivers/scsi/sym53c8xx_2
   - drivers/scsi/qla2xxx
   - drivers/scsi/lpfc
   - drivers/net/bnx2.c
   - drivers/net/e100.c
   - drivers/net/e1000
   - drivers/net/e1000e
   - drivers/net/ixgb
   - drivers/net/ixgbe
   - drivers/net/cxgb3
   - drivers/net/s2io.c

The End
-------