===========================
HPE iLO NMI Watchdog Driver
===========================

for iLO based ProLiant Servers
==============================

Last reviewed: 08/20/2018


 The HPE iLO NMI Watchdog driver is a kernel module that provides basic
 watchdog functionality and a handler for the iLO "Generate NMI to System"
 virtual button.

 All references to iLO in this document also apply to iLO2 and all
 subsequent generations.

 Watchdog functionality is enabled as with any other common watchdog driver:
 an application must be started that arms the watchdog timer and then keeps
 updating it. A basic application, watchdog-test.c, exists in
 tools/testing/selftests/watchdog/. Simply compile the C file and run it. If
 the system gets into a bad state and hangs, the HPE ProLiant iLO timer
 register will not be updated in time and a hardware system reset, also known
 as an Automatic Server Recovery (ASR) event, will occur.

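 As a rough illustration of what such an application does, the sketch below
 opens a watchdog device, sends a few keepalives and then attempts a clean
 stop. It is not watchdog-test.c; the function name hpwdt_demo_keepalive and
 its parameters are hypothetical, and it relies only on the generic
 WDIOC_KEEPALIVE ioctl and the magic-close convention of the common watchdog
 interface. Note that pointing it at a real /dev/watchdog arms the timer.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/*
 * Arm the watchdog at dev_path, ping it `pings` times `interval_s` seconds
 * apart, then attempt a clean stop. Returns 0 on success, -1 on error.
 * Hypothetical helper for illustration; not part of the kernel tree.
 */
int hpwdt_demo_keepalive(const char *dev_path, int pings,
			 unsigned int interval_s)
{
	int fd, i, ret = 0;

	/* Opening the device arms the timer. */
	fd = open(dev_path, O_WRONLY);
	if (fd < 0)
		return -1;

	for (i = 0; i < pings; i++) {
		/* Each keepalive restarts the countdown. */
		if (ioctl(fd, WDIOC_KEEPALIVE, 0) < 0) {
			ret = -1;
			break;
		}
		sleep(interval_s);
	}

	/*
	 * Magic close: writing 'V' before close() asks the driver to stop
	 * the timer. With nowayout set, the timer keeps running regardless.
	 */
	if (write(fd, "V", 1) != 1)
		ret = -1;
	close(fd);
	return ret;
}
```

 A caller would typically pass "/dev/watchdog" and an interval comfortably
 shorter than the configured timeout.
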
 The hpwdt driver also has the following module parameters:

 ============  ================================================================
 soft_margin   allows the user to set the watchdog timer value.
               Default value is 30 seconds.
 timeout       an alias of soft_margin.
 pretimeout    allows the user to set the watchdog pretimeout value.
               This is the number of seconds before timeout when an
               NMI is delivered to the system. Setting the value to
               zero disables the pretimeout NMI.
               Default value is 9 seconds.
 nowayout      basic watchdog parameter that does not allow the timer to
               be restarted or an impending ASR to be escaped.
               Default value is set when compiling the kernel. If it is set
               to "Y", then there is no way of disabling the watchdog once
               it has been started.
 kdumptimeout  Minimum timeout in seconds to apply upon receipt of an NMI
               before calling panic. (-1) disables the watchdog. When value
               is > 0, the timer is reprogrammed with the greater of
               value or current timeout value.
 ============  ================================================================

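 The kdumptimeout rule can be modelled in a few lines of user-space C. This
 is only a sketch of the behavior described in the table, not the driver's
 code: the struct and function names are hypothetical, any negative value is
 treated like -1, and a value of 0 is assumed to leave the running timer
 untouched, which the table does not state.

```c
#include <stdbool.h>

/* Outcome of applying kdumptimeout when the NMI arrives (illustrative). */
struct kdump_action {
	bool stop_watchdog;       /* true: watchdog disabled before panic */
	bool reprogram;           /* true: timer reloaded with new_timeout */
	unsigned int new_timeout; /* seconds, meaningful only if reprogram */
};

struct kdump_action hpwdt_demo_kdump_action(int kdumptimeout,
					    unsigned int cur_timeout)
{
	struct kdump_action a = { false, false, cur_timeout };

	if (kdumptimeout < 0) {
		/* Assumption: every negative value behaves like -1. */
		a.stop_watchdog = true;
	} else if (kdumptimeout > 0) {
		/* Use the greater of kdumptimeout and the current timeout. */
		a.reprogram = true;
		a.new_timeout = (unsigned int)kdumptimeout > cur_timeout ?
				(unsigned int)kdumptimeout : cur_timeout;
	}
	/* kdumptimeout == 0: assumed to leave the running timer untouched. */
	return a;
}
```

 With a current timeout of 30 seconds, a kdumptimeout of 45 yields a
 45 second reload, while a kdumptimeout of 10 keeps the reload at 30.
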
 NOTE:
       More information about watchdog drivers in general, including the ioctl
       interface to /dev/watchdog, can be found in
       Documentation/watchdog/watchdog-api.rst and Documentation/IPMI.txt.

 Due to limitations in the iLO hardware, the NMI pretimeout, if enabled,
 can only be set to 9 seconds. Attempts to set pretimeout to other
 non-zero values will be rounded, possibly to zero. Users should verify
 the pretimeout value after attempting to set pretimeout or timeout.

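 One way to perform that verification from user space is to read the value
 back right after setting it. The sketch below assumes an already-open
 watchdog file descriptor and uses the generic WDIOC_SETPRETIMEOUT and
 WDIOC_GETPRETIMEOUT ioctls; the helper name hpwdt_demo_set_pretimeout is
 hypothetical.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Request a pretimeout on an already-open watchdog fd and read back what
 * the driver actually applied. Returns 0 and stores the effective value
 * in *applied on success, -1 on error (*applied is then left unchanged).
 */
int hpwdt_demo_set_pretimeout(int fd, int requested, int *applied)
{
	int val = requested;

	if (ioctl(fd, WDIOC_SETPRETIMEOUT, &val) < 0)
		return -1;
	/* Do not trust the requested value; ask the driver what it is now. */
	if (ioctl(fd, WDIOC_GETPRETIMEOUT, &val) < 0)
		return -1;
	*applied = val;
	return 0;
}
```

 A caller should compare the value stored in applied with the value it
 requested and treat any difference, including zero, as the effective setting.
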
 Upon receipt of an NMI from the iLO, the hpwdt driver will initiate a
 panic. This is to allow for a crash dump to be collected. It is incumbent
 upon the user to have properly configured the system for kdump.

 The default Linux kernel behavior upon panic is to print a kernel tombstone
 and loop forever. This is generally not what a watchdog user wants.

 For those wishing to learn more please see:

 - Documentation/admin-guide/kdump/kdump.rst
 - Documentation/admin-guide/kernel-parameters.txt (panic=)
 - Your Linux distribution's documentation.

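 To check how a running system is configured to behave after a panic, the
 current panic= setting can be read from /proc/sys/kernel/panic. The sketch
 below is illustrative only; both helper names are hypothetical, and the
 three outcomes follow the panic= description in kernel-parameters.txt.

```c
#include <stdio.h>

/* Describe a kernel.panic value as documented for the panic= parameter. */
const char *hpwdt_demo_describe_panic(int timeout)
{
	if (timeout > 0)
		return "reboot after delay";
	if (timeout < 0)
		return "reboot immediately";
	return "wait forever";
}

/* Read the current setting from path. Returns 0 on success, -1 on error. */
int hpwdt_demo_read_panic(const char *path, int *timeout)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%d", timeout) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

 Passing "/proc/sys/kernel/panic" as path and feeding the result to
 hpwdt_demo_describe_panic() shows whether the system would loop forever
 after the hpwdt-initiated panic.
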
 If the hpwdt driver does not receive the NMI associated with an expiring
 timer, the iLO will proceed to reset the system at timeout if the timer
 hasn't been updated.

--

 The HPE iLO NMI Watchdog Driver and documentation were originally developed
 by Tom Mingarelli.