APEI Error INJection
~~~~~~~~~~~~~~~~~~~~

EINJ provides a hardware error injection mechanism. It is very useful
for debugging and testing other APEI and RAS features.

To use EINJ, make sure the following are enabled in your kernel
configuration:

CONFIG_DEBUG_FS
CONFIG_ACPI_APEI
CONFIG_ACPI_APEI_EINJ

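One way to check these options is to search the configuration file of
the running kernel. The sketch below is only an illustration: the helper
name einj_check_config is hypothetical, and the location of the config
file (often /boot/config-$(uname -r), but this varies by distribution)
is an assumption, so the path is passed in as an argument.

```shell
# Report which of the options EINJ needs are enabled in a kernel config file.
# Both built-in (=y) and modular (=m) settings count as enabled.
# Returns 0 if all three options are enabled, 1 otherwise.
einj_check_config() {
    config_file=$1
    missing=0
    for opt in CONFIG_DEBUG_FS CONFIG_ACPI_APEI CONFIG_ACPI_APEI_EINJ; do
        if grep -Eq "^${opt}=[ym]\$" "$config_file"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
            missing=1
        fi
    done
    return "$missing"
}
```

For example: einj_check_config "/boot/config-$(uname -r)"
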
The user interface of EINJ is in the debug file system, under the
directory apei/einj. The following files are provided:

- available_error_type
  Reading this file returns the error injection capability of the
  platform, that is, which error types are supported. The error types
  are defined as follows; the left field is the error type value and
  the right field is the error description.

    0x00000001	Processor Correctable
    0x00000002	Processor Uncorrectable non-fatal
    0x00000004	Processor Uncorrectable fatal
    0x00000008	Memory Correctable
    0x00000010	Memory Uncorrectable non-fatal
    0x00000020	Memory Uncorrectable fatal
    0x00000040	PCI Express Correctable
    0x00000080	PCI Express Uncorrectable non-fatal
    0x00000100	PCI Express Uncorrectable fatal
    0x00000200	Platform Correctable
    0x00000400	Platform Uncorrectable non-fatal
    0x00000800	Platform Uncorrectable fatal

  The file contents use the format shown above, but list only the
  error types that are available on the platform.

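Before injecting, a script can confirm that the wanted error type is
listed. The following is a minimal sketch, not part of the EINJ
interface: the helper name einj_type_available is hypothetical, and it
assumes the two-field line format described above.

```shell
# Succeed if error type value $2 appears in the listing file $1.
# $1 is a file in the available_error_type format (value, description).
# $2 may be given in hex or decimal, e.g. 0x8 or 8.
einj_type_available() {
    list_file=$1
    want=$(( $2 ))              # normalise the requested value to an integer
    while read -r value _description; do
        [ -n "$value" ] || continue             # skip empty lines
        if [ "$(( $value ))" -eq "$want" ]; then
            return 0
        fi
    done < "$list_file"
    return 1
}
```

For example, as root:
einj_type_available /sys/kernel/debug/apei/einj/available_error_type 0x8
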
- error_type
  This file is used to set the error type value. The error type values
  are defined in the "available_error_type" description above.

- error_inject
  Write any integer to this file to trigger the error injection.
  Specify all necessary error parameters before doing so.

- flags
  Present for kernel version 3.13 and above. Used to specify which
  of param{1..4} are valid and should be used by BIOS during injection.
  Value is a bitmask as specified in the ACPI 5.0 spec for the
  SET_ERROR_TYPE_WITH_ADDRESS data structure:
    Bit 0 - Processor APIC field valid (see param3 below)
    Bit 1 - Memory address and mask valid (param1 and param2)
    Bit 2 - PCIe (seg,bus,dev,fn) valid (see param4 below)
  If set to zero, legacy behaviour is used where the type of injection
  specifies just one bit set, and param1 is multiplexed.

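The bitmask can be built from the bit assignments listed above. This is
an illustrative sketch only; the helper name einj_flags and the target
names "processor", "memory" and "pcie" are made up for this example.

```shell
# Print the "flags" value for the given target kinds.
#   processor -> bit 0 (0x1), memory -> bit 1 (0x2), pcie -> bit 2 (0x4)
einj_flags() {
    flags=0
    for target in "$@"; do
        case $target in
            processor) flags=$(( flags | 0x1 )) ;;
            memory)    flags=$(( flags | 0x2 )) ;;
            pcie)      flags=$(( flags | 0x4 )) ;;
            *) echo "unknown target: $target" >&2; return 1 ;;
        esac
    done
    printf '0x%x\n' "$flags"
}
```

For example, "einj_flags memory" prints 0x2, the value to write to the
flags file when param1 and param2 carry a memory address and mask.
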
- param1
  This file is used to set the first error parameter value. The effect
  of the parameter depends on the error_type specified. For example, if
  the error type is memory related, param1 should be a valid physical
  memory address. [Unless "flags" is set - see above]

- param2
  This file is used to set the second error parameter value. The effect
  of the parameter depends on the error_type specified. For example, if
  the error type is memory related, param2 should be a physical memory
  address mask. Linux requires page or narrower granularity, say,
  0xfffffffffffff000.

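With the example mask 0xfffffffffffff000, the low 12 bits of the
address are ignored, so any address in the same 4 KiB page matches.
The sketch below shows that arithmetic; the helper name einj_page_base
is hypothetical and the 4 KiB page size is an assumption taken from the
example mask.

```shell
# Print the base address of the 4 KiB page that contains address $1,
# that is, the address with the low 12 bits cleared.
einj_page_base() {
    printf '0x%x\n' "$(( $1 & ~0xfff ))"
}
```

For example, "einj_page_base 0x12345678" prints 0x12345000.
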
- param3
  Used when the 0x1 bit is set in "flags" to specify the APIC id.

- param4
  Used when the 0x4 bit is set in "flags" to specify the target PCIe
  device.

- notrigger
  The EINJ mechanism is a two-step process. First inject the error, then
  perform some actions to trigger it. Setting "notrigger" to 1 skips the
  trigger phase, which *may* allow the user to cause the error in some
  other context by a simple access to the CPU, memory location, or device
  that is the target of the error injection. Whether this actually works
  depends on what operations the BIOS actually includes in the trigger
  phase.

BIOS versions based on the ACPI 4.0 specification have limited options
to control where the errors are injected. Your BIOS may support an
extension (enabled with the param_extension=1 module parameter, or
boot command line einj.param_extension=1). This allows the address
and mask for memory injections to be specified by the param1 and
param2 files in apei/einj.

BIOS versions using the ACPI 5.0 specification have more control over
the target of the injection. For processor related errors (types 0x1,
0x2 and 0x4) the APIC id of the target should be provided using the
param1 file in apei/einj. For memory errors (types 0x8, 0x10 and 0x20)
the address is set using param1 with a mask in param2 (0x0 is equivalent
to all ones). For PCI Express errors (types 0x40, 0x80 and 0x100) the
segment, bus, device and function are specified using param1:

   31     24 23    16 15    11 10      8  7        0
  +-------------------------------------------------+
  | segment |   bus  | device | function | reserved |
  +-------------------------------------------------+

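The layout above can be packed with shifts. A minimal sketch follows;
the helper name einj_pcie_param is hypothetical, and each field is
masked to the width shown in the diagram.

```shell
# Print the parameter value for a PCIe target.
# Usage: einj_pcie_param SEGMENT BUS DEVICE FUNCTION
#   segment: bits 31-24, bus: bits 23-16,
#   device:  bits 15-11, function: bits 10-8, bits 7-0 reserved (zero)
einj_pcie_param() {
    printf '0x%08x\n' "$(( (($1 & 0xff) << 24) | (($2 & 0xff) << 16) | (($3 & 0x1f) << 11) | (($4 & 0x7) << 8) ))"
}
```

For example, "einj_pcie_param 0 0x3a 0x1f 0x7" prints 0x003aff00.
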
An ACPI 5.0 BIOS may also allow vendor specific errors to be injected.
In this case a file named vendor will contain identifying information
from the BIOS that hopefully will allow an application wishing to use
the vendor specific extension to tell that it is running on a BIOS
that supports it. All vendor extensions have the 0x80000000 bit set in
error_type. A file vendor_flags controls the interpretation of param1
and param2 (1 = PROCESSOR, 2 = MEMORY, 4 = PCI). See your BIOS vendor
documentation for details (and expect changes to this API if vendors'
creativity in using this feature expands beyond our expectations).

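A script can recognise a vendor extension type by testing that bit. The
helper name einj_is_vendor_type in this small sketch is hypothetical.

```shell
# Succeed if error type value $1 has the vendor extension bit
# (0x80000000) set.
einj_is_vendor_type() {
    [ "$(( $1 & 0x80000000 ))" -ne 0 ]
}
```
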
Example:
# cd /sys/kernel/debug/apei/einj
# cat available_error_type		# See which errors can be injected
0x00000002	Processor Uncorrectable non-fatal
0x00000008	Memory Correctable
0x00000010	Memory Uncorrectable non-fatal
# echo 0x12345000 > param1		# Set memory address for injection
# echo 0xfffffffffffff000 > param2	# Mask - anywhere in this page
# echo 0x8 > error_type			# Choose correctable memory error
# echo 1 > error_inject			# Inject now

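On kernels that provide the flags file (3.13 and above, per the
description of "flags"), the same injection can mark param1 and param2
as valid explicitly. The following is a sketch under those assumptions,
not a tested recipe: the helper name einj_inject_memory is hypothetical,
and the EINJ directory is passed in as an argument so that the function
makes no assumption about where debugfs is mounted.

```shell
# Inject a memory error using the EINJ files in directory $1.
# Usage: einj_inject_memory DIR ERROR_TYPE ADDRESS [MASK]
# MASK defaults to 0xfffffffffffff000 (anywhere in the 4 KiB page).
einj_inject_memory() {
    dir=$1
    type=$2
    addr=$3
    mask=${4:-0xfffffffffffff000}
    echo "$type" > "$dir/error_type" &&     # which error to inject
    echo 0x2     > "$dir/flags" &&          # bit 1: address and mask valid
    echo "$addr" > "$dir/param1" &&         # physical memory address
    echo "$mask" > "$dir/param2" &&         # address mask
    echo 1       > "$dir/error_inject"      # inject now
}
```

For example, as root:
einj_inject_memory /sys/kernel/debug/apei/einj 0x8 0x12345000
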
For more information about EINJ, please refer to the ACPI specification
version 4.0, section 17.5, and version 5.0, section 18.6.