habanalabs: bypass reset for continuous h/w error event
There may be a situation where drivers receives continuous fatal H/W
error events from FW immediately post reset cycle.
This may be due to some fault on the silicon itself.
In such case its better to bypass reset cycle so we won't be stuck in
endless loop of resets.
This commit bypasses reset request in case driver received two back to
back FW fatal error before first occurrence of heartbeat event.
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
diff --git a/drivers/misc/habanalabs/common/habanalabs_drv.c b/drivers/misc/habanalabs/common/habanalabs_drv.c
index a75e4fc..1da5606 100644
--- a/drivers/misc/habanalabs/common/habanalabs_drv.c
+++ b/drivers/misc/habanalabs/common/habanalabs_drv.c
@@ -339,6 +339,7 @@ int create_hdev(struct hl_device **dev, struct pci_dev *pdev,
set_driver_behavior_per_device(hdev);
hdev->curr_reset_cause = HL_RESET_CAUSE_UNKNOWN;
+ hdev->prev_reset_trigger = HL_RESET_TRIGGER_DEFAULT;
if (timeout_locked)
hdev->timeout_jiffies = msecs_to_jiffies(timeout_locked * 1000);