drm/i915: Perform a direct reset of the GPU from the waiter
If a waiter is holding the struct_mutex, then the reset worker cannot
reset the GPU until the waiter returns. We do not want to return -EAGAIN
from i915_wait_request as that breaks delicate operations like
i915_vma_unbind() which often cannot be restarted easily, and returning
-EIO is just as useless (and has in the past proven dangerous). The
remaining WARN_ON(i915_wait_request) serve as a valuable reminder that
handling errors from an indefinite wait is tricky.
We can keep the current semantic that once a reset is complete, so is
the request, by performing the reset ourselves if we hold the mutex.
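The hand-off can be modelled in userspace as a minimal sketch: whoever
atomically clears the in-progress flag first performs the reset, and any
later caller only reports the wedged state. Everything below is a
hypothetical stand-in (the gpu_error_model struct, the model_* helpers,
the flag values, and mapping every failure to -EIO); it is not the i915
code, and the struct_mutex that the caller must hold is only assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag values; not the real i915 definitions. */
enum { RESET_IN_PROGRESS = 1u << 0, WEDGED = 1u << 1 };

struct gpu_error_model {
	atomic_uint flags;
	int resets_performed;	/* counts how often the reset body ran */
};

/* Models test_and_clear_bit(): clear the bit, report whether it was set. */
static bool model_test_and_clear(atomic_uint *flags, unsigned int bit)
{
	return (atomic_fetch_and(flags, ~bit) & bit) != 0;
}

/*
 * Models the reset entry point. The caller is assumed to hold the
 * (imaginary) struct_mutex, so the body itself is not re-entered.
 * hw_reset_ok is an assumption standing in for the hardware outcome.
 */
static int model_reset(struct gpu_error_model *e, bool hw_reset_ok)
{
	/* Someone else already did the reset: just report the result. */
	if (!model_test_and_clear(&e->flags, RESET_IN_PROGRESS))
		return (atomic_load(&e->flags) & WEDGED) ? -EIO : 0;

	/* Clear any previous failed attempt and try again. */
	atomic_fetch_and(&e->flags, ~(unsigned int)WEDGED);
	e->resets_performed++;

	if (!hw_reset_ok) {
		atomic_fetch_or(&e->flags, WEDGED);
		return -EIO;	/* simplification: the driver returns ret */
	}
	return 0;
}
```

In this model a waiter that holds the mutex and the reset worker can both
call model_reset(); only the first one to see RESET_IN_PROGRESS does the
work.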
uevent emission is still handled by the reset worker, so it may appear
slightly out of order with respect to the actual reset (and concurrent
use of the device).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-11-chris@chris-wilson.co.uk
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 47a676d..ff4173e 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1729,6 +1729,8 @@
* Reset the chip. Useful if a hang is detected. Returns zero on successful
* reset or otherwise an error code.
*
+ * Caller must hold the struct_mutex.
+ *
* Procedure is fairly simple:
* - reset the chip using the reset reg
* - re-init context state
@@ -1743,7 +1745,10 @@
struct i915_gpu_error *error = &dev_priv->gpu_error;
int ret;
- mutex_lock(&dev->struct_mutex);
+ lockdep_assert_held(&dev->struct_mutex);
+
+ if (!test_and_clear_bit(I915_RESET_IN_PROGRESS, &error->flags))
+ return test_bit(I915_WEDGED, &error->flags) ? -EIO : 0;
/* Clear any previous failed attempts at recovery. Time to try again. */
__clear_bit(I915_WEDGED, &error->flags);
@@ -1784,9 +1789,6 @@
goto error;
}
- clear_bit(I915_RESET_IN_PROGRESS, &error->flags);
- mutex_unlock(&dev->struct_mutex);
-
/*
* rps/rc6 re-init is necessary to restore state lost after the
* reset and the re-install of gt irqs. Skip for ironlake per
@@ -1800,7 +1802,6 @@
error:
set_bit(I915_WEDGED, &error->flags);
- mutex_unlock(&dev->struct_mutex);
return ret;
}