scsi: target: tcmu: Fix and simplify timeout handling
During cmd timeout handling in check_timedout_devices(), due to a race, it
can happen that tcmu_set_next_deadline() does not start a timer as
expected:
1) Either tcmu_check_expired_ring_cmd() checks the inflight_queue or
tcmu_check_expired_queue_cmd() checks the qfull_queue while jiffies has
the value X
2) At the end of the check the queue contains one remaining command with
deadline X (time_after(X, X) is false and thus the command is not
handled as being timed out).
3) After tcmu_check_expired_xxxxx_cmd() a timer interrupt happens and
jiffies is incremented to X+1.
4) Now tcmu_set_next_deadline() is called, but it skips the command, since
time_after(X+1, X) is true. Therefore tcmu_set_next_deadline() finds no
new deadline and stops the timer, which it shouldn't.
Since commands that time out are removed from inflight_queue or
qfull_queue, we don't need the check with time_after() in
tcmu_set_next_deadline() but can use the deadline from the first cmd in
the queue.
Additionally, replace the remaining time_after() calls in
tcmu_check_expired_xxxxx_cmd() with time_after_eq(), because it is not
useful to set the timeout to deadline but then check for jiffies being
greater than deadline.
Simplify the end of tcmu_handle_completions() and change the check for no
more pending commands from
mb->cmd_tail == mb->cmd_head
to
idr_is_empty(&udev->commands)
because the old check doesn't work correctly if paddings or in the future
TMRs are in the ring.
Finally tcmu_set_next_deadline() was shifted in the source as
preparation for later implementation of tmr_notify callback.
Link: https://lore.kernel.org/r/20200726153510.13077-7-bstroesser@ts.fujitsu.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
1 file changed