39b169ea0d36b9c445ab6849002e4edf00c7fcc1 - SHIFTPHONES/mainline/linux

commit	39b169ea0d36b9c445ab6849002e4edf00c7fcc1
author	Sergey Gorenko <sergeygo@nvidia.com>	Wed Dec 15 15:57:17 2021 +0200
committer	Jason Gunthorpe <jgg@nvidia.com>	Wed Jan 05 19:36:20 2022 -0400
tree	720c41a6fc91f371b00febe70c380062f587da49
parent	b28801a08924e887d7e3d33f43f510ccd12bbce8

IB/iser: Fix RNR errors

Some users complain about RNR errors on the target when heavy
high-priority tasks run on the initiator. After investigating, we found
that the receive WRs were exhausted because the initiator could not post
them in time.

Receive work requests are posted in chunks to reduce the number of hits to
the HCA. The WRs are posted in the receive completion handler when the
number of free receive buffers reaches the threshold. But on a heavily
loaded host, receive CQE processing can be delayed and all receive WRs will
be exhausted. In this case, the target will get an RNR error.

To avoid this, we post each receive WR as soon as possible and not in a
batch. This increases the number of hits to the HCA, but it is also the
common implementation in most Linux ULPs (e.g. NVMe-oF/RDMA). As a rule of
thumb, performance improvements and heuristics are being added to the RDMA
core layer or vendors' low-level drivers, and it's about time to align iSER
as well.
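
The difference between the two reposting policies can be illustrated with a
small userspace model. This is only a sketch, not the iSER driver code: the
queue depth, the batch threshold, and all names (rx_queue, message_arrives,
handle_one_cqe, simulate) are hypothetical and chosen for illustration. It
counts how many arrivals find no posted receive WR when the completion
handler stalls after a short warm-up.

```c
#include <assert.h>

#define QUEUE_DEPTH 8   /* hypothetical receive queue depth */
#define BATCH_SIZE  4   /* hypothetical repost threshold for batched mode */

struct rx_queue {
	int posted;    /* receive WRs currently available to the HCA */
	int unposted;  /* completed buffers not yet posted again */
	int rnr;       /* arrivals that found no posted receive WR */
};

/* A message arrives and consumes one posted WR, or counts as an RNR. */
static void message_arrives(struct rx_queue *q, int *pending_cqes)
{
	if (q->posted == 0) {
		q->rnr++;
		return;
	}
	q->posted--;
	(*pending_cqes)++;
}

/* The completion handler processes one CQE and reposts buffers once
 * at least 'batch' of them are free. batch == 1 reposts immediately. */
static void handle_one_cqe(struct rx_queue *q, int *pending_cqes, int batch)
{
	if (*pending_cqes == 0)
		return;
	(*pending_cqes)--;
	q->unposted++;
	if (q->unposted >= batch) {
		q->posted += q->unposted;
		q->unposted = 0;
	}
}

/* Run 'warmup' arrivals with prompt completion handling, then 'stall'
 * arrivals during which the handler does not run. Returns RNR count. */
int simulate(int batch, int warmup, int stall)
{
	struct rx_queue q = { .posted = QUEUE_DEPTH, .unposted = 0, .rnr = 0 };
	int pending = 0;
	int i;

	assert(batch >= 1);

	for (i = 0; i < warmup; i++) {
		message_arrives(&q, &pending);
		handle_one_cqe(&q, &pending, batch);
	}
	for (i = 0; i < stall; i++)
		message_arrives(&q, &pending);

	return q.rnr;
}
```

In this model, simulate(BATCH_SIZE, 3, 6) returns 1, because three buffers
are still waiting for the batch threshold when the stall begins and only
five WRs remain posted. simulate(1, 3, 6) returns 0, because every
completed buffer was reposted at once and the full queue depth is available
to absorb the stall.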

Link: https://lore.kernel.org/r/20211215135721.3662-3-mgurtovoy@nvidia.com
Signed-off-by: Sergey Gorenko <sergeygo@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

3 files changed
