nvme-tcp: optimize queue io_cpu assignment for multiple queue maps

Currently, queue io_cpu assignment is done sequentially for the default,
read and poll queues based on queue id. This causes a misalignment between
the CPU context initiating I/O and the I/O worker thread processing
queued requests or completions.
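
For illustration only (a user-space sketch with made-up queue and CPU
counts, not the driver code), the sequential scheme amounts to deriving
io_cpu from the absolute queue id alone, so a read queue's worker CPU
has no relation to its position within the read map:

  #include <stdio.h>

  /*
   * Illustrative sketch of the current scheme: io_cpu is derived from
   * the absolute queue id, ignoring which queue map (default/read/poll)
   * the queue belongs to.
   */
  static int io_cpu_sequential(int qid, int num_cpus)
  {
  	/* qid 0 is the admin queue, so I/O queues start at 1 */
  	return (qid - 1) % num_cpus;
  }

  int main(void)
  {
  	int num_default = 4, num_cpus = 8;
  	int first_read_qid = num_default + 1;

  	/* default qid 1 -> io_cpu 0, but the first read queue -> io_cpu 4 */
  	printf("default qid 1 -> io_cpu %d\n",
  	       io_cpu_sequential(1, num_cpus));
  	printf("read    qid %d -> io_cpu %d\n", first_read_qid,
  	       io_cpu_sequential(first_read_qid, num_cpus));
  	return 0;
  }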

Modify queue io_cpu assignment to take the queue map offset into
account, so that the io_cpu assignment restarts from zero for each queue
map. This essentially aligns the read/poll queues to span the same CPU
range as the default queues.
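
Again for illustration only (same assumed queue counts as the sketch
above; the modulo mapping onto CPUs is a simplification, not the actual
driver change), subtracting the sizes of the preceding queue maps gives
each map a relative index starting at zero, so default and read queues
with the same relative index land on the same io_cpu:

  #include <stdio.h>

  /*
   * Illustrative sketch of the per-map scheme: compute the queue's
   * index relative to its own map (default/read/poll), then map that
   * relative index onto the CPUs.
   */
  static int io_cpu_per_map(int qid, int num_default, int num_read,
  			  int num_cpus)
  {
  	int n;

  	if (qid > num_default + num_read)	/* poll map */
  		n = qid - num_default - num_read - 1;
  	else if (qid > num_default)		/* read map */
  		n = qid - num_default - 1;
  	else					/* default map */
  		n = qid - 1;

  	return n % num_cpus;
  }

  int main(void)
  {
  	int num_default = 4, num_read = 4, num_cpus = 8;

  	/* default qid 1 and read qid 5 now both land on io_cpu 0 */
  	printf("default qid 1 -> io_cpu %d\n",
  	       io_cpu_per_map(1, num_default, num_read, num_cpus));
  	printf("read    qid 5 -> io_cpu %d\n",
  	       io_cpu_per_map(5, num_default, num_read, num_cpus));
  	return 0;
  }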

Testing performed by Mark with:
- ram device (nvmet)
- single CPU core (pinned)
- 100% 4k reads
- engine io_uring (not using sq_thread option)
- hipri flag set

Micro-benchmark results show a net gain of:
- increase of 18%-29% in IOPs
- reduction of 16%-22% in average latency
- reduction of 7%-23% in 99.99% latency

Baseline:
========
QDepth/Batch	| IOPs [k]	| Avg. Lat [us]	| 99.99% Lat [us]
-----------------------------------------------------------------
1/1 		| 32.4		| 30.11		| 50.94
32/8		| 179		| 168.20	| 371

CPU alignment:
=============
QDepth/Batch	| IOPs [k]	| Avg. Lat [us]	| 99.99% Lat [us]
-----------------------------------------------------------------
1/1 		| 38.5		|   25.18	| 39.16
32/8		| 231		|   130.75	| 343

Reported-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>