io_uring: batch getting pcpu references

percpu_ref_tryget() has its own overhead. Instead getting a reference
for each request, grab a bunch once per io_submit_sqes().

~5% throughput boost for a "submit and wait 128 nops" benchmark.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>

__io_req_free_empty() -> __io_req_do_free()

Signed-off-by: Jens Axboe <axboe@kernel.dk>
1 file changed