fuse: improve aio directIO write performance for size extending writes

While sending the blocking directIO in fuse, the write request is broken
into sub-requests, each of default size 128k and all the requests are sent
in non-blocking background mode if async_dio mode is supported by libfuse.
The process which issue the write wait for the completion of all the
sub-requests. Sending multiple requests parallely gives a chance to perform
parallel writes in the user space fuse implementation if it is
multi-threaded and hence improves the performance.

When there is a size extending aio dio write, we switch to blocking mode so
that we can properly update the size of the file after completion of the
writes. However, in this situation all the sub-requests are sent in
serialized manner where the next request is sent only after receiving the
reply of the current request. Hence the multi-threaded user space
implementation is not utilized properly.

This patch changes the size extending aio dio behavior to exactly follow
blocking dio. For multi threaded fuse implementation having 10 threads and
using buffer size of 64MB to perform async directIO, we are getting double
the speed.

Signed-off-by: Ashish Sangwan <ashishsangwan2@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 929c383..5db5d24 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -259,6 +259,7 @@
 	struct kiocb *iocb;
 	struct file *file;
 	struct completion *done;
+	bool blocking;
 };
 
 #define FUSE_IO_PRIV_SYNC(f) \