Btrfs: Throttle for async bio submits higher up the chain

The current code checks the count of async bio submits right after
adding the latest bio to the work queue and, if the count is too high,
waits for it to drop below a given threshold.  This isn't optimal
because the caller may still have sequential, adjacent bios pending
that it is waiting to send down the pipe.

This changeset requires the caller to wait on the async bio count,
and changes the async checksumming submits to wait for async bios any
time they self-throttle.

The end result is much higher sequential throughput.
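
As a rough illustration only, the throttling pattern in the hunk below
can be modeled in user space.  This is a single-threaded sketch, not
kernel code: struct throttle_state, wait_below() and queue_submit() are
hypothetical names, and the "budget" argument is an assumed stand-in
for the HZ/10 timeout passed to wait_event_timeout().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fs_info counters touched by the patch. */
struct throttle_state {
	int nr_async_submits;	/* queued async submit work items */
	int nr_async_bios;	/* bios queued for the device */
	int limit;		/* throttle threshold */
};

/*
 * Models wait_event_timeout(): a simulated worker retires one item per
 * step until *counter < limit or the step budget runs out.
 * Returns the remaining budget (0 means the "timeout" expired).
 */
static int wait_below(int *counter, int limit, int budget)
{
	while (*counter >= limit && budget > 0) {
		(*counter)--;	/* one item completed */
		budget--;
	}
	return budget;
}

/*
 * Queue one submit, then throttle only when the submit count exceeds
 * the limit.  When throttling, wait on both the submit count and the
 * bio count.  Returns true if the caller had to wait.
 */
static bool queue_submit(struct throttle_state *s, int budget)
{
	s->nr_async_submits++;

	if (s->nr_async_submits > s->limit) {
		wait_below(&s->nr_async_submits, s->limit, budget);
		wait_below(&s->nr_async_bios, s->limit, budget);
		return true;
	}
	return false;
}
```

In this model, a caller that stays at or under the limit never waits,
so a run of adjacent submits can be queued back to back; only the
submit that pushes the count over the limit pays for both waits.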

Signed-off-by: Chris Mason <chris.mason@oracle.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index bbba14b..6a218f7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -487,9 +487,15 @@
 	atomic_inc(&fs_info->nr_async_submits);
 	btrfs_queue_worker(&fs_info->workers, &async->work);
 
-	wait_event_timeout(fs_info->async_submit_wait,
+	if (atomic_read(&fs_info->nr_async_submits) > limit) {
+		wait_event_timeout(fs_info->async_submit_wait,
 			   (atomic_read(&fs_info->nr_async_submits) < limit),
 			   HZ/10);
+
+		wait_event_timeout(fs_info->async_submit_wait,
+			   (atomic_read(&fs_info->nr_async_bios) < limit),
+			   HZ/10);
+	}
 	return 0;
 }