[PATCH] dup_fd() part 4 - race fix

Parent _can_ be a clone task, contrary to the comment.  Moreover,
more files could be opened while we allocate the copy, in which case
we end up copying only part of the table into the new descriptor
table.  Since the part we do copy _is_ affected by all changes in the
old range, we can get rather weird effects - e.g.
	dup2(0, 1024); close(0);
done in parallel with fork() can result in a child that sees the
effect of close(), but not that of the dup2() done just before it.
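
Purely for illustration (not part of the patch), a minimal userspace
sketch of that scenario could look like the one below; hitting the
window usually takes running it in a loop, it assumes RLIMIT_NOFILE
is above 1024, and it builds with -lpthread:

	#include <fcntl.h>
	#include <pthread.h>
	#include <stdio.h>
	#include <sys/wait.h>
	#include <unistd.h>

	/* Races with the fork() below. */
	static void *mutator(void *arg)
	{
		(void)arg;
		dup2(0, 1024);		/* force the fdtable to grow */
		close(0);
		return NULL;
	}

	int main(void)
	{
		pthread_t t;
		pid_t pid;
		int status;

		pthread_create(&t, NULL, mutator, NULL);
		pid = fork();		/* dup_fd() copies the table here */
		if (pid == 0) {
			/* Race hit: fd 0 already closed, fd 1024 never
			 * copied - close() observed, dup2() lost. */
			_exit(fcntl(0, F_GETFD) < 0 &&
			      fcntl(1024, F_GETFD) < 0);
		}
		pthread_join(t, NULL);
		waitpid(pid, &status, 0);
		printf("saw close() but lost dup2(): %s\n",
		       WEXITSTATUS(status) ? "yes" : "no");
		return 0;
	}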

What we need is to recalculate open_files after having reacquired
->file_lock, and if the external fdtable we'd just allocated is too
small for it, free the sucker and redo the allocation.
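
In other words, the allocation turns into a retry loop.  Roughly, the
patched path reads as below - a paraphrase stitched together from the
two hunks that follow, with context elided; the out_release error
label is assumed from the surrounding function and is not visible in
the diff:

	open_files = count_open_files(old_fdt);
	while (unlikely(open_files > new_fdt->max_fds)) {
		spin_unlock(&oldf->file_lock);

		/* toss the undersized table from the previous pass */
		if (new_fdt != &newf->fdtab) {
			free_fdarr(new_fdt);
			free_fdset(new_fdt);
			kfree(new_fdt);
		}

		new_fdt = alloc_fdtable(open_files - 1);
		if (!new_fdt) {
			*errorp = -ENOMEM;
			goto out_release;	/* assumed error label */
		}

		spin_lock(&oldf->file_lock);
		old_fdt = files_fdtable(oldf);
		/* may have grown while the lock was dropped */
		open_files = count_open_files(old_fdt);
	}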

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/file.c b/fs/file.c
index 689d2b6..0f705c7c 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -308,11 +308,16 @@
 
 	/*
 	 * Check whether we need to allocate a larger fd array and fd set.
-	 * Note: we're not a clone task, so the open count won't change.
 	 */
-	if (open_files > new_fdt->max_fds) {
+	while (unlikely(open_files > new_fdt->max_fds)) {
 		spin_unlock(&oldf->file_lock);
 
+		if (new_fdt != &newf->fdtab) {
+			free_fdarr(new_fdt);
+			free_fdset(new_fdt);
+			kfree(new_fdt);
+		}
+
 		new_fdt = alloc_fdtable(open_files - 1);
 		if (!new_fdt) {
 			*errorp = -ENOMEM;
@@ -335,6 +340,7 @@
 		 */
 		spin_lock(&oldf->file_lock);
 		old_fdt = files_fdtable(oldf);
+		open_files = count_open_files(old_fdt);
 	}
 
 	old_fds = old_fdt->fd;