[PATCH] dup_fd() part 4 - race fix
The parent _can_ be a clone task, contrary to the comment. Moreover,
more files could be opened while we allocate a copy, in which case
we end up copying only part of the old table into the new descriptor
table. Since what we get _is_ affected by all changes in the old range,
we can get rather weird effects - e.g.
	dup2(0, 1024); close(0);
in parallel with fork() resulting in child that sees the effect of
close(), but not that of dup2() done just before that close().
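
The interleaving above can be replayed deterministically in a userspace toy model. This is only a sketch, not kernel code: struct toy_fdtable, toy_count_open(), toy_copy() and toy_racy_fork() are hypothetical names, the parent is assumed to start with descriptors 0..99 open, and the "other thread" is simulated by two plain assignments at the point where the real code has dropped ->file_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_FDS 2048	/* hypothetical table size for the model */

/* Toy descriptor table: open[i] is true when descriptor i is in use. */
struct toy_fdtable {
	bool open[TOY_MAX_FDS];
};

/* Highest open descriptor plus one; simplified stand-in for count_open_files(). */
static int toy_count_open(const struct toy_fdtable *t)
{
	int i;

	for (i = TOY_MAX_FDS; i > 0; i--)
		if (t->open[i - 1])
			break;
	return i;
}

/* Copy the first n slots of src into dst and leave the rest closed. */
static void toy_copy(struct toy_fdtable *dst, const struct toy_fdtable *src, int n)
{
	assert(n >= 0 && n <= TOY_MAX_FDS);
	memset(dst, 0, sizeof(*dst));
	memcpy(dst->open, src->open, (size_t)n * sizeof(src->open[0]));
}

/* Replay the race with a count taken before the lock was dropped. */
static void toy_racy_fork(struct toy_fdtable *child)
{
	struct toy_fdtable parent;
	int stale_count, fd;

	memset(&parent, 0, sizeof(parent));
	for (fd = 0; fd < 100; fd++)		/* assumed starting state */
		parent.open[fd] = true;

	stale_count = toy_count_open(&parent);	/* 100, taken under the lock */

	/* Lock dropped for allocation; another thread runs dup2(0, 1024); close(0); */
	parent.open[1024] = true;
	parent.open[0] = false;

	/* Lock retaken; the copy still uses the stale count. */
	toy_copy(child, &parent, stale_count);
}
```

In this model the child ends up with descriptor 0 closed and descriptor 1024 missing, which is the "close() without the preceding dup2()" state described above.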
What we need is to recalculate the open count after having reacquired
->file_lock and, if the external fdtable we've just allocated is too small
for it, free the sucker and redo the allocation.
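
The shape of that retry loop, reduced to a userspace sketch. Everything here is a hypothetical stand-in rather than the kernel's API: struct toy_files models the old files_struct with a boolean in place of ->file_lock, pending_opens models descriptors opened by another thread while the lock is dropped, and struct toy_copy models the external fdtable. Unlike dup_fd(), the sketch has no embedded table to special-case and sizes the allocation to the count exactly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical source whose open count may grow whenever the lock is dropped. */
struct toy_files {
	bool locked;		/* stand-in for ->file_lock */
	int open_count;		/* stand-in for count_open_files() */
	int pending_opens;	/* opens that land during the next unlocked window */
};

/* Hypothetical destination buffer, standing in for an external fdtable. */
struct toy_copy {
	int capacity;
	int *slots;
};

/* Model of a concurrent opener running while the lock is not held. */
static void toy_concurrent_opens(struct toy_files *src)
{
	assert(!src->locked);
	src->open_count += src->pending_opens;
	src->pending_opens = 0;
}

/* Returns the count the copy was sized for, or -1 on allocation failure. */
static int toy_dup(struct toy_files *src, struct toy_copy *dst)
{
	int open_count;

	src->locked = true;
	open_count = src->open_count;

	while (open_count > dst->capacity) {
		src->locked = false;		/* allocation may sleep */
		toy_concurrent_opens(src);

		free(dst->slots);		/* discard the too-small buffer */
		dst->capacity = 0;
		dst->slots = calloc((size_t)open_count, sizeof(*dst->slots));
		if (!dst->slots)
			return -1;
		dst->capacity = open_count;

		src->locked = true;
		open_count = src->open_count;	/* recount under the lock */
	}

	/* The lock is held and the buffer is large enough for the current count. */
	assert(src->locked && dst->capacity >= open_count);
	src->locked = false;
	return open_count;
}
```

Starting from a source with 100 open descriptors and 50 more arriving during the first unlocked window, the loop allocates twice and returns 150. An "if" in place of the "while" would stop after the first 100-slot allocation.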
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
diff --git a/fs/file.c b/fs/file.c
index 689d2b6..0f705c7c 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -308,11 +308,16 @@
 	/*
 	 * Check whether we need to allocate a larger fd array and fd set.
-	 * Note: we're not a clone task, so the open count won't change.
 	 */
-	if (open_files > new_fdt->max_fds) {
+	while (unlikely(open_files > new_fdt->max_fds)) {
 		spin_unlock(&oldf->file_lock);
+		if (new_fdt != &newf->fdtab) {
+			free_fdarr(new_fdt);
+			free_fdset(new_fdt);
+			kfree(new_fdt);
+		}
+
 		new_fdt = alloc_fdtable(open_files - 1);
 		if (!new_fdt) {
 			*errorp = -ENOMEM;
@@ -335,6 +340,7 @@
 		 */
 		spin_lock(&oldf->file_lock);
 		old_fdt = files_fdtable(oldf);
+		open_files = count_open_files(old_fdt);
 	}
 	old_fds = old_fdt->fd;