sched/numa: Rework best node setting in task_numa_migrate()
Fix up the best node setting in task_numa_migrate() to deal with a task
in a pseudo-interleaved NUMA group, which is already running in the
best location.
Set the task's preferred nid to the current nid, so task migration is
not retried at a high rate.
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: mgorman@suse.de
Cc: chegu_vinod@hp.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1403538095-31256-7-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9d1734a..7bb2f46 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1354,10 +1354,6 @@
}
}
- /* No better CPU than the current one was found. */
- if (env.best_cpu == -1)
- return -EAGAIN;
-
/*
* If the task is part of a workload that spans multiple NUMA nodes,
* and is migrating into one of the workload's active nodes, remember
@@ -1366,8 +1362,19 @@
* A task that migrated to a second choice node will be better off
* trying for a better one later. Do not set the preferred node here.
*/
- if (p->numa_group && node_isset(env.dst_nid, p->numa_group->active_nodes))
- sched_setnuma(p, env.dst_nid);
+ if (p->numa_group) {
+ if (env.best_cpu == -1)
+ nid = env.src_nid;
+ else
+ nid = env.dst_nid;
+
+ if (node_isset(nid, p->numa_group->active_nodes))
+ sched_setnuma(p, env.dst_nid);
+ }
+
+ /* No better CPU than the current one was found. */
+ if (env.best_cpu == -1)
+ return -EAGAIN;
/*
* Reset the scan period if the task is being rescheduled on an