mm, page_alloc: make THP-specific decisions more generic Since THP allocations during page faults can be costly, extra decisions are employed for them to avoid excessive reclaim and compaction, if the initial compaction doesn't look promising. The detection has never been perfect as there is no gfp flag specific to THP allocations. At this moment it checks the whole combination of flags that makes up GFP_TRANSHUGE, and hopes that no other users of such combination exist, or would mind being treated the same way. Extra care is also taken to separate allocations from khugepaged, where latency doesn't matter that much. It is however possible to distinguish these allocations in a simpler and more reliable way. The key observation is that after the initial compaction followed by the first iteration of "standard" reclaim/compaction, both __GFP_NORETRY allocations and costly allocations without __GFP_REPEAT are declared as failures: /* Do not loop if specifically requested */ if (gfp_mask & __GFP_NORETRY) goto nopage; /* * Do not retry costly high order allocations unless they are * __GFP_REPEAT */ if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT)) goto nopage; This means we can further distinguish allocations that are costly order *and* additionally include the __GFP_NORETRY flag. As it happens, GFP_TRANSHUGE allocations do already fall into this category. This will also allow other costly allocations with similar high-order benefit vs latency considerations to use this semantic. Furthermore, we can distinguish THP allocations that should try a bit harder (such as from khugepageed) by removing __GFP_NORETRY, as will be done in the next patch. Link: http://lkml.kernel.org/r/20160721073614.24395-6-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit: 3eb2771b06d8e206a2f48cfc7c96bb4ef97e2471 [log] [tgz]
author: Vlastimil Babka <vbabka@suse.cz> Thu Jul 28 15:49:22 2016 -0700
committer: Linus Torvalds <torvalds@linux-foundation.org> Thu Jul 28 16:07:41 2016 -0700
tree: 93979bb95dac80990722d7e918836238c4f9da02
parent: a8161d1ed6098506303c65b3701dedba876df42a [diff] [blame]
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ae721a7..c42ec37 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c

@@ -3085,7 +3085,6 @@
 	return page;
 }
 
-
 /*
  * Maximum number of compaction retries wit a progress before OOM
  * killer is consider as the only way to move forward.
@@ -3373,11 +3372,6 @@
 	return false;
 }
 
-static inline bool is_thp_gfp_mask(gfp_t gfp_mask)
-{
-	return (gfp_mask & (GFP_TRANSHUGE | __GFP_KSWAPD_RECLAIM)) == GFP_TRANSHUGE;
-}
-
 /*
  * Maximum number of reclaim retries without any progress before OOM killer
  * is consider as the only way to move forward.
@@ -3536,8 +3530,11 @@
 		if (page)
 			goto got_pg;
 
-		/* Checks for THP-specific high-order allocations */
-		if (is_thp_gfp_mask(gfp_mask)) {
+		/*
+		 * Checks for costly allocations with __GFP_NORETRY, which
+		 * includes THP page fault allocations
+		 */
+		if (gfp_mask & __GFP_NORETRY) {
 			/*
 			 * If compaction is deferred for high-order allocations,
 			 * it is because sync compaction recently failed. If
@@ -3557,11 +3554,10 @@
 				goto nopage;
 
 			/*
-			 * It can become very expensive to allocate transparent
-			 * hugepages at fault, so use asynchronous memory
-			 * compaction for THP unless it is khugepaged trying to
-			 * collapse. All other requests should tolerate at
-			 * least light sync migration.
+			 * Looks like reclaim/compaction is worth trying, but
+			 * sync compaction could be very expensive, so keep
+			 * using async compaction, unless it's khugepaged
+			 * trying to collapse.
 			 */
 			if (!(current->flags & PF_KTHREAD))
 				migration_mode = MIGRATE_ASYNC;
commit	3eb2771b06d8e206a2f48cfc7c96bb4ef97e2471	[log] [tgz]
author	Vlastimil Babka <vbabka@suse.cz>	Thu Jul 28 15:49:22 2016 -0700
committer	Linus Torvalds <torvalds@linux-foundation.org>	Thu Jul 28 16:07:41 2016 -0700
tree	93979bb95dac80990722d7e918836238c4f9da02
parent	a8161d1ed6098506303c65b3701dedba876df42a [diff] [blame]