thp: increase split_huge_page() success rate

During freeze_page(), we remove the page from rmap, which munlocks the
page if it was mlocked.  clear_page_mlock() uses the lru cache, which
temporarily pins the page.

Let's drain the lru cache before checking the page's count against its
mapcount.  With this change an mlocked page is split on the first
attempt, provided it is not pinned by somebody else.
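
For illustration only, here is a user-space toy model of why the drain
helps.  Nothing here is kernel code: struct toy_page and the toy_*()
helpers are hypothetical names, and the rule "count == mapcount + 1"
(only the caller holds an extra reference) is an assumption chosen to
stand in for the count vs. mapcount check.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the fields this sketch compares. */
struct toy_page {
	int count;		/* references held on the page */
	int mapcount;		/* page-table mappings of the page */
	bool on_pagevec;	/* a per-CPU cache holds a transient pin */
};

/* Hypothetical analogue of munlock: queue the page on a per-CPU cache,
 * which takes an extra reference until the cache is drained. */
static void toy_munlock(struct toy_page *p)
{
	if (!p->on_pagevec) {
		p->count++;
		p->on_pagevec = true;
	}
}

/* Hypothetical analogue of draining the cache: drop the transient pin. */
static void toy_drain(struct toy_page *p)
{
	if (p->on_pagevec) {
		assert(p->count > 0);
		p->count--;
		p->on_pagevec = false;
	}
}

/* Assumed rule: the split may proceed only if the caller's reference is
 * the only one beyond the mappings. */
static bool toy_can_split(const struct toy_page *p)
{
	return p->count == p->mapcount + 1;
}
```

In this model a page that has just been munlocked fails toy_can_split()
until toy_drain() runs, which mirrors the retry the patch avoids.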

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a0b910a..882b044 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3201,6 +3201,7 @@
 	struct page *head = compound_head(page);
 	struct anon_vma *anon_vma;
 	int count, mapcount, ret;
+	bool mlocked;
 
 	VM_BUG_ON_PAGE(is_huge_zero_page(page), page);
 	VM_BUG_ON_PAGE(!PageAnon(page), page);
@@ -3231,9 +3232,14 @@
 		goto out_unlock;
 	}
 
+	mlocked = PageMlocked(page);
 	freeze_page(anon_vma, head);
 	VM_BUG_ON_PAGE(compound_mapcount(head), head);
 
+	/* Make sure the page is not on per-CPU pagevec as it takes pin */
+	if (mlocked)
+		lru_add_drain();
+
 	/* Prevent deferred_split_scan() touching ->_count */
 	spin_lock(&split_queue_lock);
 	count = page_count(head);