Diff - 40b1de007aca4f9ec4ee4322c29f026ebb60ac96^! - SHIFTPHONES/mainline/linux

commit	40b1de007aca4f9ec4ee4322c29f026ebb60ac96
author	Darrick J. Wong <djwong@kernel.org>	Fri Aug 06 11:05:43 2021 -0700
committer	Darrick J. Wong <djwong@kernel.org>	Mon Aug 09 11:13:17 2021 -0700
tree	0933ecaa5f4f262b63e94f1a8da9bf60e2810ab8
parent	a6343e4d9278b3919c809fab9945c4d8f04fadf5

xfs: throttle inode inactivation queuing on memory reclaim

Now that we defer inode inactivation, we've decoupled the process of
unlinking or closing an inode from the process of inactivating it.  In
theory this should lead to better throughput since we now inactivate the
queued inodes in batches instead of one at a time.

Unfortunately, one of the primary risks with this decoupling is the loss
of rate control feedback between the frontend and background threads.
In other words, a rm -rf /* thread can run the system out of memory if
it can queue inodes for inactivation and jump to a new CPU faster than
the background threads can actually clear the deferred work.  The
workers can get scheduled off the CPU if they have to do IO, etc.

To solve this problem, we configure a shrinker so that it will activate
the /second/ time the shrinkers are called.  The custom shrinker will
queue all percpu deferred inactivation workers immediately and set a
flag to force frontend callers who are releasing a vfs inode to wait for
the inactivation workers.

On my test VM with 560M of RAM and a 2TB filesystem, this seems to solve
most of the OOMing problem when deleting 10 million inodes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
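
The throttling scheme described in the commit message can be illustrated with a
small userspace model. This is only a sketch under stated assumptions, not the
kernel implementation: `struct inodegc`, `struct mount`, `NR_CPUS`,
`work_queued`, `shrinker_calls`, and the three functions are hypothetical
stand-ins. Only the `items` and `shrinker_hits` field names come from the
patch. The "second time the shrinkers are called" behaviour is modelled with a
plain call counter, and "waiting for the inactivation workers" is modelled by
running the worker synchronously in the caller.

```c
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical: number of per-CPU queues in this model */

/* Hypothetical userspace stand-in for the per-CPU struct xfs_inodegc. */
struct inodegc {
	unsigned int	items;		/* approximate count of queued inodes */
	unsigned int	shrinker_hits;	/* nonzero once reclaim has hit this queue */
	bool		work_queued;	/* assumption: models a queued work item */
};

/* Hypothetical stand-in for the mount structure that owns the shrinker. */
struct mount {
	struct inodegc	gc[NR_CPUS];
	unsigned int	shrinker_calls;	/* assumption: models "second call" */
};

/* Background worker: inactivate everything queued on one CPU. */
unsigned int inodegc_worker(struct mount *mp, int cpu)
{
	struct inodegc *gc = &mp->gc[cpu];
	unsigned int done = gc->items;

	gc->items = 0;
	gc->shrinker_hits = 0;
	gc->work_queued = false;
	return done;
}

/*
 * Frontend path: queue one inode for inactivation.  Returns true if the
 * caller was throttled, i.e. had to wait for the worker because the
 * shrinker had flagged this queue.
 */
bool inodegc_queue(struct mount *mp, int cpu)
{
	struct inodegc *gc = &mp->gc[cpu];

	gc->items++;
	if (gc->shrinker_hits == 0)
		return false;

	/* Stand-in for waiting on the background worker to finish. */
	inodegc_worker(mp, cpu);
	return true;
}

/*
 * Shrinker callback: do nothing on the first call; from the second call
 * on, flag and kick every CPU that has queued inodes.  Returns the
 * number of queues kicked.
 */
unsigned int inodegc_shrinker_scan(struct mount *mp)
{
	unsigned int kicked = 0;

	if (++mp->shrinker_calls < 2)
		return 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		struct inodegc *gc = &mp->gc[cpu];

		if (gc->items == 0)
			continue;
		gc->shrinker_hits++;
		gc->work_queued = true;
		kicked++;
	}
	return kicked;
}
```

In this model a frontend caller is only slowed down after memory reclaim has
run twice, which restores the rate-control feedback between the frontend and
the background workers without penalizing the common case.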

diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 4b3ce61..91a1023 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h

@@ -65,6 +65,7 @@ struct xfs_inodegc {
 
 	/* approximate count of inodes in the list */
 	unsigned int		items;
+	unsigned int		shrinker_hits;
 };
 
 /*
@@ -210,6 +211,8 @@ typedef struct xfs_mount {
 	xfs_agnumber_t		m_agirotor;	/* last ag dir inode alloced */
 	spinlock_t		m_agirotor_lock;/* .. and lock protecting it */
 
+	/* Memory shrinker to throttle and reprioritize inodegc */
+	struct shrinker		m_inodegc_shrinker;
 	/*
 	 * Workqueue item so that we can coalesce multiple inode flush attempts
 	 * into a single flush.