rcu: Optionally run grace-period kthreads at real-time priority
Recent testing has shown that under heavy load, running RCU's grace-period
kthreads at real-time priority can improve performance (according to 0day
test robot) and reduce the incidence of RCU CPU stall warnings. However,
most systems do just fine with the default non-realtime priorities for
these kthreads, and it does not make sense to expose the entire user
base to any risk stemming from this change, given that this change is
of use only to a few users running extremely heavy workloads.
Therefore, this commit allows users to specify realtime priorities
for the grace-period kthreads, but leaves them running SCHED_OTHER
by default. The realtime priority may be specified at build time
via the RCU_KTHREAD_PRIO Kconfig parameter, or at boot time via the
rcutree.kthread_prio parameter. Either way, 0 says to continue the
default SCHED_OTHER behavior and values from 1-99 specify that priority
of SCHED_FIFO behavior. Note that a value of 0 is not permitted when
the RCU_BOOST Kconfig parameter is specified.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5987fdc..75ce123 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -156,6 +156,10 @@
static void invoke_rcu_core(void);
static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp);
+/* rcuc/rcub kthread realtime priority */
+static int kthread_prio = CONFIG_RCU_KTHREAD_PRIO;
+module_param(kthread_prio, int, 0644);
+
/*
* Track the rcutorture test sequence number and the update version
* number within a given test. The rcutorture_testseq is incremented
@@ -3597,17 +3601,35 @@
static int __init rcu_spawn_gp_kthread(void)
{
unsigned long flags;
+ int kthread_prio_in = kthread_prio;
struct rcu_node *rnp;
struct rcu_state *rsp;
+ struct sched_param sp;
struct task_struct *t;
+ /* Force priority into range. */
+ if (IS_ENABLED(CONFIG_RCU_BOOST) && kthread_prio < 1)
+ kthread_prio = 1;
+ else if (kthread_prio < 0)
+ kthread_prio = 0;
+ else if (kthread_prio > 99)
+ kthread_prio = 99;
+ if (kthread_prio != kthread_prio_in)
+ pr_alert("rcu_spawn_gp_kthread(): Limited prio to %d from %d\n",
+ kthread_prio, kthread_prio_in);
+
rcu_scheduler_fully_active = 1;
for_each_rcu_flavor(rsp) {
- t = kthread_run(rcu_gp_kthread, rsp, "%s", rsp->name);
+ t = kthread_create(rcu_gp_kthread, rsp, "%s", rsp->name);
BUG_ON(IS_ERR(t));
rnp = rcu_get_root(rsp);
raw_spin_lock_irqsave(&rnp->lock, flags);
rsp->gp_kthread = t;
+ if (kthread_prio) {
+ sp.sched_priority = kthread_prio;
+ sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
+ }
+ wake_up_process(t);
raw_spin_unlock_irqrestore(&rnp->lock, flags);
}
rcu_spawn_nocb_kthreads();