cgroup: implement "nsdelegate" mount option
Currently, cgroup only supports delegation to !root users and cgroup
namespaces don't get any special treatments. This limits the
usefulness of cgroup namespaces as they by themselves can't be safe
delegation boundaries. A process inside a cgroup can change the
resource control knobs of the parent in the namespace root and may
move processes in and out of the namespace if cgroups outside its
namespace are visible somehow.
This patch adds a new mount option "nsdelegate" which makes cgroup
namespaces delegation boundaries. If set, cgroup behaves as if write
permission based delegation took place at namespace boundaries -
writes to the resource control knobs from the namespace root are
denied and migration crossing the namespace boundary aren't allowed
from inside the namespace.
This allows cgroup namespace to function as a delegation boundary by
itself.
v2: Silently ignore nsdelegate specified on !init mounts.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Aravind Anbudurai <aru7@fb.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Eric Biederman <ebiederm@xmission.com>
diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 0260ed0..558c3a7 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -149,6 +149,16 @@
and experimenting easier, the kernel parameter cgroup_no_v1= allows
disabling controllers in v1 and make them always available in v2.
+cgroup v2 currently supports the following mount options.
+
+ nsdelegate
+
+ Consider cgroup namespaces as delegation boundaries. This
+ option is system wide and can only be set on mount or modified
+ through remount from the init namespace. The mount option is
+ ignored on non-init namespace mounts. Please refer to the
+ Delegation section for details.
+
2-2. Organizing Processes
@@ -308,19 +318,27 @@
2-5-1. Model of Delegation
-A cgroup can be delegated to a less privileged user by granting write
-access of the directory and its "cgroup.procs" and
-"cgroup.subtree_control" files to the user. Note that resource
-control interface files in a given directory control the distribution
-of the parent's resources and thus must not be delegated along with
-the directory.
+A cgroup can be delegated in two ways. First, to a less privileged
+user by granting write access of the directory and its "cgroup.procs"
+and "cgroup.subtree_control" files to the user. Second, if the
+"nsdelegate" mount option is set, automatically to a cgroup namespace
+on namespace creation.
-Once delegated, the user can build sub-hierarchy under the directory,
-organize processes as it sees fit and further distribute the resources
-it received from the parent. The limits and other settings of all
-resource controllers are hierarchical and regardless of what happens
-in the delegated sub-hierarchy, nothing can escape the resource
-restrictions imposed by the parent.
+Because the resource control interface files in a given directory
+control the distribution of the parent's resources, the delegatee
+shouldn't be allowed to write to them. For the first method, this is
+achieved by not granting access to these files. For the second, the
+kernel rejects writes to all files other than "cgroup.procs" and
+"cgroup.subtree_control" on a namespace root from inside the
+namespace.
+
+The end results are equivalent for both delegation types. Once
+delegated, the user can build sub-hierarchy under the directory,
+organize processes inside it as it sees fit and further distribute the
+resources it received from the parent. The limits and other settings
+of all resource controllers are hierarchical and regardless of what
+happens in the delegated sub-hierarchy, nothing can escape the
+resource restrictions imposed by the parent.
Currently, cgroup doesn't impose any restrictions on the number of
cgroups in or nesting depth of a delegated sub-hierarchy; however,
@@ -330,10 +348,12 @@
2-5-2. Delegation Containment
A delegated sub-hierarchy is contained in the sense that processes
-can't be moved into or out of the sub-hierarchy by the delegatee. For
-a process with a non-root euid to migrate a target process into a
-cgroup by writing its PID to the "cgroup.procs" file, the following
-conditions must be met.
+can't be moved into or out of the sub-hierarchy by the delegatee.
+
+For delegations to a less privileged user, this is achieved by
+requiring the following conditions for a process with a non-root euid
+to migrate a target process into a cgroup by writing its PID to the
+"cgroup.procs" file.
- The writer must have write access to the "cgroup.procs" file.
@@ -360,6 +380,11 @@
not have write access to its "cgroup.procs" files and thus the write
will be denied with -EACCES.
+For delegations to namespaces, containment is achieved by requiring
+that both the source and destination cgroups are reachable from the
+namespace of the process which is attempting the migration. If either
+is not reachable, the migration is rejected with -ENOENT.
+
2-6. Guidelines
@@ -1414,7 +1439,7 @@
- Multiple hierarchies including named ones are not supported.
-- All mount options and remounting are not supported.
+- All v1 mount options are not supported.
- The "tasks" file is removed and "cgroup.procs" is not sorted.