.. SPDX-License-Identifier: GPL-2.0

=====================
Fake NUMA For CPUSets
=====================

:Author: David Rientjes <rientjes@cs.washington.edu>

Using numa=fake and CPUSets for Resource Management

This document describes how the numa=fake x86_64 command-line option can be used
in conjunction with cpusets for coarse memory management. Using this feature,
you can create fake NUMA nodes that represent contiguous chunks of memory and
assign them to cpusets and their attached tasks. This is a way of limiting the
amount of system memory that is available to a certain class of tasks.

For more information on the features of cpusets, see
Documentation/admin-guide/cgroup-v1/cpusets.rst.
There are a number of different configurations you can use for your needs. For
more information on the numa=fake command-line option and its various ways of
configuring fake nodes, see Documentation/x86/x86_64/boot-options.rst.

For the purposes of this introduction, we'll assume a very primitive NUMA
emulation setup of "numa=fake=4*512,". This will split our system memory into
four equal chunks of 512M each that we can now assign to cpusets. As you
become more familiar with using this combination for resource control, you'll
determine a better setup to minimize the number of nodes you have to deal
with.

A machine may be split as follows with "numa=fake=4*512," as reported by dmesg::

        Faking node 0 at 0000000000000000-0000000020000000 (512MB)
        Faking node 1 at 0000000020000000-0000000040000000 (512MB)
        Faking node 2 at 0000000040000000-0000000060000000 (512MB)
        Faking node 3 at 0000000060000000-0000000080000000 (512MB)
        ...
        On node 0 totalpages: 130975
        On node 1 totalpages: 131072
        On node 2 totalpages: 131072
        On node 3 totalpages: 131072
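
The ranges above follow from simple arithmetic: 512 MB is 0x20000000 bytes, and
with 4 KiB pages (an assumption that matches the 131072 pages reported for
nodes 1-3) each node holds 131072 pages. The sketch below reproduces that
layout. The function name 'fake_node_ranges' is hypothetical, and it assumes
that nodes start at address 0 and contain no holes, which a real machine does
not guarantee.

```shell
#!/bin/sh
# Print the address range and page count of each equally sized fake node.
# Assumptions: nodes are contiguous, start at physical address 0, and the
# page size is 4 KiB. Real layouts may differ (memory holes, reserved pages).
fake_node_ranges() {
    count=$1        # number of fake nodes
    size_mb=$2      # size of each node in MB
    node_bytes=$((size_mb * 1024 * 1024))
    i=0
    while [ "$i" -lt "$count" ]; do
        start=$((i * node_bytes))
        end=$((start + node_bytes))
        printf 'node %d: %016x-%016x (%dMB, %d pages)\n' \
            "$i" "$start" "$end" "$size_mb" "$((node_bytes / 4096))"
        i=$((i + 1))
    done
}

# The "numa=fake=4*512," setup used in this document.
fake_node_ranges 4 512
```

Node 0 reports slightly fewer pages (130975) in the dmesg output than this
calculation gives, so the kernel evidently does not count every page in the
first range.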

Now following the instructions for mounting the cpusets filesystem from
Documentation/admin-guide/cgroup-v1/cpusets.rst, you can assign fake nodes
(i.e. contiguous memory address spaces) to individual cpusets::

        [root@xroads /]# mkdir exampleset
        [root@xroads /]# mount -t cpuset none exampleset
        [root@xroads /]# mkdir exampleset/ddset
        [root@xroads /]# cd exampleset/ddset
        [root@xroads /exampleset/ddset]# echo 0-1 > cpus
        [root@xroads /exampleset/ddset]# echo 0-1 > mems

Now this cpuset, 'ddset', will only be allowed access to fake nodes 0 and 1 for
memory allocations (1G).

You can now assign tasks to these cpusets to limit the memory resources
available to them according to the fake nodes assigned as mems::

        [root@xroads /exampleset/ddset]# echo $$ > tasks
        [root@xroads /exampleset/ddset]# dd if=/dev/zero of=tmp bs=1024 count=1G
        [1] 13425
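
One way to check that the restriction took effect is to read the
'Mems_allowed_list' field of /proc/<pid>/status, which lists the memory nodes a
task may allocate from. This is a minimal sketch: the helper name
'mems_allowed' is hypothetical, and the field is assumed to be present, which
requires a kernel built with cpuset support.

```shell
#!/bin/sh
# Print the memory nodes a task may allocate from, as listed in a
# /proc/<pid>/status file. The helper name is hypothetical.
mems_allowed() {
    status_file=${1:-/proc/self/status}
    # Fields look like "Mems_allowed_list:<tab>0-1"; split on the colon
    # and any following whitespace.
    awk -F':[ \t]*' '$1 == "Mems_allowed_list" { print $2 }' "$status_file"
}

# For a shell attached to 'ddset' as above, this should print 0-1.
if [ -r "/proc/$$/status" ]; then
    mems_allowed "/proc/$$/status"
fi
```

A shell that has not been attached to a restricted cpuset would be expected to
list every node instead, 0-3 in the four-node setup used here.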

Notice the difference between the system memory usage as reported by
/proc/meminfo between the restricted cpuset case above and the unrestricted
case (i.e. running the same 'dd' command without assigning it to a fake NUMA
cpuset):

        ======== ============ ==========
        Name     Unrestricted Restricted
        ======== ============ ==========
        MemTotal 3091900 kB   3091900 kB
        MemFree  42113 kB     1513236 kB
        ======== ============ ==========
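
To repeat the comparison, MemFree can be sampled from /proc/meminfo in both
cases. The sketch below defines a hypothetical helper, 'meminfo_kb', for
reading a single field, and then works out the gap between the two MemFree
values in the table.

```shell
#!/bin/sh
# Read one field (in kB) from a meminfo-style file; defaults to /proc/meminfo.
# The helper name is hypothetical.
meminfo_kb() {
    field=$1
    file=${2:-/proc/meminfo}
    awk -v f="$field:" '$1 == f { print $2 }' "$file"
}

# Typical use while 'dd' is running:
#   meminfo_kb MemFree

# Difference between the two MemFree values in the table, in kB.
unrestricted=42113
restricted=1513236
echo "$((restricted - unrestricted)) kB"
```

With the table's figures this prints 1471123 kB, i.e. roughly 1.4 GB more
memory stays free when 'dd' is confined to fake nodes 0 and 1.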

This allows for coarse memory management for the tasks you assign to particular
cpusets. Since cpusets can form a hierarchy, you can create some pretty
interesting combinations of use-cases for various classes of tasks for your
memory management needs.