.. SPDX-License-Identifier: GPL-2.0

=====================
Fake NUMA For CPUSets
=====================

:Author: David Rientjes <rientjes@cs.washington.edu>

Using numa=fake and CPUSets for Resource Management

This document describes how the numa=fake x86_64 command-line option can be used
in conjunction with cpusets for coarse memory management. Using this feature,
you can create fake NUMA nodes that represent contiguous chunks of memory and
assign them to cpusets and their attached tasks. This is a way of limiting the
amount of system memory that is available to a certain class of tasks.

For more information on the features of cpusets, see
Documentation/admin-guide/cgroup-v1/cpusets.rst.
There are a number of different configurations you can use for your needs. For
more information on the numa=fake command line option and its various ways of
configuring fake nodes, see Documentation/x86/x86_64/boot-options.rst.

For the purposes of this introduction, we'll assume a very primitive NUMA
emulation setup of "numa=fake=4*512,". This will split our system memory into
four equal chunks of 512M each that we can then assign to cpusets. As
you become more familiar with using this combination for resource control,
you'll determine a better setup to minimize the number of nodes you have to deal
with.

A machine may be split as follows with "numa=fake=4*512," as reported by dmesg::

    Faking node 0 at 0000000000000000-0000000020000000 (512MB)
    Faking node 1 at 0000000020000000-0000000040000000 (512MB)
    Faking node 2 at 0000000040000000-0000000060000000 (512MB)
    Faking node 3 at 0000000060000000-0000000080000000 (512MB)
    ...
    On node 0 totalpages: 130975
    On node 1 totalpages: 131072
    On node 2 totalpages: 131072
    On node 3 totalpages: 131072

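The address ranges and page counts in this output can be cross-checked with
shell arithmetic. The following is a minimal sketch, assuming 4 KiB pages and
nodes packed back to back starting at address 0. The helper name
``fake_node_layout`` is hypothetical and is not part of any kernel tool.

```shell
#!/bin/sh
# Hypothetical helper: print the address range and page count of each
# fake node for a COUNT*SIZE layout, with SIZE given in MB.
# Assumptions: 4 KiB pages, nodes packed contiguously from address 0.
fake_node_layout() {
    count=$1
    size_mb=$2
    node_bytes=$((size_mb * 1024 * 1024))
    pages=$((node_bytes / 4096))
    i=0
    while [ "$i" -lt "$count" ]; do
        start=$((i * node_bytes))
        end=$((start + node_bytes))
        printf 'node %d: %016x-%016x (%dMB, %d pages)\n' \
            "$i" "$start" "$end" "$size_mb" "$pages"
        i=$((i + 1))
    done
}

# The layout used throughout this document: four nodes of 512 MB each.
fake_node_layout 4 512
```

Each 512 MB node works out to 131072 pages, which matches nodes 1 to 3 above.
Node 0 reports slightly fewer pages, most likely because part of the low
address range is not usable RAM.
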
Now following the instructions for mounting the cpusets filesystem from
Documentation/admin-guide/cgroup-v1/cpusets.rst, you can assign fake nodes
(i.e. contiguous memory address spaces) to individual cpusets::

    [root@xroads /]# mkdir exampleset
    [root@xroads /]# mount -t cpuset none exampleset
    [root@xroads /]# mkdir exampleset/ddset
    [root@xroads /]# cd exampleset/ddset
    [root@xroads /exampleset/ddset]# echo 0-1 > cpus
    [root@xroads /exampleset/ddset]# echo 0-1 > mems

Now this cpuset, 'ddset', will only be allowed access to fake nodes 0 and 1 for
memory allocations (1G).

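The 1G figure follows from the number of nodes listed in ``mems`` multiplied by
the 512M node size assumed in this document. As a rough sketch, the hypothetical
helper ``count_mems`` below counts the nodes in a ``mems`` style list so the
total can be computed for other assignments as well. It assumes well-formed
input made of single node numbers and ``lo-hi`` ranges separated by commas.

```shell
#!/bin/sh
# Hypothetical helper: count the nodes named in a cpuset "mems" list
# such as "0-1" or "0,2-3". Assumes well-formed input.
count_mems() {
    total=0
    old_ifs=$IFS
    IFS=,
    for part in $1; do
        case $part in
            *-*)
                lo=${part%-*}
                hi=${part#*-}
                total=$((total + hi - lo + 1))
                ;;
            *)
                total=$((total + 1))
                ;;
        esac
    done
    IFS=$old_ifs
    echo "$total"
}

# Node size in MB, taken from the numa=fake=4*512 setup assumed above.
node_mb=512
nodes=$(count_mems "0-1")
echo "ddset may allocate from $nodes nodes, $((nodes * node_mb)) MB in total"
```

For the ``0-1`` assignment shown above this reports 2 nodes and 1024 MB.
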
You can now assign tasks to these cpusets to limit the memory resources
available to them according to the fake nodes assigned as mems::

    [root@xroads /exampleset/ddset]# echo $$ > tasks
    [root@xroads /exampleset/ddset]# dd if=/dev/zero of=tmp bs=1024 count=1G
    [1] 13425

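One way to confirm that a task is confined to the intended fake nodes is to
look at the ``Mems_allowed_list`` field of its ``/proc/<pid>/status`` file. The
sketch below uses a hypothetical helper, ``allowed_mems``, that pulls that field
out of status text supplied on stdin. The sample input is made up for
illustration and stands in for the real file.

```shell
#!/bin/sh
# Hypothetical helper: extract the allowed memory nodes from the text of
# a /proc/<pid>/status file supplied on stdin.
allowed_mems() {
    awk '$1 == "Mems_allowed_list:" { print $2 }'
}

# Illustrative sample standing in for /proc/$$/status of a shell that
# has been attached to 'ddset'.
printf 'Name:\tbash\nMems_allowed_list:\t0-1\n' | allowed_mems
```

On a live system the same helper could be run as
``allowed_mems < /proc/$$/status`` from the shell that was written to ``tasks``.
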
Notice the difference between the system memory usage as reported by
/proc/meminfo between the restricted cpuset case above and the unrestricted
case (i.e. running the same 'dd' command without assigning it to a fake NUMA
cpuset):

    ======== ============ ==========
    Name     Unrestricted Restricted
    ======== ============ ==========
    MemTotal 3091900 kB   3091900 kB
    MemFree  42113 kB     1513236 kB
    ======== ============ ==========

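The comparison can be reproduced by reading a field out of /proc/meminfo before
and after the run. The following is a minimal sketch built around a hypothetical
helper, ``meminfo_kb``, that reads meminfo-formatted text on stdin. The sample
values are the ones from the table above, not fresh measurements.

```shell
#!/bin/sh
# Hypothetical helper: print the value in kB of one field from
# /proc/meminfo-formatted text supplied on stdin.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# MemFree values taken from the table above.
unrestricted=$(printf 'MemTotal: 3091900 kB\nMemFree: 42113 kB\n' | meminfo_kb MemFree)
restricted=$(printf 'MemTotal: 3091900 kB\nMemFree: 1513236 kB\n' | meminfo_kb MemFree)

echo "extra free memory when restricted: $((restricted - unrestricted)) kB"
```

On a live system the helper could be fed directly, for example
``meminfo_kb MemFree < /proc/meminfo``.
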
This allows for coarse memory management for the tasks you assign to particular
cpusets. Since cpusets can form a hierarchy, you can create some pretty
interesting combinations of use-cases for various classes of tasks for your
memory management needs.