==================
HugeTLB Controller
==================

The HugeTLB controller can be created by first mounting the cgroup
filesystem::

  # mount -t cgroup -o hugetlb none /sys/fs/cgroup

With the above step, the initial or parent HugeTLB group becomes
visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.

New groups can be created under the parent group /sys/fs/cgroup::

  # cd /sys/fs/cgroup
  # mkdir g1
  # echo $$ > g1/tasks

The above steps create a new group g1 and move the current shell
process (bash) into it.

Brief summary of control files::

  hugetlb.<hugepagesize>.rsvd.limit_in_bytes       # set/show limit of "hugepagesize" hugetlb reservations
  hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes   # show max "hugepagesize" hugetlb reservations and no-reserve faults
  hugetlb.<hugepagesize>.rsvd.usage_in_bytes       # show current reservations and no-reserve faults for "hugepagesize" hugetlb
  hugetlb.<hugepagesize>.rsvd.failcnt              # show the number of allocation failures due to the HugeTLB reservation limit
  hugetlb.<hugepagesize>.limit_in_bytes            # set/show limit of "hugepagesize" hugetlb faults
  hugetlb.<hugepagesize>.max_usage_in_bytes        # show max "hugepagesize" hugetlb usage recorded
  hugetlb.<hugepagesize>.usage_in_bytes            # show current usage for "hugepagesize" hugetlb
  hugetlb.<hugepagesize>.failcnt                   # show the number of allocation failures due to the HugeTLB usage limit
  hugetlb.<hugepagesize>.numa_stat                 # show the NUMA information of the hugetlb memory charged to this cgroup

For a system supporting three hugepage sizes (64KB, 32MB and 1GB), the
control files include::

  hugetlb.1GB.limit_in_bytes
  hugetlb.1GB.max_usage_in_bytes
  hugetlb.1GB.numa_stat
  hugetlb.1GB.usage_in_bytes
  hugetlb.1GB.failcnt
  hugetlb.1GB.rsvd.limit_in_bytes
  hugetlb.1GB.rsvd.max_usage_in_bytes
  hugetlb.1GB.rsvd.usage_in_bytes
  hugetlb.1GB.rsvd.failcnt
  hugetlb.64KB.limit_in_bytes
  hugetlb.64KB.max_usage_in_bytes
  hugetlb.64KB.numa_stat
  hugetlb.64KB.usage_in_bytes
  hugetlb.64KB.failcnt
  hugetlb.64KB.rsvd.limit_in_bytes
  hugetlb.64KB.rsvd.max_usage_in_bytes
  hugetlb.64KB.rsvd.usage_in_bytes
  hugetlb.64KB.rsvd.failcnt
  hugetlb.32MB.limit_in_bytes
  hugetlb.32MB.max_usage_in_bytes
  hugetlb.32MB.numa_stat
  hugetlb.32MB.usage_in_bytes
  hugetlb.32MB.failcnt
  hugetlb.32MB.rsvd.limit_in_bytes
  hugetlb.32MB.rsvd.max_usage_in_bytes
  hugetlb.32MB.rsvd.usage_in_bytes
  hugetlb.32MB.rsvd.failcnt

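Which of these files actually appear depends on the hugepage sizes supported
by the running kernel; they can be listed directly (the g1 group from the
earlier example is assumed here)::

  # ls /sys/fs/cgroup/g1/hugetlb.*
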
1. Page fault accounting

hugetlb.<hugepagesize>.limit_in_bytes
hugetlb.<hugepagesize>.max_usage_in_bytes
hugetlb.<hugepagesize>.usage_in_bytes
hugetlb.<hugepagesize>.failcnt

The HugeTLB controller allows users to limit the HugeTLB usage (page fault)
per control group and enforces the limit at page fault time. Since HugeTLB
doesn't support page reclaim, enforcing the limit at page fault time implies
that the application will get a SIGBUS signal if it tries to fault in HugeTLB
pages beyond its limit. Therefore the application needs to know beforehand
exactly how many HugeTLB pages it will use, and the sysadmin needs to make
sure that enough hugepages are available on the machine for all users, to
avoid processes getting SIGBUS.

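As a sketch of how these files might be used, assuming the g1 group created
earlier and a system whose hugepage size is 2MB (both purely illustrative),
the fault limit could be set and then inspected like this::

  # echo 104857600 > /sys/fs/cgroup/g1/hugetlb.2MB.limit_in_bytes  # 100MB limit
  # cat /sys/fs/cgroup/g1/hugetlb.2MB.usage_in_bytes               # current fault usage
  # cat /sys/fs/cgroup/g1/hugetlb.2MB.failcnt                      # failed allocations so far

Once tasks in g1 have faulted in 100MB worth of 2MB hugepages, further faults
fail, the faulting tasks receive SIGBUS, and failcnt increases.
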
2. Reservation accounting

hugetlb.<hugepagesize>.rsvd.limit_in_bytes
hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes
hugetlb.<hugepagesize>.rsvd.usage_in_bytes
hugetlb.<hugepagesize>.rsvd.failcnt

The HugeTLB controller allows users to limit HugeTLB reservations per control
group and enforces the controller limit at reservation time, and at fault time
for HugeTLB memory for which no reservation exists. Since reservation limits
are enforced at reservation time (on mmap or shmget), reservation limits never
cause the application to get a SIGBUS signal if the memory was reserved
beforehand. For MAP_NORESERVE allocations, the reservation limit behaves the
same as the fault limit, enforcing memory usage at fault time and causing the
application to receive a SIGBUS if it crosses its limit.

Reservation limits are superior to the page fault limits described above,
since reservation limits are enforced at reservation time (on mmap or shmget)
and never cause the application to get a SIGBUS signal if the memory was
reserved beforehand. This allows for easier fallback to alternatives such as
non-HugeTLB memory. With page fault accounting, it is very hard to avoid
processes getting SIGBUS, since the sysadmin would need to know precisely the
HugeTLB usage of all the tasks in the system and make sure there are enough
pages to satisfy all requests. Avoiding tasks getting SIGBUS on overcommitted
systems is practically impossible with page fault accounting.

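Continuing the illustrative g1/2MB example from above, a reservation limit can
be set and inspected through the corresponding .rsvd. files (again only a
sketch, not output from a real system)::

  # echo 104857600 > /sys/fs/cgroup/g1/hugetlb.2MB.rsvd.limit_in_bytes
  # cat /sys/fs/cgroup/g1/hugetlb.2MB.rsvd.usage_in_bytes
  # cat /sys/fs/cgroup/g1/hugetlb.2MB.rsvd.failcnt

With this limit in place, an mmap() of hugetlb memory that would reserve more
than 100MB fails (typically with ENOMEM) at mmap() time rather than with
SIGBUS at fault time, MAP_NORESERVE mappings excepted, as described above.
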
3. Caveats with shared memory

For shared HugeTLB memory, both HugeTLB reservations and page faults are
charged to the first task that causes the memory to be reserved or faulted,
and all subsequent uses of this reserved or faulted memory are done without
charging.

Shared HugeTLB memory is only uncharged when it is unreserved or deallocated.
This is usually when the HugeTLB file is deleted, and not when the task that
caused the reservation or fault has exited.

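A minimal sketch of this behaviour, assuming a hugetlbfs mount point at
/mnt/huge, free pages in the kernel's 2MB hugepage pool, and the shell that
was moved into g1 earlier (all of these are illustrative assumptions)::

  # mount -t hugetlbfs none /mnt/huge
  # fallocate -l 20M /mnt/huge/shared-file   # pages allocated here are charged to g1
  # cat /sys/fs/cgroup/g1/hugetlb.2MB.usage_in_bytes
  # rm /mnt/huge/shared-file                 # charge is released when the file is deleted

Other tasks that map /mnt/huge/shared-file afterwards are not charged again;
the usage only drops back once the file is removed.
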
4. Caveats with HugeTLB cgroup offline

When a HugeTLB cgroup goes offline with some reservations or faults still
charged to it, the behavior is as follows:

- the fault charges are charged to the parent HugeTLB cgroup (reparented),
- the reservation charges remain on the offline HugeTLB cgroup.

This means that if a HugeTLB cgroup gets offlined while there are still
HugeTLB reservations charged to it, that cgroup persists as a zombie until
all HugeTLB reservations are uncharged. HugeTLB reservations behave in this
manner to match the memory controller, whose cgroups also persist as zombies
until all charged memory is uncharged. Also, the tracking of HugeTLB
reservations is a bit more complex than the tracking of HugeTLB faults, so it
is significantly harder to reparent reservations at offline time.