==================
HugeTLB Controller
==================

The HugeTLB controller is enabled by mounting the cgroup filesystem with the
hugetlb option::

  # mount -t cgroup -o hugetlb none /sys/fs/cgroup

With the above step, the initial or the parent HugeTLB group becomes
visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
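
Membership of the current shell in this root group can be verified by looking
for its PID in the tasks file (a minimal check; the PID list varies per
system)::

  # grep -w $$ /sys/fs/cgroup/tasks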

New groups can be created under the parent group /sys/fs/cgroup::

  # cd /sys/fs/cgroup
  # mkdir g1
  # echo $$ > g1/tasks

The above steps create a new group g1 and move the current shell
process (bash) into it.
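
A usage limit can then be set on the new group (a minimal sketch, assuming the
system provides 2MB hugepages; adjust the size to match your system). The
limit accepts K/M/G suffixes and is rounded down to a multiple of the hugepage
size::

  # echo 100M > g1/hugetlb.2MB.limit_in_bytes
  # cat g1/hugetlb.2MB.limit_in_bytes
  104857600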

Brief summary of control files::

  hugetlb.<hugepagesize>.rsvd.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb reservations
  hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes # show max "hugepagesize" hugetlb reservations and no-reserve faults
  hugetlb.<hugepagesize>.rsvd.usage_in_bytes     # show current reservations and no-reserve faults for "hugepagesize" hugetlb
  hugetlb.<hugepagesize>.rsvd.failcnt            # show the number of allocation failures due to the HugeTLB reservation limit
  hugetlb.<hugepagesize>.limit_in_bytes          # set/show limit of "hugepagesize" hugetlb faults
  hugetlb.<hugepagesize>.max_usage_in_bytes      # show max "hugepagesize" hugetlb usage recorded
  hugetlb.<hugepagesize>.usage_in_bytes          # show current usage for "hugepagesize" hugetlb
  hugetlb.<hugepagesize>.failcnt                 # show the number of allocation failures due to the HugeTLB usage limit
  hugetlb.<hugepagesize>.numa_stat               # show the NUMA information of the hugetlb memory charged to this cgroup

For a system supporting three hugepage sizes (64KB, 32MB and 1GB), the control
files include::

  hugetlb.1GB.limit_in_bytes
  hugetlb.1GB.max_usage_in_bytes
  hugetlb.1GB.numa_stat
  hugetlb.1GB.usage_in_bytes
  hugetlb.1GB.failcnt
  hugetlb.1GB.rsvd.limit_in_bytes
  hugetlb.1GB.rsvd.max_usage_in_bytes
  hugetlb.1GB.rsvd.usage_in_bytes
  hugetlb.1GB.rsvd.failcnt
  hugetlb.64KB.limit_in_bytes
  hugetlb.64KB.max_usage_in_bytes
  hugetlb.64KB.numa_stat
  hugetlb.64KB.usage_in_bytes
  hugetlb.64KB.failcnt
  hugetlb.64KB.rsvd.limit_in_bytes
  hugetlb.64KB.rsvd.max_usage_in_bytes
  hugetlb.64KB.rsvd.usage_in_bytes
  hugetlb.64KB.rsvd.failcnt
  hugetlb.32MB.limit_in_bytes
  hugetlb.32MB.max_usage_in_bytes
  hugetlb.32MB.numa_stat
  hugetlb.32MB.usage_in_bytes
  hugetlb.32MB.failcnt
  hugetlb.32MB.rsvd.limit_in_bytes
  hugetlb.32MB.rsvd.max_usage_in_bytes
  hugetlb.32MB.rsvd.usage_in_bytes
  hugetlb.32MB.rsvd.failcnt


1. Page fault accounting

hugetlb.<hugepagesize>.limit_in_bytes
hugetlb.<hugepagesize>.max_usage_in_bytes
hugetlb.<hugepagesize>.usage_in_bytes
hugetlb.<hugepagesize>.failcnt

The HugeTLB controller allows users to limit HugeTLB usage (page faults) per
control group and enforces the limit at page fault time. Since HugeTLB
doesn't support page reclaim, enforcing the limit at page fault time implies
that the application will get a SIGBUS signal if it tries to fault in HugeTLB
pages beyond its limit. Therefore the application needs to know exactly how
many HugeTLB pages it uses beforehand, and the sysadmin needs to make sure
that there are enough available on the machine for all the users to avoid
processes getting SIGBUS.
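
For example, faults in the g1 group created above can be capped, and rejected
faults observed via failcnt (a minimal sketch, assuming 2MB hugepages; failcnt
increments each time the limit rejects a fault)::

  # echo 10M > /sys/fs/cgroup/g1/hugetlb.2MB.limit_in_bytes
  # cat /sys/fs/cgroup/g1/hugetlb.2MB.failcnt
  0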


2. Reservation accounting

hugetlb.<hugepagesize>.rsvd.limit_in_bytes
hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes
hugetlb.<hugepagesize>.rsvd.usage_in_bytes
hugetlb.<hugepagesize>.rsvd.failcnt

The HugeTLB controller allows users to limit HugeTLB reservations per control
group and enforces the controller limit at reservation time and at the fault
of HugeTLB memory for which no reservation exists. Since reservation limits
are enforced at reservation time (on mmap or shmget), they never cause the
application to get a SIGBUS signal if the memory was reserved beforehand. For
MAP_NORESERVE allocations, the reservation limit behaves the same as the fault
limit, enforcing memory usage at fault time and causing the application to
receive a SIGBUS if it crosses its limit.
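
For example, a reservation limit can be set on g1 (a minimal sketch, assuming
2MB hugepages). A hugetlb mapping created with mmap() is charged against this
limit when mmap() is called, before any page is touched::

  # echo 20M > /sys/fs/cgroup/g1/hugetlb.2MB.rsvd.limit_in_bytes
  # cat /sys/fs/cgroup/g1/hugetlb.2MB.rsvd.failcnt
  0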

Reservation limits are superior to the page fault limits described above,
since reservation limits are enforced at reservation time (on mmap or shmget)
and never cause the application to get a SIGBUS signal if the memory was
reserved beforehand. This allows for easier fallback to alternatives such as
non-HugeTLB memory. In the case of page fault accounting, it is very hard to
avoid processes getting SIGBUS since the sysadmin needs to know precisely the
HugeTLB usage of all the tasks in the system and make sure there are enough
pages to satisfy all requests. Avoiding tasks getting SIGBUS on overcommitted
systems is practically impossible with page fault accounting.


3. Caveats with shared memory

For shared HugeTLB memory, both HugeTLB reservations and page faults are
charged to the first task that causes the memory to be reserved or faulted,
and all subsequent uses of this reserved or faulted memory are done without
charging.

Shared HugeTLB memory is only uncharged when it is unreserved or deallocated.
This is usually when the HugeTLB file is deleted, and not when the task that
caused the reservation or fault has exited.
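
This can be observed with a file on a hugetlbfs mount (a minimal sketch; the
mount point /mnt/huge and the 2MB hugepage size are assumptions, and whether
fallocate pre-allocates pages depends on kernel support for fallocate on
hugetlbfs)::

  # mount -t hugetlbfs none /mnt/huge
  # fallocate -l 20M /mnt/huge/file   # pages charged to the caller's cgroup
  # cat /sys/fs/cgroup/g1/hugetlb.2MB.usage_in_bytes
  20971520
  # rm /mnt/huge/file                 # deleting the file uncharges the cgroup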


4. Caveats with HugeTLB cgroup offline

When a HugeTLB cgroup goes offline with some reservations or faults still
charged to it, the behavior is as follows:

- The fault charges are charged to the parent HugeTLB cgroup (reparented).
- The reservation charges remain on the offline HugeTLB cgroup.

This means that if a HugeTLB cgroup gets offlined while there are still
HugeTLB reservations charged to it, that cgroup persists as a zombie until
all HugeTLB reservations are uncharged. HugeTLB reservations behave in this
manner to match the memory controller, whose cgroups also persist as zombies
until all charged memory is uncharged. Also, the tracking of HugeTLB
reservations is more complex than the tracking of HugeTLB faults, so it is
significantly harder to reparent reservations at offline time.
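
Whether a group will persist as a zombie after removal can be checked ahead of
time by reading its remaining reservation usage (a minimal sketch, assuming
2MB hugepages; a non-zero value indicates outstanding charges)::

  # cat /sys/fs/cgroup/g1/hugetlb.2MB.rsvd.usage_in_bytes
  0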