User Interface for Resource Allocation in Intel Resource Director Technology

Copyright (C) 2016 Intel Corporation

Fenghua Yu <fenghua.yu@intel.com>
Tony Luck <tony.luck@intel.com>
Vikas Shivappa <vikas.shivappa@intel.com>

This feature is enabled by the CONFIG_INTEL_RDT Kconfig and the
X86 /proc/cpuinfo flag bits "rdt", "cqm", "cat_l3" and "cdp_l3".

To use the feature mount the file system:

 # mount -t resctrl resctrl [-o cdp] /sys/fs/resctrl

mount options are:

"cdp": Enable code/data prioritization in L3 cache allocations.

RDT features are orthogonal. A particular system may support only
monitoring, only control, or both monitoring and control.

The mount succeeds if either allocation or monitoring is present, but
only those files and directories supported by the system will be created.
For more details on the behavior of the interface during monitoring
and allocation, see the "Resource alloc and monitor groups" section.

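Before mounting it can help to see which of the flags listed above the
CPU actually reports. The following is a minimal sketch and not part of
the kernel interface: the helper name "rdt_cpu_flags" and its optional
file argument are made up for illustration, and the flag names are simply
the ones quoted above.

```shell
# Sketch: report which RDT-related /proc/cpuinfo flags are present.
# rdt_cpu_flags and its optional file argument are illustrative names.
rdt_cpu_flags() {
    cpuinfo=${1:-/proc/cpuinfo}
    # Take the first "flags" line and split it into one flag per line.
    flags=$(grep -m1 '^flags' "$cpuinfo" | cut -d: -f2 | tr ' ' '\n')
    for f in rdt cqm cat_l3 cdp_l3; do
        if printf '%s\n' "$flags" | grep -qx "$f"; then
            echo "$f: present"
        else
            echo "$f: missing"
        fi
    done
}
```

Called without an argument the function reads /proc/cpuinfo; a different
file can be passed to try it out on a saved copy.
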
Info directory
--------------

The 'info' directory contains information about the enabled
resources. Each resource has its own subdirectory. The subdirectory
names reflect the resource names.

Cache resource (L3/L2) subdirectories contain the following files
related to allocation:

"num_closids":     The number of CLOSIDs which are valid for this
                   resource. The kernel uses the smallest number of
                   CLOSIDs of all enabled resources as limit.

"cbm_mask":        The bitmask which is valid for this resource.
                   This mask is equivalent to 100%.

"min_cbm_bits":    The minimum number of consecutive bits which
                   must be set when writing a mask.

"shareable_bits":  Bitmask of the resource that is shared with other
                   executing entities (e.g. I/O). The user can use this
                   when setting up exclusive cache partitions. Note that
                   some platforms support devices that have their
                   own settings for cache use which can override
                   these bits.

Memory bandwidth (MB) subdirectory contains the following files
with respect to allocation:

"min_bandwidth":   The minimum memory bandwidth percentage which
                   the user can request.

"bandwidth_gran":  The granularity in which the memory bandwidth
                   percentage is allocated. The allocated
                   b/w percentage is rounded off to the next
                   control step available on the hardware. The
                   available bandwidth control steps are:
                   min_bandwidth + N * bandwidth_gran.

"delay_linear":    Indicates if the delay scale is linear or
                   non-linear. This field is purely informational.

If RDT monitoring is available there will be an "L3_MON" directory
with the following files:

"num_rmids":       The number of RMIDs available. This is the
                   upper bound for how many "CTRL_MON" + "MON"
                   groups can be created.

"mon_features":    Lists the monitoring events if
                   monitoring is enabled for the resource.

"max_threshold_occupancy":
                   Read/write file provides the largest value (in
                   bytes) at which a previously used LLC_occupancy
                   counter can be considered for re-use.


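To see what a particular system exposes, the files described above can
be dumped in one go. This is only a sketch: the function name
"resctrl_show_info" and its optional root argument are assumptions made
for illustration, and it prints whatever files happen to exist rather
than a fixed list.

```shell
# Sketch: print every file under the resctrl "info" directory as
# "<resource>/<file>: <value>". Names here are illustrative only.
resctrl_show_info() {
    root=${1:-/sys/fs/resctrl}
    for f in "$root"/info/*/*; do
        [ -f "$f" ] || continue
        # Join multi-line files (e.g. mon_features) onto a single line.
        printf '%s: %s\n' "${f#"$root"/info/}" "$(paste -s -d ' ' "$f")"
    done
}
```

With resctrl mounted at the usual place, calling "resctrl_show_info"
without arguments lists the values for each enabled resource.
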
Resource alloc and monitor groups
---------------------------------

Resource groups are represented as directories in the resctrl file
system. The default group is the root directory which, immediately
after mounting, owns all the tasks and cpus in the system and can make
full use of all resources.

On a system with RDT control features additional directories can be
created in the root directory that specify different amounts of each
resource (see "schemata" below). The root and these additional top level
directories are referred to as "CTRL_MON" groups below.

On a system with RDT monitoring the root directory and other top level
directories contain a directory named "mon_groups" in which additional
directories can be created to monitor subsets of tasks in the CTRL_MON
group that is their ancestor. These are called "MON" groups in the rest
of this document.

Removing a directory will move all tasks and cpus owned by the group it
represents to the parent. Removing one of the created CTRL_MON groups
will automatically remove all MON groups below it.

All groups contain the following files:

"tasks":
        Reading this file shows the list of all tasks that belong to
        this group. Writing a task id to the file will add a task to the
        group. If the group is a CTRL_MON group the task is removed from
        whichever previous CTRL_MON group owned the task and also from
        any MON group that owned the task. If the group is a MON group,
        then the task must already belong to the CTRL_MON parent of this
        group. The task is removed from any previous MON group.

"cpus":
        Reading this file shows a bitmask of the logical CPUs owned by
        this group. Writing a mask to this file will add and remove
        CPUs to/from this group. As with the tasks file a hierarchy is
        maintained where MON groups may only include CPUs owned by the
        parent CTRL_MON group.

"cpus_list":
        Just like "cpus", only using ranges of CPUs instead of bitmasks.

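The ordering implied by the "tasks" file (a task has to be in the
CTRL_MON group before it can join one of that group's MON groups) can be
sketched as follows. The function name, group names and PID are
placeholders chosen for illustration, and the sketch assumes a system
that supports both control and monitoring.

```shell
# Sketch: create a CTRL_MON group with one MON group below it and move
# a task into both. All names are placeholders.
resctrl_make_groups() {
    root=$1 ctrl=$2 mon=$3 pid=$4
    mkdir "$root/$ctrl" || return 1
    # resctrl provides "mon_groups" itself when monitoring is available;
    # -p only matters when trying this against a scratch directory.
    mkdir -p "$root/$ctrl/mon_groups/$mon" || return 1
    # The task must join the CTRL_MON group before one of its MON groups.
    echo "$pid" > "$root/$ctrl/tasks" || return 1
    echo "$pid" > "$root/$ctrl/mon_groups/$mon/tasks"
}
```

For example "resctrl_make_groups /sys/fs/resctrl p0 m0 1234" would, run
as root, create group p0 with MON group m0 and place task 1234 in both.
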
When control is enabled all CTRL_MON groups will also contain:

"schemata":
        A list of all the resources available to this group.
        Each resource has its own line and format - see below for details.

When monitoring is enabled all groups (CTRL_MON and MON) will also contain:

"mon_data":
        This contains a set of files organized by L3 domain and by
        RDT event. E.g. on a system with two L3 domains there will
        be subdirectories "mon_L3_00" and "mon_L3_01". Each of these
        directories has one file per event (e.g. "llc_occupancy",
        "mbm_total_bytes", and "mbm_local_bytes"). In a MON group these
        files provide a read out of the current value of the event for
        all tasks in the group. In CTRL_MON groups these files provide
        the sum for all tasks in the CTRL_MON group and all tasks in
        MON groups. Please see the example section for more details on usage.

Resource allocation rules
-------------------------
When a task is running the following rules define which resources are
available to it:

1) If the task is a member of a non-default group, then the schemata
   for that group is used.

2) Else if the task belongs to the default group, but is running on a
   CPU that is assigned to some specific group, then the schemata for the
   CPU's group is used.

3) Otherwise the schemata for the default group is used.

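The three rules amount to a simple precedence order, which the sketch
below spells out. It only models the decision and does not read anything
from resctrl; the function name and the use of the word "default" for
the root group are assumptions made for illustration.

```shell
# Sketch of the three rules: print the group whose schemata applies.
# $1 = group the task belongs to, $2 = group owning the CPU it runs on.
# "default" stands for the root group.
effective_group() {
    task_group=$1 cpu_group=$2
    if [ "$task_group" != default ]; then
        echo "$task_group"        # rule 1: the task's own group wins
    elif [ "$cpu_group" != default ]; then
        echo "$cpu_group"         # rule 2: group that owns the CPU
    else
        echo default              # rule 3: fall back to the default group
    fi
}
```

So a task in group p0 running on a CPU owned by p1 is governed by p0,
while a default-group task on the same CPU is governed by p1.
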
Resource monitoring rules
-------------------------
1) If a task is a member of a MON group, or non-default CTRL_MON group
   then RDT events for the task will be reported in that group.

2) If a task is a member of the default CTRL_MON group, but is running
   on a CPU that is assigned to some specific group, then the RDT events
   for the task will be reported in that group.

3) Otherwise RDT events for the task will be reported in the root level
   "mon_data" group.


Notes on cache occupancy monitoring and control
-----------------------------------------------
When moving a task from one group to another you should remember that
this only affects *new* cache allocations by the task. E.g. you may have
a task in a monitor group showing 3 MB of cache occupancy. If you move
it to a new group and immediately check the occupancy of the old and new
groups you will likely see that the old group is still showing 3 MB and
the new group zero. When the task accesses locations still in cache from
before the move, the h/w does not update any counters. On a busy system
you will likely see the occupancy in the old group go down as cache lines
are evicted and re-used while the occupancy in the new group rises as
the task accesses memory and loads into the cache are counted based on
membership in the new group.

The same applies to cache allocation control. Moving a task to a group
with a smaller cache partition will not evict any cache lines. The
process may continue to use them from the old partition.

Hardware uses a CLOSID (Class of Service ID) and an RMID (Resource
Monitoring ID) to identify a control group and a monitoring group
respectively. Each resource group is mapped to these IDs based on the
kind of group. The number of CLOSIDs and RMIDs is limited by the
hardware, so creation of a "CTRL_MON" directory may fail if we run out
of either CLOSIDs or RMIDs, and creation of a "MON" group may fail if we
run out of RMIDs.

max_threshold_occupancy - generic concepts
------------------------------------------

Note that an RMID once freed may not be immediately available for use as
the RMID is still tagged to the cache lines of its previous user.
Hence such RMIDs are placed on a limbo list and checked again once the
cache occupancy has gone down. If at some point the system has many
limbo RMIDs that are not yet ready to be used, the user may see -EBUSY
during mkdir.

max_threshold_occupancy is a user configurable value to determine the
occupancy at which an RMID can be freed.

Schemata files - general concepts
---------------------------------
Each line in the file describes one resource. The line starts with
the name of the resource, followed by specific values to be applied
in each of the instances of that resource on the system.

Cache IDs
---------
On current generation systems there is one L3 cache per socket and L2
caches are generally just shared by the hyperthreads on a core, but this
isn't an architectural requirement. We could have multiple separate L3
caches on a socket, multiple cores could share an L2 cache. So instead
of using "socket" or "core" to define the set of logical cpus sharing
a resource we use a "Cache ID". At a given cache level this will be a
unique number across the whole system (but it isn't guaranteed to be a
contiguous sequence, there may be gaps). To find the ID for each logical
CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id

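A quick way to see the mapping from logical CPUs to cache IDs is to walk
the sysfs files mentioned above. This is a sketch only: the function
name "cache_ids" and its optional directory argument are illustrative,
and it additionally reads the "level" file that sits next to "id" in
each cache index directory.

```shell
# Sketch: print "cpuN level=L id=I" for each cache of each logical CPU.
cache_ids() {
    sys=${1:-/sys/devices/system/cpu}
    for d in "$sys"/cpu[0-9]*/cache/index[0-9]*; do
        # Skip cache levels that do not expose an id.
        [ -r "$d/id" ] || continue
        cpu=${d#"$sys"/}
        cpu=${cpu%%/*}
        printf '%s level=%s id=%s\n' "$cpu" "$(cat "$d/level")" "$(cat "$d/id")"
    done
}
```

CPUs that print the same id at level 3 share an L3 cache and therefore
share one entry on the L3 line of a schemata file.
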
Cache Bit Masks (CBM)
---------------------
For cache resources we describe the portion of the cache that is available
for allocation using a bitmask. The maximum value of the mask is defined
by each cpu model (and may be different for different cache levels). It
is found using CPUID, but is also provided in the "info" directory of
the resctrl file system in "info/{resource}/cbm_mask". X86 hardware
requires that these masks have all the '1' bits in a contiguous block. So
0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
and 0xA are not. On a system with a 20-bit mask each bit represents 5%
of the capacity of the cache. You could partition the cache into four
equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.

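Both points above, the contiguity requirement and the split into equal
parts, are plain bit arithmetic. The sketch below shows one way to
express them; the function names are invented for illustration and the
partition helper assumes the number of bits divides evenly by the number
of parts.

```shell
# Succeed if the mask given as $1 (e.g. 0x6) is non-zero and has all of
# its '1' bits in one contiguous block.
cbm_is_contiguous() {
    m=$(($1))
    [ "$m" -ne 0 ] || return 1
    # Drop trailing zero bits; what is left must look like 2^k - 1.
    while [ $((m & 1)) -eq 0 ]; do m=$((m >> 1)); done
    [ $(( (m + 1) & m )) -eq 0 ]
}

# Print <parts> equal, non-overlapping masks (hex) for an <nbits>-bit CBM.
cbm_partition() {
    nbits=$1 parts=$2
    width=$((nbits / parts))
    i=0
    while [ "$i" -lt "$parts" ]; do
        printf '%x\n' $(( ((1 << width) - 1) << (i * width) ))
        i=$((i + 1))
    done
}
```

"cbm_partition 20 4" prints 1f, 3e0, 7c00 and f8000, the four masks
quoted above, and "cbm_is_contiguous 0x5" fails as expected.
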
Memory bandwidth (b/w) percentage
---------------------------------
For the memory b/w resource, the user controls the resource by indicating
the percentage of total memory b/w.

The minimum bandwidth percentage value for each cpu model is predefined
and can be looked up through "info/MB/min_bandwidth". The bandwidth
granularity that is allocated is also dependent on the cpu model and can
be looked up at "info/MB/bandwidth_gran". The available bandwidth
control steps are: min_bw + N * bw_gran. Intermediate values are rounded
to the next control step available on the hardware.

The bandwidth throttling is a core specific mechanism on some Intel
SKUs. Using a high bandwidth and a low bandwidth setting on two threads
sharing a core will result in both threads being throttled to use the
low bandwidth.

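The step formula can be tried out with a little arithmetic. This is a
sketch under stated assumptions rather than a description of what the
kernel does: it treats 100 as the largest percentage, reads "rounded to
the next control step" as rounding up, and refuses requests outside the
range instead of adjusting them. The function names are illustrative.

```shell
# Print the control steps min_bw + N * bw_gran up to 100 (assumed maximum).
mb_steps() {
    bw=$1 gran=$2
    while [ "$bw" -le 100 ]; do
        echo "$bw"
        bw=$((bw + gran))
    done
}

# Round request $1 up to a control step, given min_bw $2 and bw_gran $3.
# Rounding up and rejecting out-of-range values are assumptions.
mb_round() {
    req=$1 min=$2 gran=$3
    [ "$req" -ge "$min" ] && [ "$req" -le 100 ] || return 1
    step=$(( min + (req - min + gran - 1) / gran * gran ))
    [ "$step" -le 100 ] || return 1
    echo "$step"
}
```

With min_bandwidth 10 and bandwidth_gran 10, "mb_steps 10 10" lists 10
through 100 in steps of 10 and "mb_round 25 10 10" prints 30.
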
L3 schemata file details (code and data prioritization disabled)
----------------------------------------------------------------
With CDP disabled the L3 schemata format is:

        L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L3 schemata file details (CDP enabled via mount option to resctrl)
------------------------------------------------------------------
When CDP is enabled L3 control is split into two separate resources
so you can specify independent masks for code and data like this:

        L3DATA:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
        L3CODE:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L2 schemata file details
------------------------
L2 cache does not support code and data prioritization, so the
schemata format is always:

        L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

Memory b/w Allocation details
-----------------------------

Memory b/w domain is L3 cache.

        MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...

Reading/writing the schemata file
---------------------------------
Reading the schemata file will show the state of all resources
on all domains. When writing you only need to specify those values
which you wish to change. E.g.

# cat schemata
L3DATA:0=fffff;1=fffff;2=fffff;3=fffff
L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
# echo "L3DATA:2=3c0;" > schemata
# cat schemata
L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
L3CODE:0=fffff;1=fffff;2=fffff;3=fffff

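Scripts often need a single value back out of a schemata file. A minimal
sketch of such a lookup is shown below; the function name "schemata_get"
and its argument order are invented for illustration, and the pattern
tolerates leading blanks before the resource name in case the lines are
padded.

```shell
# Sketch: print the value set for domain <id> on the <resource> line of
# a schemata file. Usage: schemata_get <file> <resource> <id>
schemata_get() {
    file=$1 res=$2 id=$3
    # Pick the resource line, drop the name, split the domains apart,
    # then keep only the entry for the requested domain id.
    grep "^ *$res:" "$file" | cut -d: -f2 | tr ';' '\n' |
        sed -n "s/^ *$id=//p"
}
```

Against the second listing above, "schemata_get schemata L3DATA 2"
prints 3c0.
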
Examples for RDT allocation usage:

Example 1
---------
On a two socket machine (one L3 cache per socket) with just four bits
for cache bit masks, minimum b/w of 10% with a memory bandwidth
granularity of 10%

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl
# mkdir p0 p1
# echo -e "L3:0=3;1=c\nMB:0=50;1=50" > /sys/fs/resctrl/p0/schemata
# echo -e "L3:0=3;1=3\nMB:0=50;1=50" > /sys/fs/resctrl/p1/schemata

The default resource group is unmodified, so we have access to all parts
of all caches (its schemata file reads "L3:0=f;1=f").

Tasks that are under the control of group "p0" may only allocate from the
"lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
Tasks in group "p1" use the "lower" 50% of cache on both sockets.

Similarly, tasks that are under the control of group "p0" may use a
maximum memory b/w of 50% on socket 0 and 50% on socket 1.
Tasks in group "p1" may also use 50% memory b/w on both sockets.
Note that unlike cache masks, memory b/w cannot specify whether these
allocations can overlap or not. The allocations specify the maximum
b/w that the group may be able to use and the system admin can configure
the b/w accordingly.

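The percentages quoted in this example follow from counting bits: a mask
with two of four bits set covers 50% of the cache. The sketch below does
that count; the function names are illustrative and the result is
truncated to a whole percent.

```shell
# Count the '1' bits in a mask such as 0x3.
cbm_popcount() {
    m=$(($1)) n=0
    while [ "$m" -ne 0 ]; do
        n=$((n + (m & 1)))
        m=$((m >> 1))
    done
    echo "$n"
}

# Percentage of the cache that mask $1 covers, given the full mask $2.
cbm_percent() {
    echo $(( 100 * $(cbm_popcount "$1") / $(cbm_popcount "$2") ))
}
```

"cbm_percent 0x3 0xf" and "cbm_percent 0xc 0xf" both print 50, which
matches the "lower" and "upper" halves described above.
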
Example 2
---------
Again two sockets, but this time with a more realistic 20-bit mask.

Two real time tasks pid=1234 running on processor 0 and pid=5678 running on
processor 1 on socket 0 on a 2-socket and dual core machine. To avoid noisy
neighbors, each of the two real-time tasks exclusively occupies one quarter
of L3 cache on socket 0.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl

First we reset the schemata for the default group so that the "upper"
50% of the L3 cache on socket 0 and 50% of memory b/w cannot be used by
ordinary tasks:

# echo -e "L3:0=3ff;1=fffff\nMB:0=50;1=100" > schemata

Next we make a resource group for our first real time task and give
it access to the "top" 25% of the cache on socket 0.

# mkdir p0
# echo "L3:0=f8000;1=fffff" > p0/schemata

Finally we move our first real time task into this resource group. We
also use taskset(1) to ensure the task always runs on a dedicated CPU
on socket 0. Most uses of resource groups will also constrain which
processors tasks run on.

# echo 1234 > p0/tasks
# taskset -cp 1 1234

Ditto for the second real time task (with the remaining 25% of cache):

# mkdir p1
# echo "L3:0=7c00;1=fffff" > p1/schemata
# echo 5678 > p1/tasks
# taskset -cp 2 5678

For the same 2 socket system with memory b/w resource and CAT L3 the
schemata would look like this (assume min_bandwidth is 10 and
bandwidth_gran is 10):

For our first real time task this would request 20% memory b/w on socket
0.

# echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata

For our second real time task this would request another 20% memory b/w
on socket 0.

# echo -e "L3:0=7c00;1=fffff\nMB:0=20;1=100" > p1/schemata

Example 3
---------

A single socket system which has real-time tasks running on cores 4-7 and
non real-time workload assigned to cores 0-3. The real-time tasks share text
and data, so a per task association is not required and due to interaction
with the kernel it's desired that the kernel on these cores shares L3 with
the tasks.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl

First we reset the schemata for the default group so that the "upper"
50% of the L3 cache on socket 0, and 50% of memory bandwidth on socket 0
cannot be used by ordinary tasks:

# echo -e "L3:0=3ff\nMB:0=50" > schemata

Next we make a resource group for our real time cores and give it access
to the "top" 50% of the cache on socket 0 and 50% of memory bandwidth on
socket 0.

# mkdir p0
# echo -e "L3:0=ffc00\nMB:0=50" > p0/schemata

Finally we move cores 4-7 over to the new group and make sure that the
kernel and the tasks running there get 50% of the cache. They should
also get 50% of memory bandwidth assuming that the cores 4-7 are SMT
siblings and only the real time threads are scheduled on the cores 4-7.

# echo F0 > p0/cpus

Locking between applications
----------------------------

Certain operations on the resctrl filesystem, composed of read/writes
to/from multiple files, must be atomic.

As an example, the allocation of an exclusive reservation of L3 cache
involves:

  1. Read the cbmmasks from each directory
  2. Find a contiguous set of bits in the global CBM bitmask that is clear
     in all of the directory cbmmasks
  3. Create a new directory
  4. Set the bits found in step 2 to the new directory "schemata" file

If two applications attempt to allocate space concurrently then they can
end up allocating the same bits so the reservations are shared instead of
exclusive.

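Step 2 of the list above is a small search over bit positions. The
sketch below shows one possible way to do it and is not taken from any
existing tool: the function name is invented, the caller is expected to
pass the masks of every existing group (including the default group),
and it simply returns the lowest free run it finds.

```shell
# Sketch: print (hex) the lowest run of <want> contiguous bits inside
# <full> that is clear in every mask listed after it; fail if none.
# Usage: cbm_find_free <full> <want> [<used_mask>...]
cbm_find_free() {
    full=$(($1)) want=$2
    shift 2
    used=0
    for m in "$@"; do used=$((used | $m)); done
    run=$(( (1 << want) - 1 ))
    # Slide the run upwards until it no longer fits inside the full mask.
    while [ $((run & ~full)) -eq 0 ]; do
        if [ $((run & used)) -eq 0 ]; then
            printf '%x\n' "$run"
            return 0
        fi
        run=$((run << 1))
    done
    return 1
}
```

For instance "cbm_find_free 0xfffff 5 0x3ff 0xf8000" prints 7c00. The
search by itself does nothing about the race described above, so it only
makes sense while no other application is changing the groups.
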
To coordinate atomic operations on the resctrlfs and to avoid the problem
above, the following locking procedure is recommended:

Locking is based on flock, which is available in libc and also as a shell
script command.

Write lock:

 A) Take flock(LOCK_EX) on /sys/fs/resctrl
 B) Read/write the directory structure.
 C) Release the lock with flock(LOCK_UN)

Read lock:

 A) Take flock(LOCK_SH) on /sys/fs/resctrl
 B) If success read the directory structure.
 C) Release the lock with flock(LOCK_UN)

Example with bash:

# Atomically read directory structure
$ flock -s /sys/fs/resctrl/ find /sys/fs/resctrl

# Read directory contents and create new subdirectory

$ cat create-dir.sh
find /sys/fs/resctrl/ > output.txt
# function-of is a placeholder for whatever computes the new mask
mask=$(function-of output.txt)
mkdir /sys/fs/resctrl/newres/
echo "$mask" > /sys/fs/resctrl/newres/schemata

$ flock /sys/fs/resctrl/ ./create-dir.sh

Example with C:

/*
 * Example code to take advisory locks
 * before accessing resctrl filesystem
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>

void resctrl_take_shared_lock(int fd)
{
	int ret;

	/* take shared lock on resctrl filesystem */
	ret = flock(fd, LOCK_SH);
	if (ret) {
		perror("flock");
		exit(-1);
	}
}

void resctrl_take_exclusive_lock(int fd)
{
	int ret;

	/* take exclusive lock on resctrl filesystem */
	ret = flock(fd, LOCK_EX);
	if (ret) {
		perror("flock");
		exit(-1);
	}
}

void resctrl_release_lock(int fd)
{
	int ret;

	/* release lock on resctrl filesystem */
	ret = flock(fd, LOCK_UN);
	if (ret) {
		perror("flock");
		exit(-1);
	}
}

int main(void)
{
	int fd;

	fd = open("/sys/fs/resctrl", O_RDONLY | O_DIRECTORY);
	if (fd == -1) {
		perror("open");
		exit(-1);
	}
	resctrl_take_shared_lock(fd);
	/* code to read directory contents */
	resctrl_release_lock(fd);

	resctrl_take_exclusive_lock(fd);
	/* code to read and write directory contents */
	resctrl_release_lock(fd);

	return 0;
}

Examples for RDT Monitoring along with allocation usage:

Reading monitored data
----------------------
Reading an event file (e.g. mon_data/mon_L3_00/llc_occupancy) would
show the current snapshot of LLC occupancy of the corresponding MON
group or CTRL_MON group.


Example 1 (Monitor CTRL_MON group and subset of tasks in CTRL_MON group)
------------------------------------------------------------------------
On a two socket machine (one L3 cache per socket) with just four bits
for cache bit masks

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl
# mkdir p0 p1
# echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata
# echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata
# echo 5678 > p1/tasks
# echo 5679 > p1/tasks

The default resource group is unmodified, so we have access to all parts
of all caches (its schemata file reads "L3:0=f;1=f").

Tasks that are under the control of group "p0" may only allocate from the
"lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
Tasks in group "p1" use the "lower" 50% of cache on both sockets.

Create monitor groups and assign a subset of tasks to each monitor group.

# cd /sys/fs/resctrl/p1/mon_groups
# mkdir m11 m12
# echo 5678 > m11/tasks
# echo 5679 > m12/tasks

Fetch data (data shown in bytes)

# cat m11/mon_data/mon_L3_00/llc_occupancy
16234000
# cat m11/mon_data/mon_L3_01/llc_occupancy
14789000
# cat m12/mon_data/mon_L3_00/llc_occupancy
16789000

The parent CTRL_MON group shows the aggregated data.

# cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
31234000

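Reading the files one by one gets tedious with more than a couple of
domains. The sketch below prints the occupancy of every L3 domain of one
group; the function name "llc_occupancy" and the way the group directory
is passed in are choices made for illustration.

```shell
# Sketch: print "<domain> <bytes>" for each mon_L3_* directory found in
# the mon_data directory of the group given as $1.
llc_occupancy() {
    group=$1
    for d in "$group"/mon_data/mon_L3_*; do
        [ -r "$d/llc_occupancy" ] || continue
        printf '%s %s\n' "${d##*/}" "$(cat "$d/llc_occupancy")"
    done
}
```

It works the same way for a CTRL_MON group ("llc_occupancy
/sys/fs/resctrl/p1") and for a MON group ("llc_occupancy
/sys/fs/resctrl/p1/mon_groups/m11"). Each value is a snapshot taken at
the moment its file is read.
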
Example 2 (Monitor a task from its creation)
--------------------------------------------
On a two socket machine (one L3 cache per socket)

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl
# mkdir p0 p1

An RMID is allocated to the group once it is created and hence the <cmd>
below is monitored from its creation.

# echo $$ > /sys/fs/resctrl/p1/tasks
# <cmd>

Fetch the data

# cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
31789000

Example 3 (Monitor without CAT support or before creating CAT groups)
---------------------------------------------------------------------

Assume a system like HSW has only CQM and no CAT support. In this case
resctrl will still mount but cannot create CTRL_MON directories.
The user can still create different MON groups within the root group and
so monitor all tasks, including kernel threads.

This can also be used to profile a job's cache footprint before
assigning it to an allocation group.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl
# mkdir mon_groups/m01
# mkdir mon_groups/m02

# echo 3478 > /sys/fs/resctrl/mon_groups/m01/tasks
# echo 2467 > /sys/fs/resctrl/mon_groups/m02/tasks

Monitor the groups separately and also get per domain data. From the
below it's apparent that the tasks are mostly doing work on
domain(socket) 0.

# cat /sys/fs/resctrl/mon_groups/m01/mon_data/mon_L3_00/llc_occupancy
31234000
# cat /sys/fs/resctrl/mon_groups/m01/mon_data/mon_L3_01/llc_occupancy
34555
# cat /sys/fs/resctrl/mon_groups/m02/mon_data/mon_L3_00/llc_occupancy
31234000
# cat /sys/fs/resctrl/mon_groups/m02/mon_data/mon_L3_01/llc_occupancy
32789


Example 4 (Monitor real time tasks)
-----------------------------------

A single socket system which has real time tasks running on cores 4-7
and non real time tasks on other cpus. We want to monitor the cache
occupancy of the real time threads on these cores.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl
# mkdir p1

Move the cpus 4-7 over to p1

# echo f0 > p1/cpus

View the llc occupancy snapshot

# cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
11234000