ORC unwinder
============

Overview
--------

The kernel CONFIG_UNWINDER_ORC option enables the ORC unwinder, which is
similar in concept to a DWARF unwinder. The difference is that the
format of the ORC data is much simpler than DWARF, which in turn allows
the ORC unwinder to be much simpler and faster.

The ORC data consists of unwind tables which are generated by objtool.
They contain out-of-band data which is used by the in-kernel ORC
unwinder. Objtool generates the ORC data by first doing compile-time
stack metadata validation (CONFIG_STACK_VALIDATION). After analyzing
all the code paths of a .o file, it determines information about the
stack state at each instruction address in the file and outputs that
information to the .orc_unwind and .orc_unwind_ip sections.

The per-object ORC sections are combined at link time and are sorted and
post-processed at boot time. The unwinder uses the resulting data to
correlate instruction addresses with their stack states at run time.
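
To make "stack state" concrete, here is a minimal sketch of a single
unwind step over a simulated stack. It assumes the x86-64 convention
that the return address sits directly below the caller's stack pointer
value. Every name in it (stack_state, sim_stack, unwind_step, the sample
addresses) is hypothetical and chosen for illustration; it is not the
kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the kernel's data layout. */
enum base_reg { BASE_SP, BASE_BP };

struct stack_state {
	enum base_reg base;  /* register the frame is measured from */
	int16_t sp_offset;   /* base + sp_offset = caller's stack pointer */
};

struct regs {
	uint64_t ip, sp, bp;
};

/* Simulated stack memory: words[i] lives at address base + 8 * i. */
struct sim_stack {
	uint64_t base;
	const uint64_t *words;
	size_t nwords;
};

static const uint64_t sample_words[] = {
	0xaaaa, 0xbbbb,         /* two words of locals */
	0xffffffff81000123ULL,  /* return address pushed by the call */
};
static const struct sim_stack sample_stack = { 0x7000, sample_words, 3 };

/* Bounds-checked read of one 8-byte word; returns 0 on success. */
static int read_word(const struct sim_stack *s, uint64_t addr, uint64_t *out)
{
	uint64_t off = addr - s->base;

	if (addr < s->base || off % 8 != 0 || off / 8 >= s->nwords)
		return -1;
	*out = s->words[off / 8];
	return 0;
}

/* Advance r from the current function to its caller; returns 0 on success. */
static int unwind_step(const struct stack_state *st,
		       const struct sim_stack *stack, struct regs *r)
{
	uint64_t base = (st->base == BASE_SP) ? r->sp : r->bp;
	uint64_t prev_sp = base + (uint64_t)(int64_t)st->sp_offset;
	uint64_t ret;

	assert(st->base == BASE_SP || st->base == BASE_BP);
	if (read_word(stack, prev_sp - 8, &ret) != 0)
		return -1;
	r->ip = ret;      /* resume the lookup at the return address */
	r->sp = prev_sp;  /* stack pointer as it was before the call */
	return 0;
}
```

With sp = 0x7000 and a state of { BASE_SP, 24 }, one step reads the
return address from 0x7010 and leaves sp at 0x7018. Repeating the step
with the state recorded for each new ip walks the call chain.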


ORC vs frame pointers
---------------------

With frame pointers enabled, GCC adds instrumentation code to every
function in the kernel. The kernel's .text size increases by about
3.2%, resulting in a broad kernel-wide slowdown. Measurements by Mel
Gorman [1] have shown a slowdown of 5-10% for some workloads.

In contrast, the ORC unwinder has no effect on text size or runtime
performance, because the debuginfo is out of band. So if you disable
frame pointers and enable the ORC unwinder, you get a nice performance
improvement across the board, and still have reliable stack traces.

Ingo Molnar says:

  "Note that it's not just a performance improvement, but also an
  instruction cache locality improvement: 3.2% .text savings almost
  directly transform into a similarly sized reduction in cache
  footprint. That can transform to even higher speedups for workloads
  whose cache locality is borderline."

Another benefit of ORC compared to frame pointers is that it can
reliably unwind across interrupts and exceptions. Frame pointer based
unwinds can sometimes skip the caller of the interrupted function, if it
was a leaf function or if the interrupt hit before the frame pointer was
saved.

The main disadvantage of the ORC unwinder compared to frame pointers is
that it needs more memory to store the ORC unwind tables: roughly 2-4MB
depending on the kernel config.


ORC vs DWARF
------------

ORC debuginfo's advantage over DWARF itself is that it's much simpler.
It gets rid of the complex DWARF CFI state machine and also gets rid of
the tracking of unnecessary registers. This allows the unwinder to be
much simpler, meaning fewer bugs, which is especially important for
mission critical oops code.

The simpler debuginfo format also enables the unwinder to be much faster
than DWARF, which is important for perf and lockdep. In a basic
performance test by Jiri Slaby [2], the ORC unwinder was about 20x
faster than an out-of-tree DWARF unwinder. (Note: That measurement was
taken before some performance tweaks were added, which doubled
performance, so the speedup over DWARF may be closer to 40x.)

The ORC data format does have a few downsides compared to DWARF. ORC
unwind tables take up ~50% more RAM (+1.3MB on an x86 defconfig kernel)
than DWARF-based eh_frame tables.
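
As a rough consistency check, the two figures can be combined to
estimate the table sizes they imply. This assumes that "~50% more" and
"+1.3MB" describe the same x86 defconfig comparison; D and O are merely
labels introduced here for the eh_frame and ORC table sizes.

```latex
\begin{align*}
O &\approx 1.5\,D, \qquad O - D \approx 1.3\ \text{MB} \\
0.5\,D &\approx 1.3\ \text{MB} \quad\Rightarrow\quad D \approx 2.6\ \text{MB} \\
O &\approx 2.6\ \text{MB} + 1.3\ \text{MB} = 3.9\ \text{MB}
\end{align*}
```

An ORC table of about 3.9MB falls within the 2-4MB range quoted for the
ORC unwind tables in the frame pointer comparison.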

Another potential downside is that, as GCC evolves, it's conceivable
that the ORC data may end up being too simple to describe the state of
the stack for certain optimizations. But IMO this is unlikely because
GCC saves the frame pointer for any unusual stack adjustments it does,
so I suspect we'll really only ever need to keep track of the stack
pointer and the frame pointer between call frames. But even if we do
end up having to track all the registers DWARF tracks, at least we will
still be able to control the format, e.g. no complex state machines.


ORC unwind table generation
---------------------------

The ORC data is generated by objtool. With the existing compile-time
stack metadata validation feature, objtool already follows all code
paths, and so it already has all the information it needs to be able to
generate ORC data from scratch. So it's an easy step to go from stack
validation to ORC data generation.

It should be possible to instead generate the ORC data with a simple
tool which converts DWARF to ORC data. However, such a solution would
be incomplete due to the kernel's extensive use of asm, inline asm, and
special sections like exception tables.

That could be rectified by manually annotating those special code paths
using GNU assembler .cfi annotations in .S files, and homegrown
annotations for inline asm in .c files. But asm annotations were tried
in the past and were found to be unmaintainable. They were often
incorrect/incomplete and made the code harder to read and keep updated.
And based on looking at glibc code, annotating inline asm in .c files
might be even worse.

Objtool still needs a few annotations, but only in code which does
unusual things to the stack, like entry code. And even then, far fewer
annotations are needed than what DWARF would need, so they're much more
maintainable than DWARF CFI annotations.

So the advantages of using objtool to generate ORC data are that it
gives more accurate debuginfo, with very few annotations. It also
insulates the kernel from toolchain bugs, which can be very painful to
deal with in the kernel since we often have to work around issues in
older versions of the toolchain for years.

The downside is that the unwinder now becomes dependent on objtool's
ability to reverse engineer GCC code flow. If GCC optimizations become
too complicated for objtool to follow, the ORC data generation might
stop working or become incomplete. (It's worth noting that livepatch
already has such a dependency on objtool's ability to follow GCC code
flow.)

If newer versions of GCC come up with some optimizations which break
objtool, we may need to revisit the current implementation. Some
possible solutions would be asking GCC to make the optimizations more
palatable, or having objtool use DWARF as an additional input, or
creating a GCC plugin to assist objtool with its analysis. But for now,
objtool follows GCC code quite well.


Unwinder implementation details
-------------------------------

Objtool generates the ORC data by integrating with the compile-time
stack metadata validation feature, which is described in detail in
tools/objtool/Documentation/stack-validation.txt. After analyzing all
the code paths of a .o file, it creates an array of orc_entry structs,
and a parallel array of instruction addresses associated with those
structs, and writes them to the .orc_unwind and .orc_unwind_ip sections
respectively.
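
The parallel-array layout can be pictured with the sketch below. The
struct is a simplified, hypothetical stand-in rather than the real
orc_entry definition; the field names, register constants, and sample
addresses are assumptions made for illustration, and the tables hold
plain absolute addresses to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register selectors for the sketch. */
enum { REG_UNDEFINED, REG_SP, REG_BP, REG_PREV_SP };

/* Hypothetical, simplified stack-state record; the real orc_entry differs. */
struct orc_entry_sketch {
	int16_t sp_offset;  /* sp_reg + sp_offset = caller's stack pointer */
	int16_t bp_offset;  /* bp_reg + bp_offset = where the caller's bp is saved */
	uint8_t sp_reg;
	uint8_t bp_reg;
};

/* Stand-in for .orc_unwind_ip: one instruction address per entry. */
static const uint64_t ip_table[] = { 0x1000, 0x1001, 0x1004, 0x1020 };

/* Stand-in for .orc_unwind: entry i describes the code starting at ip_table[i]. */
static const struct orc_entry_sketch entry_table[] = {
	{  8,   0, REG_SP, REG_UNDEFINED },  /* function entry */
	{ 16, -16, REG_SP, REG_PREV_SP },    /* after push %rbp */
	{ 16, -16, REG_BP, REG_PREV_SP },    /* after mov %rsp,%rbp */
	{  8,   0, REG_SP, REG_UNDEFINED },  /* after pop %rbp */
};

#define ORC_SKETCH_COUNT (sizeof(ip_table) / sizeof(ip_table[0]))

_Static_assert(sizeof(entry_table) / sizeof(entry_table[0]) == ORC_SKETCH_COUNT,
	       "parallel arrays must have the same number of elements");

/* The two arrays share an index: entry i pairs with address i. */
static const struct orc_entry_sketch *entry_at(size_t i)
{
	return i < ORC_SKETCH_COUNT ? &entry_table[i] : NULL;
}
```

Only ip_table has to be searched to find an index; entry_table is then
read once at that index.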

The ORC data is split into the two arrays for performance reasons, to
make the searchable part of the data (.orc_unwind_ip) more compact. The
arrays are sorted in parallel at boot time.
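
Because the address array is sorted, a lookup can be a binary search
that touches only that array. The sketch below assumes that an entry
applies from its own address up to the next entry's address; the table
contents and the function name find_entry_index are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a sorted .orc_unwind_ip table. */
static const uint64_t ip_table[] = { 0x1000, 0x1001, 0x1004, 0x1020 };

#define IP_COUNT (sizeof(ip_table) / sizeof(ip_table[0]))

/*
 * Return the index of the last element of ips[] that is <= addr,
 * or -1 if addr lies before the first element.
 */
static long find_entry_index(const uint64_t *ips, size_t n, uint64_t addr)
{
	size_t lo = 0, hi = n;

	if (n == 0 || addr < ips[0])
		return -1;

	/* Invariant: ips[lo] <= addr, and every index >= hi holds a value > addr. */
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (ips[mid] <= addr)
			lo = mid;
		else
			hi = mid;
	}
	return (long)lo;
}
```

For example, find_entry_index(ip_table, IP_COUNT, 0x1002) yields index
1, since 0x1002 falls between the entries at 0x1001 and 0x1004. The
returned index is then used to read the matching record from the
parallel array.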

Performance is further improved by the use of a fast lookup table which
is created at runtime. The fast lookup table associates a given address
with a range of indices for the .orc_unwind table, so that only a small
subset of the table needs to be searched.
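
One way such a table could work is sketched below: the text range is cut
into fixed-size blocks, and each block records the index of the entry
covering its first address, so a search only spans the indices between
two neighbouring blocks. The block size, address range, table contents,
and all names are hypothetical; the sketch also assumes the first table
address is not above TEXT_START.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names and sizes are hypothetical, chosen to keep the example small. */
#define TEXT_START 0x1000u
#define TEXT_END   0x1040u
#define BLOCK_SIZE 0x10u
#define NUM_BLOCKS ((TEXT_END - TEXT_START) / BLOCK_SIZE)

static const uint64_t ip_table[] = {
	0x1000, 0x1004, 0x1012, 0x1018, 0x1030, 0x1038,
};

#define IP_COUNT (sizeof(ip_table) / sizeof(ip_table[0]))

/* lookup[b] = index of the entry covering the first address of block b. */
static size_t lookup[NUM_BLOCKS + 1];

/* Last index in [lo, hi) whose address is <= addr; needs ip_table[lo] <= addr. */
static size_t search_range(uint64_t addr, size_t lo, size_t hi)
{
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (ip_table[mid] <= addr)
			lo = mid;
		else
			hi = mid;
	}
	return lo;
}

/* Fill the lookup table once, using full-table searches. */
static void build_lookup(void)
{
	for (size_t b = 0; b <= NUM_BLOCKS; b++)
		lookup[b] = search_range(TEXT_START + b * BLOCK_SIZE, 0, IP_COUNT);
}

/* Search only the slice of ip_table that can contain addr. */
static long find_entry_index(uint64_t addr)
{
	size_t b;

	if (addr < TEXT_START || addr >= TEXT_END)
		return -1;
	b = (addr - TEXT_START) / BLOCK_SIZE;
	return (long)search_range(addr, lookup[b], lookup[b + 1] + 1);
}
```

After build_lookup() has run, a query for 0x1015 lands in block 1 and
searches only indices 1 through 3 instead of the whole table. Smaller
blocks narrow the search further at the cost of a larger lookup table.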


Etymology
---------

Orcs, fearsome creatures of medieval folklore, are the Dwarves' natural
enemies. Similarly, the ORC unwinder was created in opposition to the
complexity and slowness of DWARF.

"Although Orcs rarely consider multiple solutions to a problem, they do
excel at getting things done because they are creatures of action, not
thought." [3] Similarly, unlike the esoteric DWARF unwinder, the
veracious ORC unwinder wastes no time or siliconic effort decoding
variable-length zero-extended unsigned-integer byte-coded
state-machine-based debug information entries.

Similar to how Orcs frequently unravel the well-intentioned plans of
their adversaries, the ORC unwinder frequently unravels stacks with
brutal, unyielding efficiency.

ORC stands for Oops Rewind Capability.


[1] https://lkml.kernel.org/r/20170602104048.jkkzssljsompjdwy@suse.de
[2] https://lkml.kernel.org/r/d2ca5435-6386-29b8-db87-7f227c2b713a@suse.cz
[3] http://dustin.wikidot.com/half-orcs-and-orcs