Documentation for /proc/sys/net/*
  (c) 1999  Terrehon Bowden <terrehon@pacbell.net>
            Bodo Bauer <bb@ricochet.net>
  (c) 2000  Jorge Nerin <comandante@zaralinux.com>
  (c) 2009  Shen Feng <shen@cn.fujitsu.com>

For general info and legal blurb, please look in README.

==============================================================

This file contains the documentation for the sysctl files in
/proc/sys/net

The interface to the networking parts of the kernel is located in
/proc/sys/net. The following table shows all possible subdirectories. You may
see only some of them, depending on your kernel's configuration.


Table : Subdirectories in /proc/sys/net
..............................................................................
 Directory Content             Directory  Content
 core      General parameter   appletalk  Appletalk protocol
 unix      Unix domain sockets netrom     NET/ROM
 802       E802 protocol       ax25       AX25
 ethernet  Ethernet protocol   rose       X.25 PLP layer
 ipv4      IP version 4        x25        X.25 protocol
 ipx       IPX                 token-ring IBM token ring
 bridge    Bridging            decnet     DEC net
 ipv6      IP version 6        tipc       TIPC
..............................................................................
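
Because only some of the subdirectories in the table exist on a given kernel,
it can help to check which ones are present. The following is a minimal
sketch; the function name list_net_sysctl_dirs is hypothetical and not part
of any kernel interface.

```python
import os


def list_net_sysctl_dirs(root="/proc/sys/net"):
    """Return the sorted subdirectory names under root, or [] if root is absent."""
    try:
        entries = os.listdir(root)
    except OSError:
        # root does not exist or is not readable (e.g. not a Linux system).
        return []
    return sorted(e for e in entries if os.path.isdir(os.path.join(root, e)))
```

Calling list_net_sysctl_dirs() returns the directories that the running
kernel actually exposes, which may be a subset of the table above.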

1. /proc/sys/net/core - Network core options
-------------------------------------------------------

bpf_jit_enable
--------------

This enables the Berkeley Packet Filter Just-in-Time (JIT) compiler.
Currently supported on the x86_64 architecture, bpf_jit provides a framework
to speed up packet filtering, such as the filtering used by tcpdump/libpcap.
Values :
  0 - disable the JIT (default value)
  1 - enable the JIT
  2 - enable the JIT and ask the compiler to emit traces to the kernel log.

bpf_jit_harden
--------------

This enables hardening for the Berkeley Packet Filter Just-in-Time compiler.
Hardening is supported by the eBPF JIT backends. Enabling it costs some
performance, but can mitigate JIT spraying.
Values :
  0 - disable JIT hardening (default value)
  1 - enable JIT hardening for unprivileged users only
  2 - enable JIT hardening for all users
56
Shan Weic60f6aa2012-04-26 16:52:52 +000057dev_weight
58--------------
59
60The maximum number of packets that kernel can handle on a NAPI interrupt,
61it's a Per-CPU variable.
62Default: 64
63
Matthias Tafelmeier3d48b532016-12-29 21:37:21 +010064dev_weight_rx_bias
65--------------
66
67RPS (e.g. RFS, aRFS) processing is competing with the registered NAPI poll function
68of the driver for the per softirq cycle netdev_budget. This parameter influences
69the proportion of the configured netdev_budget that is spent on RPS based packet
70processing during RX softirq cycles. It is further meant for making current
71dev_weight adaptable for asymmetric CPU needs on RX/TX side of the network stack.
72(see dev_weight_tx_bias) It is effective on a per CPU basis. Determination is based
73on dev_weight and is calculated multiplicative (dev_weight * dev_weight_rx_bias).
74Default: 1
75
76dev_weight_tx_bias
77--------------
78
79Scales the maximum number of packets that can be processed during a TX softirq cycle.
80Effective on a per CPU basis. Allows scaling of current dev_weight for asymmetric
81net stack processing needs. Be careful to avoid making TX softirq processing a CPU hog.
82Calculation is based on dev_weight (dev_weight * dev_weight_tx_bias).
83Default: 1
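
The two products given for dev_weight_rx_bias and dev_weight_tx_bias can be
written out as a short sketch. The function name effective_weights is
hypothetical; the defaults are the ones documented above.

```python
def effective_weights(dev_weight=64, rx_bias=1, tx_bias=1):
    """Return (rx_weight, tx_weight) as the products described in the text."""
    rx_weight = dev_weight * rx_bias  # dev_weight * dev_weight_rx_bias
    tx_weight = dev_weight * tx_bias  # dev_weight * dev_weight_tx_bias
    return rx_weight, tx_weight
```

With the defaults both sides get 64 packets. Setting rx_bias to 2 doubles
only the RX side, which is the asymmetric case the two parameters exist for.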

default_qdisc
-------------

The default queuing discipline to use for network devices. This allows
overriding the default of pfifo_fast with an alternative. Since the default
queuing discipline is created without additional parameters, this setting is
best suited to queuing disciplines that work well without configuration, such
as stochastic fair queue (sfq), CoDel (codel) or fair queue CoDel (fq_codel).
Don't use queuing disciplines like Hierarchical Token Bucket or Deficit Round
Robin, which require setting up classes and bandwidths. Note that physical
multiqueue interfaces still use mq as root qdisc, which in turn uses this
default for its leaves. Virtual devices (e.g. lo or veth) ignore this setting
and instead default to noqueue.
Default: pfifo_fast

busy_read
---------

Low latency busy poll timeout for socket reads. (needs CONFIG_NET_RX_BUSY_POLL)
Approximate time in microseconds (us) to busy loop waiting for packets on the
device queue.
This sets the default value of the SO_BUSY_POLL socket option.
It can be set or overridden per socket with the SO_BUSY_POLL socket option,
which is the preferred method of enabling. If you need to enable the feature
globally via sysctl, a value of 50 is recommended.
Enabling this will increase power usage.
Default: 0 (off)
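
The per-socket method mentioned above can be sketched as follows. This is an
illustration under stated assumptions: the helper name set_busy_poll is
hypothetical, and the fallback number 46 is the SO_BUSY_POLL value from the
generic Linux socket headers, which some architectures do not share.

```python
import socket

# Assumption: 46 is SO_BUSY_POLL in the generic Linux headers; used only
# when the socket module does not export the constant itself.
SO_BUSY_POLL = getattr(socket, "SO_BUSY_POLL", 46)


def set_busy_poll(sock, usecs):
    """Try to set the per-socket busy poll timeout; return True on success."""
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_BUSY_POLL, usecs)
        return True
    except OSError:
        # For example: kernel built without CONFIG_NET_RX_BUSY_POLL,
        # or the process lacks the privilege to raise the value.
        return False
```

A caller would create a socket, call set_busy_poll(sock, 50), and fall back
to ordinary blocking reads when the call returns False.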

busy_poll
---------

Low latency busy poll timeout for poll and select. (needs CONFIG_NET_RX_BUSY_POLL)
Approximate time in microseconds (us) to busy loop waiting for events.
The recommended value depends on the number of sockets you poll on.
For several sockets use 50; for several hundred, use 100.
For more than that you probably want to use epoll.
Note that only sockets with SO_BUSY_POLL set will be busy polled,
so you want to either selectively set SO_BUSY_POLL on those sockets or set
net.core.busy_read globally.
Enabling this will increase power usage.
Default: 0 (off)

rmem_default
------------

The default setting of the socket receive buffer in bytes.

rmem_max
--------

The maximum receive socket buffer size in bytes.
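
To see how these two limits surface in an application, the sketch below
requests a receive buffer size with SO_RCVBUF and reads back what the kernel
granted. The helper name request_rcvbuf is hypothetical. On Linux the
reported value is usually larger than the request, because the kernel adds
room for bookkeeping overhead, and an unprivileged request is capped by
rmem_max.

```python
import socket


def request_rcvbuf(sock, nbytes):
    """Request a receive buffer size and return what the kernel reports back."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, nbytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

A socket that never calls this keeps the size given by rmem_default.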

tstamp_allow_data
-----------------

Allow processes to receive tx timestamps looped together with the original
packet contents. If disabled, transmit timestamp requests from unprivileged
processes are dropped unless socket option SOF_TIMESTAMPING_OPT_TSONLY is set.
Default: 1 (on)

wmem_default
------------

The default setting (in bytes) of the socket send buffer.

wmem_max
--------

The maximum send socket buffer size in bytes.

message_burst and message_cost
------------------------------

These parameters are used to limit the warning messages written to the kernel
log from the networking code. They enforce a rate limit so that the log cannot
be flooded as part of a denial-of-service attack. A higher message_cost factor
results in fewer messages being written. message_burst controls when messages
will be dropped. The default settings limit warning messages to one every five
seconds.
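
The general idea of such a rate limit can be sketched with a simple window
counter. This is not the kernel's implementation. The class name RateLimiter
is hypothetical, and treating message_cost as the window length in seconds
and message_burst as the number of messages allowed per window is an
assumption made for illustration.

```python
class RateLimiter:
    """Allow at most `burst` messages per `interval` seconds; drop the rest."""

    def __init__(self, interval, burst):
        self.interval = interval   # assumed to play the role of message_cost
        self.burst = burst         # assumed to play the role of message_burst
        self.window_start = None
        self.printed = 0
        self.dropped = 0

    def allow(self, now):
        """Return True if a message arriving at time `now` may be logged."""
        # Open a new window when none exists or the current one has expired.
        if self.window_start is None or now - self.window_start >= self.interval:
            self.window_start = now
            self.printed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        self.dropped += 1
        return False
```

In this model a longer interval lets fewer messages through over time, which
matches the statement that a higher message_cost results in fewer messages.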

warnings
--------

This sysctl is now unused.

This was used to control console messages from the networking stack that
occur because of problems on the network like duplicate address or bad
checksums.

These messages are now emitted at KERN_DEBUG and can generally be enabled
and controlled by the dynamic_debug facility.

netdev_budget
-------------

Maximum number of packets taken from all interfaces in one polling cycle (NAPI
poll). In one polling cycle, interfaces which are registered for polling are
probed in a round-robin manner.
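
The interaction between the per-interface weight and the overall budget can
be modelled roughly as below. This is a simplified simulation, not the
kernel's NAPI code. The function name poll_cycle and the dictionary of
pending packet counts are hypothetical.

```python
def poll_cycle(pending, budget, weight):
    """Simulate one polling cycle.

    pending maps interface name -> packets waiting. Interfaces are visited
    round-robin; each visit takes at most `weight` packets, and the cycle
    stops once `budget` packets have been taken in total.
    Returns a dict of packets taken per interface.
    """
    taken = {name: 0 for name in pending}
    remaining = dict(pending)
    while budget > 0 and any(remaining.values()):
        for name in remaining:
            if budget <= 0:
                break
            n = min(remaining[name], weight, budget)
            remaining[name] -= n
            taken[name] += n
            budget -= n
    return taken
```

For example, with a budget of 100 and a weight of 64, an interface with 100
waiting packets and one with 10 end the cycle with 90 and 10 packets taken:
the busy interface is cut off when the shared budget runs out.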

netdev_max_backlog
------------------

Maximum number of packets queued on the INPUT side when the interface
receives packets faster than the kernel can process them.

netdev_rss_key
--------------

RSS (Receive Side Scaling) enabled drivers use a 40-byte host key that is
randomly generated.
Some user space might need to gather its content even if drivers do not
provide ethtool -x support yet.

myhost:~# cat /proc/sys/net/core/netdev_rss_key
84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total)

The file contains NUL bytes if no driver has ever called the
netdev_rss_key_fill() function.
Note:
/proc/sys/net/core/netdev_rss_key contains 52 bytes of key,
but most drivers only use 40 bytes of it.

myhost:~# ethtool -x eth0
RX flow hash indirection table for eth0 with 8 RX ring(s):
    0:      0     1     2     3     4     5     6     7
RSS hash key:
84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89
Eric Dumazet3b098e22010-05-15 23:57:10 -0700209netdev_tstamp_prequeue
210----------------------
211
212If set to 0, RX packet timestamps can be sampled after RPS processing, when
213the target CPU processes packets. It might give some delay on timestamps, but
214permit to distribute the load on several cpus.
215
216If set to 1 (default), timestamps are sampled as soon as possible, before
217queueing.
218
Shen Feng760df932009-04-02 16:57:20 -0700219optmem_max
220----------
221
222Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence
223of struct cmsghdr structures with appended data.

2. /proc/sys/net/unix - Parameters for Unix domain sockets
-------------------------------------------------------

There is only one file in this directory.
unix_dgram_qlen limits the maximum number of datagrams queued in a Unix domain
socket's buffer. It will not take effect unless the PF_UNIX flag is specified.


3. /proc/sys/net/ipv4 - IPv4 settings
-------------------------------------------------------
Please see: Documentation/networking/ip-sysctl.txt and ipvs-sysctl.txt for
descriptions of these entries.


4. Appletalk
-------------------------------------------------------

The /proc/sys/net/appletalk directory holds the Appletalk configuration data
when Appletalk is loaded. The configurable parameters are:

aarp-expiry-time
----------------

The amount of time we keep an AARP entry before expiring it. Used to age out
old hosts.

aarp-resolve-time
-----------------

The amount of time we will spend trying to resolve an Appletalk address.

aarp-retransmit-limit
---------------------

The number of times we will retransmit a query before giving up.

aarp-tick-time
--------------

Controls the rate at which expirations are checked.

The directory /proc/net/appletalk holds the list of active Appletalk sockets
on a machine.

The fields indicate the DDP type, the local address (in network:node format),
the remote address, the size of the transmit pending queue, the size of the
received queue (bytes waiting for applications to read), the state and the uid
owning the socket.

/proc/net/atalk_iface lists all the interfaces configured for Appletalk. It
shows the name of the interface, its Appletalk address, the network range on
that address (or network number for phase 1 networks), and the status of the
interface.

/proc/net/atalk_route lists each known network route. It lists the target
(network) that the route leads to, the router (may be directly connected), the
route flags, and the device the route is using.


5. IPX
-------------------------------------------------------

The IPX protocol has no tunable values in /proc/sys/net.

The IPX protocol does, however, provide /proc/net/ipx. This lists each IPX
socket giving the local and remote addresses in Novell format (that is
network:node:port). In accordance with the strange Novell tradition,
everything but the port is in hex. Not_Connected is displayed for sockets that
are not tied to a specific remote address. The Tx and Rx queue sizes indicate
the number of bytes pending for transmission and reception. The state
indicates the state the socket is in and the uid is the owning uid of the
socket.

The /proc/net/ipx_interface file lists all IPX interfaces. For each interface
it gives the network number, the node number, and indicates if the network is
the primary network. It also indicates which device it is bound to (or
Internal for internal networks) and the Frame Type if appropriate. Linux
supports 802.3, 802.2, 802.2 SNAP and DIX (Blue Book) Ethernet framing for
IPX.

The /proc/net/ipx_route table holds a list of IPX routes. For each route it
gives the destination network, the router node (or Directly) and the network
address of the router (or Connected) for internal networks.

6. TIPC
-------------------------------------------------------

tipc_rmem
---------

The TIPC protocol now has a tunable for the receive memory, similar to
tcp_rmem, i.e. a vector of 3 INTEGERs: (min, default, max)

    # cat /proc/sys/net/tipc/tipc_rmem
    4252725 34021800        68043600
    #

The max value is set to CONN_OVERLOAD_LIMIT, and the default and min values
are scaled (shifted) versions of that same value. Note that the min value
is not at this point in time used in any meaningful way, but the triplet is
preserved in order to be consistent with things like tcp_rmem.
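
The shift relationship between the three values can be checked with a short
sketch. The function names parse_tipc_rmem and shift_from_max are
hypothetical and only serve this illustration.

```python
def parse_tipc_rmem(text):
    """Parse 'min default max' into a tuple of three ints."""
    lo, default, hi = (int(field) for field in text.split())
    return lo, default, hi


def shift_from_max(value, maximum):
    """Return k such that value << k == maximum, or None if no such k exists."""
    if value <= 0:
        return None
    k = 0
    while value << k < maximum:
        k += 1
    return k if value << k == maximum else None
```

For the sample output above, the default is the max shifted right by 1 and
the min is the max shifted right by 4.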

named_timeout
-------------

TIPC name table updates are distributed asynchronously in a cluster, without
any form of transaction handling. This means that different race scenarios are
possible. One such scenario is that a name withdrawal sent out by one node and
received by another node may arrive after a second, overlapping name
publication has already been accepted from a third node, although the
conflicting updates may originally have been issued in the correct sequential
order.
If named_timeout is nonzero, failed topology updates will be placed on a defer
queue until another event arrives that clears the error, or until the timeout
expires. Value is in milliseconds.