The execve system call can grant a newly-started program privileges that
its parent did not have. The most obvious examples are setuid/setgid
programs and file capabilities. To prevent the parent program from
gaining these privileges as well, the kernel and user code must be
careful to prevent the parent from doing anything that could subvert the
child. For example:

- The dynamic loader handles LD_* environment variables differently if
  a program is setuid.

- chroot is disallowed to unprivileged processes, since it would allow
  /etc/passwd to be replaced from the point of view of a process that
  inherited chroot.

- The exec code has special handling for ptrace.

These are all ad-hoc fixes. The no_new_privs bit (since Linux 3.5) is a
new, generic mechanism to make it safe for a process to modify its
execution environment in a manner that persists across execve. Any task
can set no_new_privs. Once the bit is set, it is inherited across fork,
clone, and execve and cannot be unset. With no_new_privs set, execve
promises not to grant the privilege to do anything that could not have
been done without the execve call. For example, the setuid and setgid
bits will no longer change the uid or gid; file capabilities will not
add to the permitted set, and LSMs will not relax constraints after
execve.

To set no_new_privs, use prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0).
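As a minimal sketch of the call above, the following C helpers set the
bit, read it back, and check that a forked child inherits it. The helper
names (enable_no_new_privs, query_no_new_privs, child_sees_no_new_privs)
are illustrative and not part of any kernel or libc interface. The
companion query, PR_GET_NO_NEW_PRIVS, is not mentioned in the text above
and is used here on the assumption that the headers or kernel provide it.

```c
#include <errno.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallbacks for older userspace headers; values match <linux/prctl.h>. */
#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38
#endif
#ifndef PR_GET_NO_NEW_PRIVS
#define PR_GET_NO_NEW_PRIVS 39
#endif

/* Set no_new_privs on the calling task. Returns 0 on success, -1 on error. */
static int enable_no_new_privs(void)
{
	return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Returns 1 if the bit is set, 0 if it is clear, -1 on error. */
static int query_no_new_privs(void)
{
	return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}

/*
 * Fork a child and report what the child sees:
 * 1 if the bit is set there, 0 if it is clear, -1 if fork/wait failed.
 */
static int child_sees_no_new_privs(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(query_no_new_privs() == 1 ? 1 : 0);
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

A launcher would typically call enable_no_new_privs() just before
execve. Because the bit cannot be unset, a later
prctl(PR_SET_NO_NEW_PRIVS, 0, 0, 0, 0) is expected to fail with EINVAL
rather than clear it.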

Be careful, though: LSMs might also not tighten constraints on exec
in no_new_privs mode. (This means that setting up a general-purpose
service launcher to set no_new_privs before execing daemons may
interfere with LSM-based sandboxing.)

Note that no_new_privs does not prevent privilege changes that do not
involve execve. An appropriately privileged task can still call
setuid(2) and receive SCM_RIGHTS datagrams.

There are two main use cases for no_new_privs so far:

- Filters installed for the seccomp mode 2 sandbox persist across
  execve and can change the behavior of newly-executed programs.
  Unprivileged users are therefore only allowed to install such filters
  if no_new_privs is set.

- By itself, no_new_privs can be used to reduce the attack surface
  available to an unprivileged user. If everything running with a
  given uid has no_new_privs set, then that uid will be unable to
  escalate its privileges by directly attacking setuid, setgid, and
  fcap-using binaries; it will need to compromise something without the
  no_new_privs bit set first.
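The first use case can be sketched in C as follows. This is an
illustration only: the helper names (install_allow_all_filter,
sandbox_self) are hypothetical, and the one-instruction filter allows
every system call, so it demonstrates the ordering requirement and
provides no actual confinement. A real filter would at least check the
architecture and the system call number.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Fallback for older userspace headers; value matches <linux/prctl.h>. */
#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38
#endif

/*
 * Install a seccomp mode 2 filter consisting of a single "allow"
 * return. Returns 0 on success, -1 with errno set on failure. An
 * unprivileged caller that has not set no_new_privs is expected to be
 * refused with EACCES.
 */
static int install_allow_all_filter(void)
{
	struct sock_filter insns[] = {
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
		.filter = insns,
	};

	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Set no_new_privs first, then install the filter. */
static int sandbox_self(void)
{
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;
	return install_allow_all_filter();
}
```

Setting the bit before installing the filter is what lets an
unprivileged task take this path. Both the bit and the filter then
persist across any later execve.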

In the future, other potentially dangerous kernel features could become
available to unprivileged tasks if no_new_privs is set. In principle,
several options to unshare(2) and clone(2) would be safe when
no_new_privs is set, and no_new_privs + chroot is considerably less
dangerous than chroot by itself.