Making Root Unprivileged
There was a time when, to use a computer, you merely turned it on and were greeted by a command prompt. Nowadays, most operating systems offer a security model with multiple users. Typically, the credentials you present at login determine the amount of privilege that programs acting upon your behalf will have. Everyday tasks can be accomplished using unprivileged userids, minimizing the risks due to user error, accidental execution of malware downloaded from the Internet and so on. Any program needing to exercise privilege must be executed using a privileged userid. In UNIX, that is userid 0, the root user. Unfortunately, this means any software needing even just a bit of privilege can lead to a complete system compromise should it misbehave or be attacked successfully.
POSIX capabilities address this problem in two ways. First, they break the notion of all-or-nothing privilege into a set of semantically distinct privileges. This can limit the amount of privilege a task may have, so that, for example, it only is able to create devices or trace another users' tasks.
Second, the notion of privilege is separated from userids. Instead, privilege is wielded by the files (programs) a process executes. After all, users sitting at the keyboard just invoke programs to run on their behalf. It is the programs that actually do something. And, it is the programs that an administrator may entrust with privilege, based on knowledge of what the program does, who wrote it and who installed it.
POSIX capabilities have been implemented in Linux for years, but until recently, Linux supported only process capabilities. Because files are supposed to wield privilege, the lack of file capabilities meant that Linux required workarounds to allow administrators and system services to run with privilege. The POSIX capability rules were perverted to emulate a privileged root user.
Although file capabilities are now supported, the privileged root user remains the norm. In this article, I demonstrate a lazy prototype of a system with an unprivileged root.
Each process has three capability sets:
The effective set (pE) contains the capabilities it can use right now.
The permitted set (pP) contains those that it can add back into its effective set.
The inheritable set (pI) is used to determine its new sets when it executes a new file.
Files also have effective (fE), permitted (pP) and inheritable (fI) sets used to calculate the new capability sets of a process executing it.
At any time, a process can use cap_set_proc() to remove capabilities from any of the three sets.
Capabilities can be added to the effective set only if they are currently in its permitted set. They never can be added to the permitted set. And, they can be added to the inheritable set only if they are in the permitted set or if CAP_SETPCAP is in pE. When a process executes a new file, its new capability sets are calculated as follows:
The inheritable set remains unchanged.
The new permitted set is filled with both the file permitted set (masked with a bounding set, but for this article, I assume that always is full) and any capabilities present in both the file and process inheritable sets.
Capabilities in the file permitted set will be available to the new process—an example use for this is the ping program. Ping needs only the capability CAP_NET_RAW in order to craft raw network packets. It is typically setuid-root, so all users can run it. By placing CAP_NET_RAW in ping's permitted set, all users will receive CAP_NET_RAW while running ping.
Capabilities in the file inheritable set are available to a process only if they also are in the process inheritable set—an example of this would be to allow some users to renice other user's tasks. Simply arrange (as I explain a bit) for CAP_SYS_NICE to be placed in their pI on login, as well as in fI for /usr/bin/renice. Now, ordinary users can run renice without privilege, and the “special” users can run renice, but no other programs, with CAP_SYS_NICE.
The effective set is by default empty, or, if the legacy bit is set (see The File Effective Set, aka the Legacy Bit sidebar), it is set to the new permitted set.
The File Effective Set, aka the Legacy Bit
Linux has an unfortunate discrepancy between the setcap command API and what it actually does. Although setcap expects the user to define a file effective set, the kernel simply knows about a “legacy bit”. Practically speaking, if the file effective set is empty, the legacy bit is not set. If all bits in the file permitted and inheritable sets are in the effective set, the legacy bit is set. If only a subset of those bits are in the effective set, setcap will return an error.
The reasoning for the setcap API command is that Linux is loathe to change userspace APIs. The reason for using the legacy bit is that we want to encourage applications to begin with an empty effective set if they are capability-aware. Hence, the file effective set should be empty unless the application is not capability-aware. But if the application is not capability-aware, all capabilities available to it must be in its effective set from the start.
- Server Hardening
- Unikernels, Docker, and Why You Should Care
- diff -u: What's New in Kernel Development
- Controversy at the Linux Foundation
- 22 Years of Linux Journal on One DVD - Now Available
- Giving Silos Their Due
- Non-Linux FOSS: Snk
- Don't Burn Your Android Yet
- What's New in 3D Printing, Part III: the Software