Making Root Unprivileged
There was a time when, to use a computer, you merely turned it on and were greeted by a command prompt. Nowadays, most operating systems offer a security model with multiple users. Typically, the credentials you present at login determine the amount of privilege that programs acting upon your behalf will have. Everyday tasks can be accomplished using unprivileged userids, minimizing the risks due to user error, accidental execution of malware downloaded from the Internet and so on. Any program needing to exercise privilege must be executed using a privileged userid. In UNIX, that is userid 0, the root user. Unfortunately, this means any software needing even just a bit of privilege can lead to a complete system compromise should it misbehave or be attacked successfully.
POSIX capabilities address this problem in two ways. First, they break the notion of all-or-nothing privilege into a set of semantically distinct privileges. This can limit the amount of privilege a task may have, so that, for example, it only is able to create devices or trace another users' tasks.
Second, the notion of privilege is separated from userids. Instead, privilege is wielded by the files (programs) a process executes. After all, users sitting at the keyboard just invoke programs to run on their behalf. It is the programs that actually do something. And, it is the programs that an administrator may entrust with privilege, based on knowledge of what the program does, who wrote it and who installed it.
POSIX capabilities have been implemented in Linux for years, but until recently, Linux supported only process capabilities. Because files are supposed to wield privilege, the lack of file capabilities meant that Linux required workarounds to allow administrators and system services to run with privilege. The POSIX capability rules were perverted to emulate a privileged root user.
Although file capabilities are now supported, the privileged root user remains the norm. In this article, I demonstrate a lazy prototype of a system with an unprivileged root.
Each process has three capability sets:
The effective set (pE) contains the capabilities it can use right now.
The permitted set (pP) contains those that it can add back into its effective set.
The inheritable set (pI) is used to determine its new sets when it executes a new file.
Files also have effective (fE), permitted (pP) and inheritable (fI) sets used to calculate the new capability sets of a process executing it.
At any time, a process can use cap_set_proc() to remove capabilities from any of the three sets.
Capabilities can be added to the effective set only if they are currently in its permitted set. They never can be added to the permitted set. And, they can be added to the inheritable set only if they are in the permitted set or if CAP_SETPCAP is in pE. When a process executes a new file, its new capability sets are calculated as follows:
The inheritable set remains unchanged.
The new permitted set is filled with both the file permitted set (masked with a bounding set, but for this article, I assume that always is full) and any capabilities present in both the file and process inheritable sets.
Capabilities in the file permitted set will be available to the new process—an example use for this is the ping program. Ping needs only the capability CAP_NET_RAW in order to craft raw network packets. It is typically setuid-root, so all users can run it. By placing CAP_NET_RAW in ping's permitted set, all users will receive CAP_NET_RAW while running ping.
Capabilities in the file inheritable set are available to a process only if they also are in the process inheritable set—an example of this would be to allow some users to renice other user's tasks. Simply arrange (as I explain a bit) for CAP_SYS_NICE to be placed in their pI on login, as well as in fI for /usr/bin/renice. Now, ordinary users can run renice without privilege, and the “special” users can run renice, but no other programs, with CAP_SYS_NICE.
The effective set is by default empty, or, if the legacy bit is set (see The File Effective Set, aka the Legacy Bit sidebar), it is set to the new permitted set.
The File Effective Set, aka the Legacy Bit
Linux has an unfortunate discrepancy between the setcap command API and what it actually does. Although setcap expects the user to define a file effective set, the kernel simply knows about a “legacy bit”. Practically speaking, if the file effective set is empty, the legacy bit is not set. If all bits in the file permitted and inheritable sets are in the effective set, the legacy bit is set. If only a subset of those bits are in the effective set, setcap will return an error.
The reasoning for the setcap API command is that Linux is loathe to change userspace APIs. The reason for using the legacy bit is that we want to encourage applications to begin with an empty effective set if they are capability-aware. Hence, the file effective set should be empty unless the application is not capability-aware. But if the application is not capability-aware, all capabilities available to it must be in its effective set from the start.
Today’s modular x86 servers are compute-centric, designed as a least common denominator to support a wide range of IT workloads. Those generic, virtualized IT workloads have much different resource optimization requirements than hyperscale and cloud applications. They have resulted in a “one size fits all” enterprise IT architecture that is not optimized for a specific set of IT workloads, and especially not emerging hyperscale workloads, such as web applications, big data, and object storage. In this report, you will learn how shifting the focus from traditional compute-centric IT architectures to an innovative disaggregated fabric-based architecture can optimize and scale your data center.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
| Trying to Tame the Tablet | May 08, 2013 |
- Using Salt Stack and Vagrant for Drupal Development
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- New Products
- Validate an E-Mail Address with PHP, the Right Way
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- New Products
- New Products
- The Pari Package On Linux
- Dart: a New Web Programming Experience
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.




2 hours 26 min ago
8 hours 4 min ago
14 hours 4 min ago
14 hours 26 min ago
14 hours 37 min ago
14 hours 41 min ago
15 hours 11 min ago
18 hours 2 min ago
18 hours 38 min ago
18 hours 39 min ago