Taking Advantage of Linux Capabilities

by Michael Bacarella

A common topic of discussion nowadays is security, and for good reason. Security is becoming more important as the world becomes further networked. Like all good systems, Linux is evolving in order to address increasingly important security concerns.

One aspect of security is user privileges. UNIX-style user privileges come in two varieties, user and root. Regular users are absolutely powerless; they cannot modify any processes or files but their own. Access to hardware and most network specifications also are denied. Root, on the other hand, can do anything from modifying all processes and files to having unrestricted network and hardware access. In some cases root can even physically damage hardware.

Sometimes a middle ground is desired. A utility needs special privileges to perform its function, but unquestionable god-like root access is overkill. The ping utility is setuid root simply so it can send and receive ICMP messages. The danger lies in the fact that ping can be exploited before it has dropped its root privileges, giving the attacker root access to your server.

Fortunately, such a middle ground now exists, and it's called POSIX capabilities. Capabilities divide system access into logical groups that may be individually granted to, or removed from, different processes. Capabilities allow system administrators to fine-tune what a process is allowed to do, which may help them significantly reduce security risks to their system. The best part is that your system already supports them. If you're lucky, no patching should be necessary.

A list of all the capabilities that your system is, well, capable of, is available in /usr/include/linux/capability.h, starting with CAP_CHOWN. They're pretty self-explanatory and well commented. Capability checks are sprinkled throughout the kernel source, and grepping for them can make for some fun midnight reading.

Each capability is nothing more than a bit in a bitmap. With 32 bits in a capability set and 28 capabilities currently defined, there are ongoing discussions about how to expand this number. Some purists believe that additional capabilities would be too confusing, while others argue that there should be many more, even a capability for each system call. Time and Linus will ultimately decide how this exciting feature develops.
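To make the "bit in a bitmap" idea concrete, here is a minimal sketch in C. The EX_ constants and the two helper functions are illustrative names, not part of any kernel or library API; the numeric values are assumed to mirror the numbering in linux/capability.h (CAP_CHOWN is 0, CAP_KILL is 5, CAP_NET_RAW is 13).

```c
#include <stdint.h>

/* Illustrative constants; the values are assumed to match the
 * numbering used in linux/capability.h. */
#define EX_CAP_CHOWN   0
#define EX_CAP_KILL    5
#define EX_CAP_NET_RAW 13

/* Turn a capability number into its single-bit mask. */
static uint32_t ex_cap_mask(int cap)
{
    return (uint32_t)1 << cap;
}

/* Return 1 if the capability's bit is set in the given set, else 0. */
static int ex_cap_is_set(uint32_t set, int cap)
{
    return (set & ex_cap_mask(cap)) != 0;
}
```

A set holding only CAP_KILL and CAP_NET_RAW would be the value 0x2020, and ex_cap_is_set(0x2020, EX_CAP_CHOWN) would report that CAP_CHOWN is absent.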

The Proc Interface

As of kernel 2.4.17, the file /proc/sys/kernel/cap-bound contains a single 32-bit integer that defines the current global capability set. The global capability set determines what every process on the system is allowed to do. If a capability is stripped from the system, it is impossible for any process, even a root process, to regain it.

For example, many crackers' rootkits (a set of tools that cover up their activities and install backdoors into the system) will load kernel modules that hide illicit processes and files from the system administrator. To counter this, the administrator could simply remove the CAP_SYS_MODULE capability from the system as the last step in the system startup process. This step would prevent any kernel modules from being loaded or unloaded. Once a capability has been removed, it cannot be re-added. The system must be restarted (which means you might have to use the power button if you've removed the CAP_SYS_BOOT capability) to regain the full-capability set.

Okay, I lied. There are two ways to add back a capability:

  1. init can re-add capabilities, in theory; there's no actual implementation to my knowledge. This is to facilitate capability-aware systems in the event that init needs to change runlevels.

  2. If a process is capable of CAP_SYS_RAWIO, it can modify kernel memory through /dev/mem. Among other things, it can modify kernel memory to grant itself whatever access it desires. Remove CAP_SYS_RAWIO, but be careful: by removing CAP_SYS_RAWIO, programs such as X most likely will fail to run.

Editing cap-bound by hand is tedious. Fortunately, there's a utility called lcap that provides a friendlier interface to cap-bound. One use for it is to remove CAP_CHOWN from the system. Once that is done, it becomes impossible to change a file's owner:

chown nobody test.txt
chown: changing ownership of `test.txt':
       Operation not permitted

lcap can likewise remove all capabilities except CAP_SYS_BOOT, CAP_KILL and CAP_SYS_NICE.

One thing to note: modifying cap-bound restricts the capabilities of future processes only. Strictly speaking, it is not future processes that are affected but any process that calls exec(2) (see the function compute_creds in the kernel source file fs/exec.c). Currently running processes keep the capabilities with which they started.
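The arithmetic behind a cap-bound value can be sketched in C. This is only an illustration of the bit manipulation, not how lcap itself is implemented; the helper names are hypothetical, and the EX_ values are assumed to match the numbering in linux/capability.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants, assumed to match linux/capability.h. */
#define EX_CAP_KILL     5
#define EX_CAP_SYS_BOOT 22
#define EX_CAP_SYS_NICE 23

/* Build a bounding-set value that keeps only the listed capabilities. */
static uint32_t ex_keep_only(const int *caps, size_t n)
{
    uint32_t bound = 0;
    for (size_t i = 0; i < n; i++)
        bound |= (uint32_t)1 << caps[i];
    return bound;
}

/* Clear a single capability from an existing bounding-set value. */
static uint32_t ex_drop_from_bound(uint32_t bound, int cap)
{
    return bound & ~((uint32_t)1 << cap);
}
```

Keeping only CAP_SYS_BOOT, CAP_KILL and CAP_SYS_NICE yields 0x00C00020, while dropping CAP_CHOWN (bit 0) from a full set yields 0xFFFFFFFE.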

Modifying the capabilities of an existing process leads us into the next section, and here's the catch I spoke about above. Running lcap with no arguments lists what your system is capable of. If you see that CAP_SETPCAP is disabled, you need to make a change to your kernel. It's simple enough to describe here. In the kernel source tree, edit include/linux/capability.h. You're changing the lines:

#define CAP_INIT_EFF_SET  to_cap_t(~0 & ~CAP_TO_MASK(CAP_SETPCAP))
#define CAP_INIT_INH_SET  to_cap_t(0)

so that they read:

#define CAP_INIT_EFF_SET  to_cap_t(~0)
#define CAP_INIT_INH_SET  to_cap_t(~0)

and then recompile.

There's actually a reason that CAP_SETPCAP is disabled by default: it's deemed a security risk to leave it enabled on a production system (a patch exists for this condition but has yet to be applied as of this writing). To be on the safe side, make sure to remove this capability when you're done playing.
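What the kernel change does to the initial effective set can be worked out with plain integer arithmetic. The sketch below is an assumption-laden illustration: it treats the set as a bare 32-bit integer, uses hypothetical function names, and assumes CAP_SETPCAP is capability number 8, as in linux/capability.h.

```c
#include <stdint.h>

/* Illustrative constant, assumed to match linux/capability.h. */
#define EX_CAP_SETPCAP 8

/* Stock initial effective set: every bit set except CAP_SETPCAP. */
static uint32_t ex_stock_eff_set(void)
{
    return ~(uint32_t)0 & ~((uint32_t)1 << EX_CAP_SETPCAP);
}

/* Patched initial effective set: every bit set, CAP_SETPCAP included. */
static uint32_t ex_patched_eff_set(void)
{
    return ~(uint32_t)0;
}
```

The stock value works out to 0xFFFFFEFF and the patched one to 0xFFFFFFFF, so the two differ only in bit 8.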

The System Call Interface

As of this writing, the syscalls capset and capget manipulate capabilities for a process. There are no guarantees that this interface won't change. Portable applications are encouraged to use libcap (www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.4) instead.

The prototype for capset is

int capset(cap_user_header_t header,
const cap_user_data_t data);

HEADER is a fancy way to say which pid you're operating on:

typedef struct __user_cap_header_struct {
        __u32 version;
        int pid;
} *cap_user_header_t;

If pid is -1, you will modify the capabilities of all currently running processes. Less than -1 and you modify the process group equal to pid times -1. The semantics are similar to those of kill(2).
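The pid rules above can be restated as a small C sketch. The enum and function names are hypothetical and exist only for illustration. Treating pid 0 as "the calling process" is an assumption taken from the capget(2) manual page rather than from the text here.

```c
/* Hypothetical labels for what a given header pid value targets. */
enum ex_target {
    EX_TARGET_SELF,     /* pid == 0 (assumption, per capget(2)) */
    EX_TARGET_PROCESS,  /* pid > 0: one specific process */
    EX_TARGET_ALL,      /* pid == -1: all running processes */
    EX_TARGET_GROUP     /* pid < -1: process group -pid */
};

/* Classify a pid the way the article describes, kill(2)-style. */
static enum ex_target ex_classify_pid(int pid)
{
    if (pid == 0)
        return EX_TARGET_SELF;
    if (pid > 0)
        return EX_TARGET_PROCESS;
    if (pid == -1)
        return EX_TARGET_ALL;
    return EX_TARGET_GROUP;
}

/* For pid < -1, the process group is pid times -1. */
static int ex_group_of(int pid)
{
    return -pid;
}
```

For example, a header pid of -4235 would be classified as EX_TARGET_GROUP, and ex_group_of(-4235) gives process group 4235.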

The DATA argument allows you to choose which capability sets you plan to modify. There are three:

typedef struct __user_cap_data_struct {
        __u32 effective;
        __u32 permitted;
        __u32 inheritable;
} *cap_user_data_t;

The permitted set contains all of the capabilities that a process is ultimately capable of realizing.

The effective set is the capabilities a process has elected to utilize from its permitted set. It's as if you had a huge arsenal of poetry (permitted set) but chose only to arm yourself with Allen Ginsberg for the task at hand (effective set).

The inheritable set defines which capabilities may be passed on to any programs that replace the current process image via exec(2). Please note that fork(2) does nothing special with capabilities. The child simply receives an exact copy of all three capabilities sets.
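As a rough sketch of how the header and data structures fit together, the following C function reads the calling process's three sets through the raw capget syscall. It assumes a Linux system whose headers provide linux/capability.h and the _LINUX_CAPABILITY_VERSION_1 constant (the original 32-bit layout described in this article); newer kernels may log a notice when that legacy version is used. The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Read the calling process's effective, permitted and inheritable
 * sets. Returns 0 on success, -1 on failure (errno is set). */
static int ex_read_own_caps(struct __user_cap_data_struct *out)
{
    struct __user_cap_header_struct hdr;

    hdr.version = _LINUX_CAPABILITY_VERSION_1; /* 32-bit capability sets */
    hdr.pid = 0;                               /* 0: the calling process */

    return (int)syscall(SYS_capget, &hdr, out);
}
```

After a successful call, out->effective, out->permitted and out->inheritable hold the three bitmaps, and every bit in the effective set should also appear in the permitted set.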

Only capabilities in the permitted set can be added to the effective or inheritable set. Capabilities cannot be added to the permitted set of a process unless CAP_SETPCAP is set.
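The two rules just stated can be modeled with a few bitwise checks. This is a simplified sketch of the rules as worded here, not the kernel's actual validation code; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the three sets of one process. */
struct ex_cap_sets {
    uint32_t effective;
    uint32_t permitted;
    uint32_t inheritable;
};

/* Return 1 if changing from cur to next obeys the stated rules:
 * bits added to effective or inheritable must be in the permitted
 * set, and the permitted set may only grow with CAP_SETPCAP. */
static int ex_change_allowed(const struct ex_cap_sets *cur,
                             const struct ex_cap_sets *next,
                             int has_setpcap)
{
    uint32_t added_eff = next->effective & ~cur->effective;
    uint32_t added_inh = next->inheritable & ~cur->inheritable;
    uint32_t added_perm = next->permitted & ~cur->permitted;

    if (added_eff & ~next->permitted)
        return 0;
    if (added_inh & ~next->permitted)
        return 0;
    if (added_perm && !has_setpcap)
        return 0;
    return 1;
}
```

Under this model, a process permitted CAP_KILL and CAP_NET_RAW may raise CAP_NET_RAW into its effective set, but it cannot add CAP_CHOWN to its permitted set unless CAP_SETPCAP is available.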

The Filesystem Interface

Sadly, capabilities still lack filesystem support, limiting their usefulness to a degree. Someday, the mainstream kernels will allow you to enable capabilities in a program's inode, obviating the setuid bit in many system utilities.

Once fully supported, permitting the ping utility to open raw sockets could be as simple as:

chattr +CAP_NET_RAW /bin/ping

Unfortunately, more pressing kernel issues have delayed work in this area.

If you're so inclined, you can use libcap to hack your favorite services so that they are capability-aware and drop the privileges they no longer need at startup. Several patches exist for xntpd that do just this; some even provide their modified version as an RPM. Try a Google search if you're interested in a capability-aware version of some root-level process you find yourself often shaking a fist at.

setpcaps can be used to modify the capability set of an existing process. For example, if the PID of a regular user's shell is 4235, here's how you can give that user's shell the ability to send signals to any process:

setpcaps 'cap_kill=ep' 4235

An example use of this would be to allow a friend who is using your machine to debug a CGI script to kill any Apache processes that get stuck in infinite loops. You'd run it against their login shell once and forget about them.

Here's an example that utilizes execcap and sucap to run ping as the user “nobody”, with only the CAP_NET_RAW capability. Our target of choice for ping is www.yahoo.com:

execcap 'cap_net_raw=ep' /sbin/sucap nobody nobody /bin/ping www.yahoo.com

This sample isn't terribly useful because you need to be root to execute it, but it does illustrate what is possible. Despite some of these shortcomings, system administrators still can take measures to increase the security of their system. A system without CAP_SYS_BOOT, CAP_SYS_RAWIO and CAP_SYS_MODULE is extremely difficult for an intruder to modify. They cannot hack kernel memory, install new modules or restart the system so that it runs a backdoored kernel.

If your system logs are append-only and your core system utilities immutable (see chattr(1) for details), removing the CAP_LINUX_IMMUTABLE capability will make it virtually impossible for intruders to erase their tracks or install compromised utilities. Traffic sniffers like tcpdump become unusable once CAP_NET_RAW is removed. Remove CAP_SYS_PTRACE and you've turned off program debugging. Such a hostile environment is a script kiddie's worst nightmare; the intruder has no choice but to disconnect and wait for the intrusion to be discovered.
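One way to check whether a given cap-bound value already reflects this kind of lockdown is a simple mask test. The sketch below is illustrative only: the function names are hypothetical, and the EX_ values are assumed to match the numbering in linux/capability.h.

```c
#include <stdint.h>

/* Illustrative constants, assumed to match linux/capability.h. */
#define EX_CAP_LINUX_IMMUTABLE 9
#define EX_CAP_NET_RAW         13
#define EX_CAP_SYS_MODULE      16
#define EX_CAP_SYS_RAWIO       17
#define EX_CAP_SYS_PTRACE      19
#define EX_CAP_SYS_BOOT        22

/* Mask of the capabilities the article suggests removing. */
static uint32_t ex_risky_mask(void)
{
    return ((uint32_t)1 << EX_CAP_LINUX_IMMUTABLE) |
           ((uint32_t)1 << EX_CAP_NET_RAW) |
           ((uint32_t)1 << EX_CAP_SYS_MODULE) |
           ((uint32_t)1 << EX_CAP_SYS_RAWIO) |
           ((uint32_t)1 << EX_CAP_SYS_PTRACE) |
           ((uint32_t)1 << EX_CAP_SYS_BOOT);
}

/* Return 1 if none of those capabilities remain in the bounding set. */
static int ex_is_locked_down(uint32_t bound)
{
    return (bound & ex_risky_mask()) == 0;
}
```

The combined mask is 0x004B2200, so a bounding value such as 0xFFB4DDFF passes the check, while a value with all of those bits still set does not.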


Capabilities can provide sophisticated, fine-grained access control over all aspects of a Linux system. At last, security paranoids will have some tools they so desperately need in their endless fight against “them”.


Michael Bacarella (mike@bacarella.com) is president of Netgraft Corporation, a firm specializing in web system development and information security analysis. He shares an apartment in New York with his wonderful fiancée and a most fearsome green iguana (the iguana's name is Kang).