Distributed Compiling with distcc

 in
You don't need a cluster to get cluster-like performance out of your compiler.
Security

distcc relies on the network being trusted. Anyone who is able to connect to the machine on the distcc port can run arbitrary commands on that host as the distcc user. It is vitally important that distccd processes are not run as root but are run as the distcc or nobody user. It's also extremely important to think carefully about your -allow statements and ensure that they are locked down appropriately for your network.

In an environment, such as a home or small workplace network, where you're security firewalled off from the outside world and people can be made accountable for not being good neighbours on the network, distcc is secure enough. It's extremely unlikely that anyone could or would exploit your distccd hosts if they're not running the dæmon as root and your allow statements limit the connecting machines appropriately.

There is also the issue that those on the network can see your distcc traffic—source code and object files on the wire for anyone to reach out and examine. Again, on a trusted network, this is unlikely to be a problem, but there are situations in which you would not want this to happen, or could not allow this to happen, depending on what code you are compiling and under what terms.

On a more hostile network, such as a large university campus or a workplace where you know there is a problem with security, these could become serious issues.

For these situations, distcc can be run over SSH. This ensures both authentication and signing on each end, and it also ensures that the code is encrypted in transit. SSH is typically around 25% slower due to SSH encryption overhead. The configuration is very similar, but it requires the use of ssh-keys. Either passphraseless keys or an ssh-agent must be used, as you will be unable to supply a password for distcc to use. For SSH connections, distccd must be installed on the clients, but it must not be listening for connections—the dæmons will be started over SSH when needed.

First, create an SSH key using ssh-keygen -t dsa, then add it to the target user's ~/.ssh/authorized_keys on your distcc hosts. It's recommended always to set a passphrase on an SSH key for security.

In this example, I'm using my own user account on all of the hosts and a simple bash loop to distribute the key quickly:

for i in 192.168.80.120 192.168.80.100; do cat ~/.ssh/id_dsa.pub 
 ↪| ssh jes@$i 'cat - >> ~/.ssh/authorized_keys'; done

To let distcc know that it needs to connect to the hosts under SSH, modify either the ~/.distcc/hosts file or $DISTCC_HOSTS variable. To instruct distcc to use SSH, simply add an @ to the beginning of the hostname. If you need to use a different user name on any of the hosts, you can specify it as user@host:

localhost @192.168.80.100 @192.168.80.120

Because I'm using a key with a passphrase, I also need to start my SSH agent with ssh-add and enter my passphrase. For those unfamiliar with ssh-agent, it's a tool that ships with OpenSSH that facilitates needing to enter the passphrase for your key only once a session, retaining it in memory.

Now that we've set up SSH keys and told distcc to use a secure connection, the procedure is the same as before—simply make -jn.

Other Options

This method of modifying the hostname with the options you want distcc to honour can be used for more than specifying the connection type. For example, the option /limit can be used to override the default number of jobs that will be sent to the distccd servers. The original limit is four jobs per host except localhost, which is sent only two. This could be increased for servers with more than two CPUs.

Another option is to use lzo compression for either TCP or SSH connections. This increases CPU overhead, but it may be worthwhile on slow networks. Combining these two options would be done with:

localhost 192.168.80.100/6,lzo

This option increases the jobs sent to 192.168.80.100 to six, and it enables use of lzo compression. These options are parsed in a specific order, so some study of the man page is recommended if you intend to use them. A full list of options with examples can be found on the distcc man page.

The flexibility of distcc covers far more than explained here. One popular configuration is to use it with ccache, a compiler cache. distcc also can be used with crossdev to cross-compile for different architectures in a distributed fashion. Now, your old SPARC workstation can get in on the act, or your G5 Mac-turned-Linux box can join the party. These are topics for future articles though; for now, I'm going to go play with my freshly compiled desktop environment.

Jes Hall is a UNIX systems consultant and KDE developer from New Zealand. She's passionate about helping open-source software bring life-changing information and tools to those who would otherwise not have them.

______________________

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix