Distributed Compiling with distcc
distcc relies on the network being trusted: anyone who can connect to a machine on the distcc port can run arbitrary commands on that host as the distcc user. It is vitally important that distccd processes are not run as root but as the distcc or nobody user. It's also extremely important to think carefully about your --allow statements and ensure that they are locked down appropriately for your network.
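As a configuration sketch, a locked-down dæmon might be started like this; the network range is illustrative, so adjust it to match your own:

```shell
# Run distccd in the background as an unprivileged user, and accept
# connections only from the local 192.168.80.0/24 network.
distccd --daemon --user distcc --allow 192.168.80.0/24
```

Without an `--allow` option, distccd refuses connections, so being explicit here costs nothing and documents your intent.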
In an environment, such as a home or small workplace network, where you're securely firewalled off from the outside world and people can be held accountable for not being good neighbours on the network, distcc is secure enough. It's extremely unlikely that anyone could or would exploit your distccd hosts if they're not running the dæmon as root and your allow statements limit the connecting machines appropriately.
There is also the issue that those on the network can see your distcc traffic—source code and object files on the wire for anyone to reach out and examine. Again, on a trusted network, this is unlikely to be a problem, but there are situations in which you would not want this to happen, or could not allow this to happen, depending on what code you are compiling and under what terms.
On a more hostile network, such as a large university campus or a workplace where you know there is a problem with security, these could become serious issues.
For these situations, distcc can be run over SSH. This authenticates both ends of the connection and ensures that the code is encrypted in transit. Expect builds to be roughly 25% slower over SSH because of the encryption overhead. The configuration is very similar, but it requires the use of SSH keys. Either passphraseless keys or an ssh-agent must be used, as distcc has no way to prompt you for a password. For SSH connections, distccd must be installed on the compile hosts, but it must not be listening for connections; the dæmons will be started over SSH when needed.
First, create an SSH key using ssh-keygen -t dsa, then add it to the target user's ~/.ssh/authorized_keys on your distcc hosts. It's recommended always to set a passphrase on an SSH key for security.
In this example, I'm using my own user account on all of the hosts and a simple bash loop to distribute the key quickly:
for i in 192.168.80.120 192.168.80.100; do cat ~/.ssh/id_dsa.pub | ssh jes@$i 'cat - >> ~/.ssh/authorized_keys'; done
To let distcc know that it needs to connect to the hosts under SSH, modify either the ~/.distcc/hosts file or $DISTCC_HOSTS variable. To instruct distcc to use SSH, simply add an @ to the beginning of the hostname. If you need to use a different user name on any of the hosts, you can specify it as user@host:
localhost @192.168.80.100 @192.168.80.120
Because I'm using a key with a passphrase, I also need to start my SSH agent with ssh-add and enter my passphrase. For those unfamiliar with ssh-agent, it's a tool that ships with OpenSSH that lets you enter your key's passphrase only once per session, retaining the decrypted key in memory.
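Starting the agent and loading the key looks like this; the key path assumes the dsa key created earlier:

```shell
# Start an agent for this shell session and export its environment
# variables, then load the key. ssh-add prompts for the passphrase
# once and keeps the decrypted key in the agent's memory.
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_dsa
```

Any distcc (or plain ssh) command run from this shell afterwards can use the key without asking for the passphrase again.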
Now that we've set up SSH keys and told distcc to use a secure connection, the procedure is the same as before—simply run make -jN, where N is the number of parallel jobs.
This method of modifying the hostname with the options you want distcc to honour can be used for more than specifying the connection type. For example, the option /limit can be used to override the default number of jobs that will be sent to each distccd server. The default limit is four jobs per host, except localhost, which is sent only two. This could be increased for servers with more than two CPUs.
Another option is to use lzo compression for either TCP or SSH connections. This increases CPU overhead, but it may be worthwhile on slow networks. Combining these two options would be done with:

localhost @192.168.80.100/6,lzo @192.168.80.120
This increases the limit for 192.168.80.100 to six jobs and enables lzo compression for that host. These options are parsed in a specific order, so some study of the distcc man page, which gives a full list of options with examples, is recommended before using them.
The flexibility of distcc covers far more than explained here. One popular configuration is to use it with ccache, a compiler cache. distcc also can be used with crossdev to cross-compile for different architectures in a distributed fashion. Now, your old SPARC workstation can get in on the act, or your G5 Mac-turned-Linux box can join the party. These are topics for future articles though; for now, I'm going to go play with my freshly compiled desktop environment.
Jes Hall is a UNIX systems consultant and KDE developer from New Zealand. She's passionate about helping open-source software bring life-changing information and tools to those who would otherwise not have them.
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can locate a particular file or files based on all kinds of criteria. It's pretty easy to string these tools together to build even more powerful ones, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality allows UNIX system administrators to seem to always have the right tool for the job.
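That find-and-grep combination is a one-liner; the path and the search string here are only placeholders:

```shell
# Search every .log file under /home for the string "ERROR",
# printing the filename (-H) next to each matching line.
# The "+" form batches many files into each grep invocation.
find /home -name '*.log' -exec grep -H 'ERROR' {} +
```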
Cron traditionally has been considered another such tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, and briefly describes how to know when it might be time to upgrade your job-scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers.
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.