Managing Linux Using Puppet

At some point, you probably have installed or configured a piece of software on a server or desktop PC. Since you read Linux Journal, you've probably done a lot of this, as well as developed a range of glue shell scripts, Perl snippets and cron jobs.

Unless you are more disciplined than I was, every server has a unique, hand-crafted version of those config files and scripts. It might be as simple as a backup monitor script, but each still needs to be managed and installed.

Installing a new server usually involves copying over config files and glue scripts from another server until things "work". Subtle problems may persist if a particular condition appears infrequently. Any improvement is usually made on an ad hoc basis to a specific machine, and there is no way to apply improvements to all servers or desktops easily.

Finally, in typical scenarios, all the learning and knowledge invested in these scripts and configuration files are scattered throughout the filesystem on each Linux system. This means there is no easy way to know how any piece of software has been customized.

If you have installed a server and come back to it three years later wondering what you did, or manage a group of desktops or a private cloud of virtual machines, configuration management and Puppet can help simplify your life.

Enter Configuration Management

Configuration management is a solution to this problem. A complete solution provides a centralized repository that defines and documents how things are done, and that definition can be applied to any system easily and reproducibly. Improvements simply can be rolled out to systems as required. The result is that one administrator can manage a large number of servers with ease.


Many different configuration management tools for Linux (and other platforms) exist. Puppet is one of the most popular and the one I cover in this article. Similar tools include Chef, Ansible and Salt as well as many others. Although they differ in the specifics, the general objectives are the same.

Puppet's underlying philosophy is that you tell it what you want as an end result (required state), not how you want it done (the procedure), using Puppet's programming language. For example, you might say "I want ssh key XYZ to be able to log in to user account foo." You wouldn't say "cat this string to /home/foo/.ssh/authorized_keys." In fact, the simple procedure I described isn't even close to being reliable or correct: the .ssh directory may not exist, the permissions could be wrong, and many other things could go awry.
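
As a rough illustration of the declarative style, the ssh key requirement might be written as the manifest sketch below. It uses the built-in user and ssh_authorized_key resource types. The resource title, the choice of an RSA key and the key string itself are placeholders assumed for this example, not values from a real system.

```puppet
# Sketch only: the account name comes from the example in the text;
# the key title, type and data are hypothetical placeholders.
user { 'foo':
  ensure     => present,
  managehome => true,      # create /home/foo if it does not exist
}

ssh_authorized_key { 'xyz-key':
  ensure  => present,
  user    => 'foo',        # key is added to this account's authorized_keys
  type    => 'ssh-rsa',
  key     => 'AAAAB3NzaC1yc2EAAAADAQABexamplekeydata',  # placeholder, not a real key
  require => User['foo'],  # make sure the account exists first
}
```

Nothing here says how to create the .ssh directory or which permissions to set. Those details are left to Puppet, which is the point of describing the end state rather than the procedure.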

You declare your requirements using Puppet's language in files called manifests, which have the suffix .pp. Your manifest states the requirements for a machine (virtual or real) using Puppet's built-in modules or your own custom modules, which also are stored in manifest files. Puppet is driven from this collection of manifests much like a program is built from code. When the puppet apply command is run, Puppet compiles the manifests, determines how the machine's current state differs from the desired state, and then makes any changes necessary to bring the machine in line with the requirements.

This approach means that if you run puppet apply on a machine that is up to date with the current manifests, nothing should happen, as there are no changes to make.
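
A minimal sketch of this behavior, assuming a hypothetical manifest file named motd.pp that manages a single file (the path and content are chosen only for illustration):

```puppet
# motd.pp (hypothetical file name)
# Declares the desired state of /etc/motd; Puppet only acts if the
# file on disk differs from this description.
file { '/etc/motd':
  ensure  => file,
  owner   => 'root',
  group   => 'root',
  mode    => '0644',
  content => "This machine is managed by Puppet.\n",
}
```

Running puppet apply motd.pp as root the first time should create or correct the file. Running the same command again straight away should change nothing, because the machine already matches the manifest.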

Overview of the Approach

Puppet is a tool (actually a whole suite of tools) that includes the Puppet execution program, the Puppet master, the Puppet database (PuppetDB) and the Puppet system information utility (Facter). There are many different ways to use it that suit different environments.

In this article, I explain the basics of Puppet and the way we use it to manage our servers and desktops, in a simplified form. I use the term "machine" to refer to desktops, virtual machines and hypervisor hosts.

The approach I outline here works well for 1–100 machines that are fairly similar but differ in various ways. If you are managing a cloud of 1,000 virtual servers that are identical or differ in very predictable ways, this approach is not optimized for that case (and you should write an article for the next issue of Linux Journal).

This approach is based around the ideas outlined in the excellent book Puppet 3 Beginner's Guide by John Arundel. The basic idea is this:

  • Store your Puppet manifests in git. This provides a great way to manage, track and distribute changes. We also use it as the way servers get their manifests (we don't use a Puppet master). You easily could use Subversion, Mercurial or any other SCM.

  • Use a separate git branch for each machine so that machines are stable: a change reaches a machine only when it is merged into that machine's branch.

  • Each machine then periodically polls the git repository and runs puppet apply if there are any changes.

  • There is a manifest file for each machine that defines the desired state.
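
The last two points can be sketched together as a single per-machine manifest. This is an illustration under stated assumptions rather than the exact setup we run: the node name web1.example.com, the working copy location /etc/puppet, the manifest path manifests/site.pp, the binary paths and the 15-minute interval are all hypothetical. The working copy is assumed to have been cloned with the machine's own branch checked out, so git pull follows that branch.

```puppet
# Hypothetical manifest for one machine; all names and paths are assumptions.
node 'web1.example.com' {

  # Part of this machine's desired state: an ssh server that is
  # installed and running.
  package { 'openssh-server':
    ensure => installed,
  }

  service { 'ssh':
    ensure  => running,
    enable  => true,
    require => Package['openssh-server'],
  }

  # Poll the git repository and re-apply the manifests every 15 minutes.
  # Simplification: this applies on every run rather than only when the
  # pull brings in changes; an unchanged machine should see no effect.
  cron { 'pull-and-apply':
    ensure  => present,
    user    => 'root',
    minute  => '*/15',
    command => 'cd /etc/puppet && /usr/bin/git pull -q && /usr/bin/puppet apply manifests/site.pp',
  }
}
```

Because the cron job is itself declared in the manifest, a machine needs only one manual puppet apply to start managing itself from the repository.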


David Barton is Managing Director of OneIT, a company specializing in custom business software development. He's been using Linux since 1998 and managing OneIT's Linux servers for more than 10 years.