Large-Scale Linux Configuration Management

Mr. Anderson describes some general principles and techniques for installing and maintaining configurations on a large number of hosts, then looks in detail at the local configuration system (LCFG) used at Edinburgh University.

The Component Framework

A number of components on each machine take the resources from the repository and implement the specified configuration in whatever way is appropriate for that particular platform. The components are currently implemented as shell scripts which take a standard set of method arguments, rather like the rc.d startup scripts under Red Hat Linux:

  • START: executed when the system boots.

  • STOP: executed when the system shuts down.

  • RUN: executed periodically (from cron).

A client-server program (om) also allows methods to be executed on demand on multiple remote machines. Components may define additional, arbitrary methods beyond the standard set.

Different types of components perform different actions at different times. Typically, a daemon might be started at boot time, reloaded periodically and stopped at shutdown. Some components, however, might simply perform a reconfiguration at boot time, or start only in response to the RUN method (for example, a backup system).

Component scripts normally inherit a set of subroutines from a generic component, which provides default methods and various utility procedures for operations such as resource retrieval. This makes simple components easy to write, and the scripts are frequently quite short.
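
To make this concrete, here is a minimal sketch of what such a component script might look like. It is only an illustration: the path of the generic component and the helper names (do_getresource, do_default) are assumptions, not the real LCFG interface.

    #!/bin/sh
    # Sketch of a simple component script; the generic component path and
    # helper names (do_getresource, do_default) are hypothetical.
    . /usr/lib/lcfg/generic          # "inherit" default methods and utilities

    START() {
        # fetch a resource value, write the config file and start the daemon
        port=`do_getresource myservice port`
        echo "port $port" > /etc/myservice.conf
        /usr/sbin/myserviced
    }

    STOP() {
        kill `cat /var/run/myserviced.pid`
    }

    RUN() {
        # periodic invocation from cron: regenerate the config and signal the daemon
        port=`do_getresource myservice port`
        echo "port $port" > /etc/myservice.conf
        kill -HUP `cat /var/run/myserviced.pid`
    }

    # dispatch on the standard method argument
    case "$1" in
        START|STOP|RUN) "$1" ;;
        *) do_default "$@" ;;        # fall back to inherited default methods
    esac

The same script can then be driven from the boot sequence, from cron, or on demand via om, without any change to the component itself.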

Some Important Components

A typical host runs 20 to 30 components, controlling subsystems such as web servers, printers, NIS services, NFS configuration and various other daemons. Two components are worth mentioning in more detail.

The boot component is the only one run directly from the system startup files. This uses resources to determine which other components to start. The set of services running on a particular machine is therefore controlled by the boot resources.
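
A rough sketch of how the boot component's START method might use a resource to bring up the other components follows; the resource name (boot.components) and the component directory are assumptions made purely for illustration.

    #!/bin/sh
    # Hypothetical sketch of the boot component: start whichever components
    # the boot resources list for this machine.
    . /usr/lib/lcfg/generic

    components=`do_getresource boot components`    # e.g. "update web nfs printing"

    for c in $components; do
        /usr/lib/lcfg/components/$c START
    done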

The update component normally runs nightly, as well as at boot time. It uses the extremely useful updaterpms program, which compares the RPMs installed on a machine with those specified via the resources. RPMs are automatically installed or deleted to synchronise the state of the machine with the specification. This means that all machines in the same class are guaranteed to have identical sets of up-to-date packages; changing an inherited resource file will automatically reconfigure the RPMs carried by every machine in the class.
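
The synchronisation idea can be illustrated with a much-simplified stand-in for updaterpms (this is not the real program, and it ignores package versions); the specification file /etc/rpmlist and the local package directory are assumptions.

    #!/bin/sh
    # Simplified illustration of package synchronisation (not updaterpms itself).
    # Assumes /etc/rpmlist holds the package names derived from the resources
    # and /export/rpms holds a local mirror of the package files.
    rpm -qa --qf '%{NAME}\n' | sort -u > /tmp/installed
    sort -u /etc/rpmlist > /tmp/wanted

    # install packages that are specified but missing
    comm -13 /tmp/installed /tmp/wanted | while read pkg; do
        rpm -Uvh /export/rpms/$pkg-*.rpm
    done

    # remove packages that are installed but no longer specified
    comm -23 /tmp/installed /tmp/wanted | xargs -r rpm -e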

Machine Installation

As much configuration as possible is performed dynamically by the various components. However, some configuration, such as disk partitioning, must be hard-wired at installation time. New machines are booted using an installation floppy, which mounts a root file system from the network, a CD or a Zip drive. The boot process runs a special install component, which determines all necessary install-time parameters by interpreting the machine's install resources. A very minimal template is installed on the new system, and the update component is then used to load the initial set of RPMs.
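
The flow might look roughly like the outline below; every name in it (the resource keys, the partitioning helper, the template tarball and the paths) is a made-up placeholder rather than the actual install code.

    #!/bin/sh
    # Hypothetical outline of the install component: all resource names,
    # paths and the partitioning helper are placeholders.
    . /usr/lib/lcfg/generic

    disk=`do_getresource install disk`            # e.g. /dev/hda
    layout=`do_getresource install partitions`    # e.g. "/:1000 swap:128 /usr:2000"

    partition_disk "$disk" "$layout"              # hypothetical helper: fdisk + mke2fs
    mount ${disk}1 /mnt

    # unpack a very minimal template system, then let update pull in the RPMs
    tar xzf /install/template.tar.gz -C /mnt
    chroot /mnt /usr/lib/lcfg/components/update RUN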

This supports completely unattended builds of new machines, as well as rebuilds of existing machines. If there is any doubt about the integrity of a system, it is normal for us to simply rebuild it from scratch.

Problems and Future Plans

The concept of an open, lightweight framework has been very important; many people have contributed components, so that virtually everything which varies between our machines is now handled by LCFG. This has made the system very successful; however, much of the implementation is still based on technologies originally intended to be temporary. We are currently planning to expand the use of LCFG beyond our own department, and this is motivating a redesign of some of the subsystems, although the basic architecture will remain the same:

  • We hope to implement a new syntax for specifying the resources, together with a special-purpose resource compiler.

  • We hope to replace the NIS distribution with something simpler which is available earlier in the boot sequence.

  • We would like to re-implement the components in Perl, using Perl inheritance to provide generic operations.

Other items on the wish list include caching support for portables and secure signing of resources.

Paul Anderson is a Senior Computing Officer with the Division of Informatics at Edinburgh University. He has been involved with UNIX systems administration for 15 years. Further information is available from www.dcs.ed.ac.uk/~paul, and comments by e-mail are welcome at paul@dcs.ed.ac.uk.
