Orchestration with MCollective
I originally got into systems administration because I loved learning about computers, and I figured that was a career that always would offer me something new to learn. Now many years later that prediction has turned out to be true, and it seems like there are new things to learn all the time. In particular, every now and then a new technology comes around that dramatically changes how sysadmins do their jobs. For instance, in the October 2012 issue of LJ, I wrote an article titled "How to Deploy a Server" where I described the progression of how sysadmins deployed servers from by-hand bespoke configuration, to images, to post-install scripts and finally with configuration management.
So in this article, I'm going to expand on that concept to talk about how to use orchestration tools (in particular, MCollective) to manage orchestration tasks on servers post install. Many MCollective installation guides already exist, so I won't repeat that here; instead, my goal is to provide examples of how these tools can automate administration tasks further and to describe how I personally use them. And although I'm specifically discussing MCollective, these same concepts can be adapted and applied to any number of other orchestration tools.
These days, configuration management still is one of the most popular ways for sysadmins to configure a server, but over time, many administrators started pushing these tools past configuration management into what's being called orchestration. Orchestration refers to tools to help you push changes—in particular, software installation and updates—across your environment in a measured, staged way.
Although some administrators might be fine with pushing software updates randomly, if you want smooth upgrades, usually you want to follow an approach where you might update one server first, then if that succeeds, update a few more before updating the rest. Before you update software, you may want to notify upstream systems so they can stop sending traffic, and after you update the software, you may want to restart the service. This process is nothing new; it's just that in the past, administrators would do this by hand by logging in to machines one by one, or they would write custom scripts. With orchestration tools, you can perform these same steps from a centralized location.
The line between configuration management and orchestration is bit clearer with tools like Puppet and Chef than, say, with SaltStack or Ansible. Although Puppet and Chef can run in a masterless way, the default approach is to have clients check in to a master server periodically to see whether they comply with the central configuration and if not, to change until they do. Usually you have clients check in to the master in a somewhat randomized way or otherwise send them a trigger to apply changes.
Because tools like SaltStack and Ansible work on top of SSH, they already include an orchestration component that allows you to trigger certain kinds of changes from a central place in a staged way. Although you can make Puppet and Chef perform orchestration, many administrators who try it end up becoming frustrated and replace the tool with something else instead of realizing that those tools are very capable of what they were built to do but just not as strong at orchestration.
I personally ran into a similar kind of frustrating situation myself with Puppet when I was trying to use it to stage software updates. Puppet works great for configuration management, but wasn't ideal for orchestration in my experience. Instead of throwing away Puppet, I simply supplemented it with a very powerful tool, MCollective, that was expressly intended for orchestration and integrates well with Puppet.
MCollective lets you send out commands that query the value of particular Puppet facts, start and stop services, query and update software, and even start Puppet itself. MCollective also can restrict which servers run the command with the same facts from Facter that Puppet uses. So, for instance, you could send out a command that executes only on machines running a particular Linux distribution.
Although many orchestration tools exist, most of them take a glorified "SSH for loop approach", and the end result is some centralized admin host that has SSH root access everywhere and runs commands one server at a time. MCollective has a strong security model where your commands are restricted to specific plugins that exist on each client, and when you run a command from your admin node, it is signed with your user's local key and sent to a central job queue. Each client checks whether the command is intended for it, and if so, it picks up the job off the queue, validates the signature, and if the plugin is installed, only then will it execute the command. With this security model, attackers can't compromise the job queue and inject new jobs, because they can't sign them, and if attackers compromise the administrative node, they are restricted to whatever plugins you have enabled. Also, because MCollective uses a job queue, commands run in parallel, so a command sent to 50 servers should return about as fast as a command sent to one.
Instead of describing every default MCollective plugin and its arguments, a better way to illustrate MCollective as an orchestration tool is to walk through how it helps automate what is (unfortunately) a pretty frequent task for sysadmins these days: patching a security hole in OpenSSL. The basic steps an administrator would have to perform by hand on each server would be the following:
Check what version of OpenSSL is installed on a server. Proceed with the rest of the steps if it isn't up to date.
Confirm that the OpenSSL package is now the patched version.
Restart any services on the host (like Apache, nginx or PostgreSQL) that use OpenSSL so they load the new library.
Although you certainly could use a configuration management tool to make sure that you always are running the latest version of OpenSSL, the process of restarting any services that use OpenSSL is probably not something you want to occur at random the next time the client checks in. Here's how you could perform the above steps using MCollective from a central admin host.
package plugin allows you to query packages on a system, and this particular command polls
all of the hosts in your environment at the same time and returns the version of the OpenSSL
package each of them has:
mco package openssl status
You also can use the package plugin command to update packages, and this particular command updates OpenSSL on every host in your environment to the latest version:
mco package openssl update
In the output, it will return with a complete tally of how many hosts have OpenSSL installed and at what version.
service plugin lets MCollective start, stop, restart and query the state of init services
on a system. This particular command restarts the nginx service on every host in your
environment at the same time:
mco service nginx restart
Any hosts that don't have an nginx service will safely do
nothing. You could replace
nginx in the above command with any other init service on your system.
So there you have it. With three commands, I could patch OpenSSL and restart nginx across the
entire environment. If I had just needed to patch bash (such as back in the days of the
Shellshock vulnerability), I could have done it with a single
mco package bash
Of course, most administrators won't want to apply a command (especially a restart command)
across every server at the same time. Instead, you want to stage things to parts of your
environment at a time. The simplest way to do this is with the
-I argument that lets you apply
a command to a particular server. So for instance, you could reboot nginx only on web1.example.com
mco service nginx restart -I web1.example.com
MCollective allows you to apply very sophisticated filters to your commands so that they
apply only to particular groups of hosts with the
-W argument For
example, if you wanted to
update OpenSSL only on hosts running Debian 8.5, you could type:
mco package openssl update -W "operatingsystem=Debian ↪operatingsystemrelease=8.5"
What's more, because these filters can be based on Facter facts, you don't have to maintain and
update local lists of server categories like back in the bad old days of SSH for loop scripts.
So for instance, if you spin up a new Debian 8.5 server in AWS, the next MCollective command
you run that happens to reference the distribution version, fact will return this server in the
results without your having to do anything. You even can use the
find command to return a
list of all of the servers that match a particular fact:
mco find -W "operatingsystem=Debian operatingsystemrelease=8.5"
You can use any facts that show up in the output from the facter command, and if you use Puppet, you also can take advantage of any custom facts from Puppet. So for example, the way that I take advantage of this is to split up my hosts into different high-availability groups based on the number in a host's hostname. In my case, when I create a host in AWS, I divide the availability zones into three groups, and the number in the hostname reflects one of those groups. So all hosts with a 1, 4 or 7 in their hostname, for instance, would be in one availability zone; 2s, 5s and 8s would be in another; and 3s, 6s and 9s in another. I then set a custom fact in Puppet I called hagroup to a, b or c, based on which of these three groups the host is in. So if I wanted to update OpenSSL across all servers but only restart nginx in a fault-tolerant way, I might do something like this:
mco package openssl update mco service nginx restart -W hagroup=c
This way, I restart nginx only in a third of my environment. If there were some kind of problem,
the other two-thirds of the environment would be fine. Then I would wait for all the nginx
hosts in that group to return, and repeat the
nginx restart command for
hagroup=b and then
hagroup=a. When I'm updating software that possibly could crash or packages that
automatically restart the service after an update, I also limit the package update command to a
What's nice about MCollective is that because you can limit it based on facts that are set automatically on each system, it's particularly easy to create shell scripts that wrap around a group of MCollective commands to perform common sysadmin tasks (like, say, upgrading OpenSSL) that apply in a consistent but fast and automated way. You also can extend MCollective with your own custom plugins that are relatively easy to write.
In my next article, I plan to describe how I wrapped a series of MCollective commands, including some custom plugins we wrote in-house, to automate all of the steps you would normally do by hand to upgrade in-house software on production systems.