Orchestration with MCollective

Although many orchestration tools exist, most of them take a glorified "SSH for loop approach", and the end result is some centralized admin host that has SSH root access everywhere and runs commands one server at a time. MCollective has a strong security model where your commands are restricted to specific plugins that exist on each client, and when you run a command from your admin node, it is signed with your user's local key and sent to a central job queue. Each client checks whether the command is intended for it, and if so, it picks up the job off the queue, validates the signature, and if the plugin is installed, only then will it execute the command. With this security model, attackers can't compromise the job queue and inject new jobs, because they can't sign them, and if attackers compromise the administrative node, they are restricted to whatever plugins you have enabled. Also, because MCollective uses a job queue, commands run in parallel, so a command sent to 50 servers should return about as fast as a command sent to one.

Instead of describing every default MCollective plugin and its arguments, a better way to illustrate MCollective as an orchestration tool is to walk through how it helps automate what is (unfortunately) a pretty frequent task for sysadmins these days: patching a security hole in OpenSSL. The basic steps an administrator would have to perform by hand on each server would be the following:

  • Check what version of OpenSSL is installed on a server. Proceed with the rest of the steps if it isn't up to date.

  • Update OpenSSL.

  • Confirm that the OpenSSL package is now the patched version.

  • Restart any services on the host (like Apache, nginx or PostgreSQL) that use OpenSSL so they load the new library.

Although you certainly could use a configuration management tool to make sure that you always are running the latest version of OpenSSL, the process of restarting any services that use OpenSSL is probably not something you want to occur at random the next time the client checks in. Here's how you could perform the above steps using MCollective from a central admin host.

The package plugin allows you to query packages on a system, and this particular command polls all of the hosts in your environment at the same time and returns the version of the OpenSSL package each of them has:

mco package openssl status

You also can use the package plugin command to update packages, and this particular command updates OpenSSL on every host in your environment to the latest version:

mco package openssl update

In the output, it will return with a complete tally of how many hosts have OpenSSL installed and at what version.

The service plugin lets MCollective start, stop, restart and query the state of init services on a system. This particular command restarts the nginx service on every host in your environment at the same time:

mco service nginx restart

Any hosts that don't have an nginx service will safely do nothing. You could replace nginx in the above command with any other init service on your system.

So there you have it. With three commands, I could patch OpenSSL and restart nginx across the entire environment. If I had just needed to patch bash (such as back in the days of the Shellshock vulnerability), I could have done it with a single mco package bash update command.

Of course, most administrators won't want to apply a command (especially a restart command) across every server at the same time. Instead, you want to stage things to parts of your environment at a time. The simplest way to do this is with the -I argument that lets you apply a command to a particular server. So for instance, you could reboot nginx only on web1.example.com like this:

mco service nginx restart -I web1.example.com

MCollective allows you to apply very sophisticated filters to your commands so that they apply only to particular groups of hosts with the -W argument For example, if you wanted to update OpenSSL only on hosts running Debian 8.5, you could type:

mco package openssl update -W "operatingsystem=Debian

What's more, because these filters can be based on Facter facts, you don't have to maintain and update local lists of server categories like back in the bad old days of SSH for loop scripts. So for instance, if you spin up a new Debian 8.5 server in AWS, the next MCollective command you run that happens to reference the distribution version, fact will return this server in the results without your having to do anything. You even can use the mco find command to return a list of all of the servers that match a particular fact:

mco find -W "operatingsystem=Debian operatingsystemrelease=8.5"

You can use any facts that show up in the output from the facter command, and if you use Puppet, you also can take advantage of any custom facts from Puppet. So for example, the way that I take advantage of this is to split up my hosts into different high-availability groups based on the number in a host's hostname. In my case, when I create a host in AWS, I divide the availability zones into three groups, and the number in the hostname reflects one of those groups. So all hosts with a 1, 4 or 7 in their hostname, for instance, would be in one availability zone; 2s, 5s and 8s would be in another; and 3s, 6s and 9s in another. I then set a custom fact in Puppet I called hagroup to a, b or c, based on which of these three groups the host is in. So if I wanted to update OpenSSL across all servers but only restart nginx in a fault-tolerant way, I might do something like this:

mco package openssl update
mco service nginx restart -W hagroup=c

This way, I restart nginx only in a third of my environment. If there were some kind of problem, the other two-thirds of the environment would be fine. Then I would wait for all the nginx hosts in that group to return, and repeat the nginx restart command for hagroup=b and then finally hagroup=a. When I'm updating software that possibly could crash or packages that automatically restart the service after an update, I also limit the package update command to a particular hagroup.

What's nice about MCollective is that because you can limit it based on facts that are set automatically on each system, it's particularly easy to create shell scripts that wrap around a group of MCollective commands to perform common sysadmin tasks (like, say, upgrading OpenSSL) that apply in a consistent but fast and automated way. You also can extend MCollective with your own custom plugins that are relatively easy to write.

In my next article, I plan to describe how I wrapped a series of MCollective commands, including some custom plugins we wrote in-house, to automate all of the steps you would normally do by hand to upgrade in-house software on production systems.


Kyle Rankin is SVP of Security and Infrastructure at Zero, the author of many books including Linux Hardening in Hostile Networks, DevOps Troubleshooting and The Official Ubuntu Server Book, and a columnist for Linux Journal. Follow him @kylerankin