Orchestration with MCollective, Part II

Traditionally, administrators might perform all of the above steps manually, logging in to different servers and perhaps interacting with a few web interfaces along the way. The next step is usually to wrap the SSH commands that perform these actions in a shell script and maintain a local configuration file that lists the servers involved.

With MCollective, the process is similar; the main difference is that MCollective doesn't need SSH root access to these machines. Instead, it performs its tasks by putting a limited set of commands in a job queue that all of the servers check. Those commands are restricted to what the MCollective plugins installed on a particular server provide, and the plugins MCollective includes by default do a good job of sanitizing their input.

Most of the commands in that deploy list can be completed with the plugins MCollective includes by default. I use Nagios for monitoring, and although MCollective does include a plugin for running NRPE commands (NRPE is a Nagios agent that runs on each server and lets Nagios run local checks for disk space, RAM and so on), it doesn't include anything that can directly put a host into maintenance mode in Nagios.

Another piece missing from the list of commands above is the ability to interact with a load balancer. Many people skip this step these days, because they use something like nginx's built-in load balancing and may not have an easy way to put a host into a maintenance mode that drains its existing connections. In that case, you can skip ahead to stopping the service and let the health check detect the failure. That approach risks dropping existing connections, though. Because I use Haproxy as my load balancer, I can use its built-in command mode to put specific servers into maintenance mode when I'm logged in to the load balancer.
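
Haproxy's command mode is its runtime API, reached through the stats socket. Here is a minimal sketch of doing this by hand on the load balancer, assuming the socket is enabled at admin level at /var/run/haproxy.sock (an assumed path; yours comes from the `stats socket` line in haproxy.cfg) and that socat is installed. `disable server` and `enable server` are Haproxy runtime commands that take a backend/server pair; `haproxy_set_server` and `HAPROXY_SOCKET` are hypothetical names.

```shell
#!/usr/bin/env bash
# Assumes haproxy.cfg enables an admin-level socket, for example:
#   stats socket /var/run/haproxy.sock level admin
# That path is an assumption; override it with HAPROXY_SOCKET.
HAPROXY_SOCKET="${HAPROXY_SOCKET:-/var/run/haproxy.sock}"

# haproxy_set_server disable|enable BACKEND/SERVER  (hypothetical helper)
# "disable" puts the server into maintenance mode so Haproxy stops sending
# it new connections; "enable" returns it to rotation.
haproxy_set_server() {
  local action="$1" target="$2"
  case "$action" in
    disable|enable) ;;
    *) echo "usage: haproxy_set_server disable|enable backend/server" >&2
       return 2 ;;
  esac
  # The runtime API reads one command per line from the socket.
  printf '%s server %s\n' "$action" "$target" |
    socat stdio "UNIX-CONNECT:${HAPROXY_SOCKET}"
}
```

For example, `haproxy_set_server disable serverrole/server1` before stopping the service, then `haproxy_set_server enable serverrole/server1` once it is healthy again.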

Fortunately, MCollective lets you extend its existing set of commands with your own custom plugins that perform specific tasks. Unfortunately, writing, packaging and deploying even a trivial MCollective plugin can be a bit complicated the first time you do it, and the process is involved enough to need an article of its own. MCollective's plugin documentation is a good place to start. In particular, read the documentation on writing plugins with MCollective's RPC framework; the framework makes the code you have to write much more straightforward, even if you aren't familiar with Ruby.

When you write a custom MCollective plugin, you choose a new plugin name (say, haproxy) and then define the list of commands you want to pass to that plugin (such as disable_server and enable_server). If a command takes arguments, you define those too. Then you map those commands and arguments onto basic command-line commands using the RPC framework, or you can dig into native Ruby libraries if you are familiar with them.
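
Real MCollective agents are Ruby classes paired with a DDL file that declares each action and its inputs, so the following is not MCollective's API. It's a shell sketch of the same mapping idea using the haproxy actions above: a fixed list of action names, one declared key=value argument, validation that rejects anything that isn't a plain backend/server pair, and exactly one command line per action. The `haproxy_agent` function and its validation pattern are hypothetical, and it prints the Haproxy runtime command rather than sending it.

```shell
#!/usr/bin/env bash
# Hypothetical shell analogue of a custom "haproxy" agent. Real MCollective
# agents are Ruby plus a DDL file; this only illustrates mapping a fixed set
# of actions and validated arguments onto command lines.

# haproxy_agent ACTION key=value...
# Prints the Haproxy runtime command the action maps to, or fails.
haproxy_agent() {
  local action="$1"; shift
  local server="" arg

  # Parse key=value arguments as they appear on the mco rpc command line.
  for arg in "$@"; do
    case "$arg" in
      server=*) server="${arg#server=}" ;;
      *) echo "unknown argument: $arg" >&2; return 2 ;;
    esac
  done

  # Both declared actions require server=backend/server; reject anything
  # else so no extra commands can be smuggled through.
  if [[ ! "$server" =~ ^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$ ]]; then
    echo "invalid or missing server=backend/server" >&2
    return 2
  fi

  # Map each allowed action onto exactly one command line.
  case "$action" in
    disable_server) printf 'disable server %s\n' "$server" ;;
    enable_server)  printf 'enable server %s\n' "$server" ;;
    *) echo "unknown action: $action" >&2; return 2 ;;
  esac
}
```

In this sketch, `haproxy_agent disable_server server=serverrole/server1` prints `disable server serverrole/server1`, while an unknown action or a value such as `server="a/b; rm -rf /"` is refused.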

I wrote a custom Nagios plugin and a custom Haproxy plugin that send my commands to Nagios' command file and Haproxy's command socket, respectively. So to put server1.example.com into maintenance mode in both Nagios and Haproxy, I would type these commands:

mco rpc nagios maintenance server=server1.example.com duration=5m
mco rpc haproxy disable_server server="serverrole/server1"

Because I took advantage of MCollective's RPC framework, I have to type rpc in front of my custom commands.

Next I provide the name of my plugin, then the command I want to run, followed by any custom arguments. On the Nagios server, the plugin receives that command and translates it into the format Nagios expects in its local command file, so that Nagios executes it. The Haproxy command goes out to any server that happens to be running Haproxy. If a particular Haproxy server doesn't have my server defined in its configuration, the command does nothing harmful; otherwise, that Haproxy server puts it into maintenance mode.
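
Nagios' closest equivalent to a maintenance mode is scheduled downtime, which you request by writing external commands to its command file. A minimal sketch of the Nagios-side translation, assuming the plugin schedules fixed downtime for the host and all of its services with the SCHEDULE_HOST_DOWNTIME and SCHEDULE_HOST_SVC_DOWNTIME external commands; the command file path, the author and comment values, and the helper names are assumptions, not the author's plugin.

```shell
#!/usr/bin/env bash
# Assumed location of the Nagios external command file; check command_file
# in nagios.cfg, since the path varies between installs.
NAGIOS_CMD_FILE="${NAGIOS_CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# duration_to_seconds DURATION: convert Ns, Nm or Nh (e.g. 5m) to seconds.
duration_to_seconds() {
  [[ "$1" =~ ^([0-9]+)([smh])$ ]] || return 1
  local n=$((10#${BASH_REMATCH[1]}))
  case "${BASH_REMATCH[2]}" in
    s) echo "$n" ;;
    m) echo $((n * 60)) ;;
    h) echo $((n * 3600)) ;;
  esac
}

# nagios_downtime_lines HOST DURATION [NOW]
# Prints external commands that schedule fixed downtime for HOST and for
# all of its services, starting at NOW (default: current epoch time).
nagios_downtime_lines() {
  local host="$1" duration="$2" now="${3:-$(date +%s)}" secs end cmd
  # Reject host names that could inject extra ';'-separated fields.
  [[ "$host" =~ ^[A-Za-z0-9.-]+$ ]] || { echo "bad host: $host" >&2; return 2; }
  secs="$(duration_to_seconds "$duration")" ||
    { echo "bad duration: $duration" >&2; return 2; }
  end=$((now + secs))
  # Fields: host;start;end;fixed(1);trigger_id(0);duration;author;comment
  # The author ("mcollective") and comment ("deploy") are placeholders.
  for cmd in SCHEDULE_HOST_DOWNTIME SCHEDULE_HOST_SVC_DOWNTIME; do
    printf '[%s] %s;%s;%s;%s;1;0;%s;mcollective;deploy\n' \
      "$now" "$cmd" "$host" "$now" "$end" "$secs"
  done
}

# nagios_maintenance HOST DURATION: append those commands to the command file.
nagios_maintenance() {
  nagios_downtime_lines "$1" "$2" >> "$NAGIOS_CMD_FILE"
}
```

In this sketch, `mco rpc nagios maintenance server=server1.example.com duration=5m` would boil down to `nagios_maintenance server1.example.com 5m` on the Nagios server.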

With these plugins in place, you can replace the generic list of steps from before with specific MCollective commands:

  • mco find -S "domain=example.com and resource('Package[myapp]').managed=true"

  • mco rpc nagios maintenance server=myapp1.example.com duration=5m

  • mco rpc haproxy disable_server server="myapp/myapp1"

  • mco rpc package apt_update -I myapp1.example.com

  • mco service myapp stop -I myapp1.example.com

  • mco package myapp update -I myapp1.example.com

  • mco service myapp start -I myapp1.example.com

  • mco service myapp status -I myapp1.example.com

  • mco nrpe check_app_health -I myapp1.example.com

  • mco rpc haproxy enable_server server="myapp/myapp1"

I've ended up wrapping all of these commands in a basic shell script that takes the name of an application as an argument and first runs the mco find command to get the list of servers that have that package installed. It then runs the remaining commands against each of those servers in a basic for loop. Where appropriate, I added a sleep command here and there to give a service time to come up. If any command fails, the script exits and reports the error so the administrator can investigate; otherwise, it works through each server in order.
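
A stripped-down version of such a script might look like the following. It's a sketch, not the author's script, and it assumes that mco exits non-zero when a step fails, that node identities are fully qualified host names, that each Haproxy server entry is named app/shortname (matching myapp/myapp1 above), and that a fixed pause is enough for the service to come up. `MCO`, `DOMAIN`, `SETTLE_SECONDS`, `run` and `deploy_app` are hypothetical names.

```shell
#!/usr/bin/env bash
# Minimal sketch of a "deployapp" wrapper around the mco commands above.
# Assumptions: mco exits non-zero on failure, identities are FQDNs, Haproxy
# entries are named <app>/<short hostname>, and a fixed pause suffices.
MCO="${MCO:-mco}"                    # can point at a stub for testing
DOMAIN="${DOMAIN:-example.com}"
SETTLE_SECONDS="${SETTLE_SECONDS:-10}"

# run ARGS...: run one mco command; on failure, report it and return 1.
run() {
  if ! $MCO "$@"; then
    echo "deployapp: step failed: mco $*" >&2
    return 1
  fi
}

# deploy_app APP: update APP on every matching server, one server at a time.
deploy_app() {
  local app="$1" host short servers
  [[ -n "$app" ]] || { echo "usage: deploy_app APP" >&2; return 2; }

  # Find every server in the domain where Puppet manages the package.
  servers="$($MCO find -S "domain=${DOMAIN} and resource('Package[${app}]').managed=true")" ||
    { echo "deployapp: mco find failed" >&2; return 1; }

  for host in $servers; do
    short="${host%%.*}"
    echo "deployapp: updating $app on $host"
    run rpc nagios maintenance server="$host" duration=5m &&
    run rpc haproxy disable_server server="${app}/${short}" &&
    run rpc package apt_update -I "$host" &&
    run service "$app" stop -I "$host" &&
    run package "$app" update -I "$host" &&
    run service "$app" start -I "$host" &&
    sleep "$SETTLE_SECONDS" &&
    run service "$app" status -I "$host" &&
    run nrpe check_app_health -I "$host" &&
    run rpc haproxy enable_server server="${app}/${short}" ||
      return 1    # stop at the first failure so an admin can investigate
  done
}
```

After sourcing the file, `deploy_app myapp` walks through each server in turn. Chaining the steps with `&&` is what gives the stop-on-first-failure behavior described above.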

Of course, later versions of this script have become a bit more sophisticated: they accept some custom arguments, log output to a known log file and are more efficient about how they sleep between commands. But the end result for the sysadmin is a simple "deployapp" script that updates the application the right way, every time, with no risk of skipping a server or forgetting a step in the process.
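
One way to make the waiting more efficient is to poll a health check until it passes instead of sleeping for a fixed time. A minimal sketch with hypothetical `wait_for` and `log` helpers; the log path is an assumption.

```shell
#!/usr/bin/env bash
# wait_for ATTEMPTS INTERVAL COMMAND...  (hypothetical helper)
# Re-runs COMMAND until it succeeds or ATTEMPTS tries have been made,
# sleeping INTERVAL seconds between tries. Returns 0 on success, 1 on timeout.
wait_for() {
  local attempts="$1" interval="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Don't sleep after the final failed attempt.
    (( i < attempts )) && sleep "$interval"
  done
  return 1
}

# log MESSAGE...: write a timestamped line to stdout and append it to the
# log file (the default path is an assumption; override with DEPLOY_LOG).
DEPLOY_LOG="${DEPLOY_LOG:-/var/log/deployapp.log}"
log() {
  printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" | tee -a "$DEPLOY_LOG"
}
```

For example, `wait_for 12 5 mco nrpe check_app_health -I myapp1.example.com` retries the health check every five seconds for roughly a minute, and moves on as soon as it passes rather than always waiting the full time.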


Kyle Rankin is Chief Security Officer at Purism, a company focused on computers that respect your privacy, security, and freedom. He is the author of many books including Linux Hardening in Hostile Networks, DevOps Troubleshooting and The Official Ubuntu