Sysadmin Tips on Preparing for Vacation
Read on for ways to help reduce the chance that your vacation will be interrupted by sysadmin issues.
Every year or two my family and I like to take a vacation abroad. Normally, vacation is a time to unplug, and if you are a sysadmin on an on-call rotation, someone else on the team typically takes over your on-call duties. Yet as you progress in your career, you gain more expertise and responsibility over systems, and even with someone else on call, there's a certain class of emergency where the team might need your help while you're on vacation. I recently took a vacation abroad, and before I left, I went through a set of tasks to reduce the chance that I would need to jump on an emergency while I was away. In this article, I describe some of the steps I take to prepare for a vacation, so you can unplug more completely on your next trip.
Preparing Your Computer
One of the first questions you should answer before going on vacation is whether you will need to take your work laptop with you. Depending on your organization and its security controls, you might be able to perform basic emergency administrative tasks from your personal computer, tablet or phone, or you may be able to connect to production only from your work computer. In other cases, you may not need a computer at all, because you can serve in an advisory role, walking other people on the team through what to do in an emergency over the phone or chat.
If you do need to take your computer, I highly recommend making a full backup before the trip. Your computer is more likely to be lost, stolen or broken while traveling than while sitting safely at the office. Even better than taking a backup, leave your expensive work computer behind and travel with a cheaper, more disposable machine: restore your important work files and settings onto it before you leave, and wipe it when you return. If you decide to go the disposable-computer route, I recommend working one or two full days on it before the vacation to make sure all of your files and settings are in place.
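If you want a quick, scripted supplement to your normal backup tool, the sketch below archives a few directories and records a SHA-256 checksum for each file, so you can confirm the archive is intact before you travel or after restoring it onto a disposable machine. It is a minimal Python sketch under assumptions: `backup` and `verify` are hypothetical helper names, the directories you pass in are up to you, and it is not a substitute for a full disk image.

```python
"""Minimal pre-trip backup sketch: archive selected paths and record checksums.

The helper names and whatever paths you pass in are illustrative assumptions;
a true full backup is more likely a disk image or your usual backup tool.
"""
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup(sources: list[Path], archive: Path) -> dict[str, str]:
    """Write sources into a gzipped tar archive and return {member name: sha256}."""
    manifest: dict[str, str] = {}
    with tarfile.open(archive, "w:gz") as tar:
        for source in sources:
            if source.is_file():
                files = [source]
            else:
                files = sorted(p for p in source.rglob("*") if p.is_file())
            for file in files:
                # Store names relative to the source's parent so the archive restores cleanly.
                name = file.relative_to(source.parent).as_posix()
                tar.add(file, arcname=name)
                manifest[name] = sha256_of(file)
    return manifest


def verify(archive: Path, manifest: dict[str, str]) -> bool:
    """Re-read the archive and confirm every member matches its recorded checksum."""
    with tarfile.open(archive, "r:gz") as tar:
        for name, expected in manifest.items():
            member = tar.extractfile(name)
            if member is None or hashlib.sha256(member.read()).hexdigest() != expected:
                return False
    return True
```

For example, `backup([Path.home() / "Documents"], Path("pre-trip.tar.gz"))` returns a manifest you could save next to the archive and pass to `verify` after restoring.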
Documentation
Good documentation is the best way to reduce, or even eliminate, how often you have to step in when you aren't on call, whether you're on vacation or not. Everything from routine procedures to emergency response should be documented and kept up to date. Honestly, this falls under standard sysadmin best practices, so it's something you should have whether or not you are about to go on vacation.
First, document all routine procedures in a clear, step-by-step way: how you deploy code and configuration changes, manage tickets, apply security patches, and add and remove users, as well as how the overall environment is structured. If you use automation tools for routine procedures, whether as simple as a few scripts or as complex as full orchestration tools, document not only how to use the automation but also how to perform the same tasks manually should the automation fail.
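One way to keep the manual procedure next to the automation is to attach the documented fallback steps to each automated step, so a failed run tells whoever is on call what to do by hand. The `Step` class and `run` function below are hypothetical; this is a minimal Python sketch, not any particular orchestration tool's API.

```python
"""Sketch: an automation step that carries its own manual fallback instructions.

Step names, commands and manual steps are hypothetical placeholders.
"""
import subprocess
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    command: list[str]
    # Plain-language steps a teammate can follow if the automated command fails.
    manual_steps: list[str] = field(default_factory=list)


def run(step: Step) -> bool:
    """Run the step's command; on failure, print the documented manual procedure."""
    try:
        returncode = subprocess.run(step.command, capture_output=True, text=True).returncode
        reason = f"exit {returncode}"
    except OSError as error:
        # The automation tool itself may be missing or not executable.
        returncode, reason = None, str(error)
    if returncode == 0:
        return True
    print(f"Automated step '{step.name}' failed ({reason}).")
    print("Manual procedure:")
    for number, instruction in enumerate(step.manual_steps, start=1):
        print(f"  {number}. {instruction}")
    return False
```

The point of the design is that the manual steps live in the same place as the command, so updating one makes it obvious the other needs a look too.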
If you are on call, that means you have a monitoring system in place that scans your infrastructure for problems and pages you when it finds any. Every single system check in your monitoring tool should have a corresponding playbook that a sysadmin can follow to troubleshoot and fix the problem. If your monitoring tool allows you to customize the alerts it sends, create a wiki entry for each alert name, and then customize the alert so that it includes a direct link to that playbook.
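As a rough illustration of pairing checks with playbooks, this Python sketch builds a per-alert playbook link and lists checks that still lack a page. `WIKI_BASE`, the one-page-per-alert-name URL scheme and the helper names are assumptions; how you inject the link into alerts depends on your monitoring tool's templating.

```python
"""Sketch: pair every monitoring check with a wiki playbook and link to it in alerts.

WIKI_BASE and the check names are hypothetical; substitute your own wiki URL
scheme and whatever alert-template mechanism your monitoring tool provides.
"""
from urllib.parse import quote

WIKI_BASE = "https://wiki.example.com/playbooks/"  # hypothetical wiki location


def playbook_url(check_name: str) -> str:
    """Build the playbook URL for a check, assuming one wiki page per alert name."""
    return WIKI_BASE + quote(check_name)


def missing_playbooks(checks: set[str], playbook_pages: set[str]) -> list[str]:
    """Return checks that have no playbook page yet, sorted for stable output."""
    return sorted(checks - playbook_pages)


def format_alert(check_name: str, host: str, detail: str) -> str:
    """Render an alert body that points the on-call sysadmin at the playbook."""
    return f"[{check_name}] {host}: {detail}\nPlaybook: {playbook_url(check_name)}"
```

Running `missing_playbooks` against your list of check names before you leave gives you a to-do list of playbooks to write.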
If you happen to be the subject-matter expert on a particular system, make sure that system's documentation in particular is well fleshed out and understandable. These are the systems that will pull you out of your vacation, so reread those documents for any assumptions you made when writing them that a junior member of the team might not understand. Have other members of the team review the documentation and ask you questions.
One saying about documentation is that if something is documented in two places, one of them will be out of date. Even if you document something in only one place, there's a good chance it is out of date unless you perform routine maintenance. It's good practice to review your documentation from time to time and update it where necessary, and the days before a vacation are a particularly good time to do it. If you are the only person who knows about a new way to perform a procedure, make sure your documentation covers it.
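If your wiki can export last-modified dates, a short script can produce a review list of pages nobody has touched in a while. This Python sketch assumes you already have a mapping of page names to dates (from an export, an API or a `git log`); the 180-day default and the `stale_pages` name are arbitrary choices for illustration.

```python
"""Sketch: flag wiki pages that have not been updated recently.

How you obtain page names and last-modified dates is left as an assumption.
"""
from datetime import date, timedelta


def stale_pages(last_modified: dict[str, date], today: date, max_age_days: int = 180) -> list[str]:
    """Return pages last modified before the cutoff, oldest first."""
    cutoff = today - timedelta(days=max_age_days)
    old = [(modified, page) for page, modified in last_modified.items() if modified < cutoff]
    return [page for _, page in sorted(old)]
```

Oldest-first ordering puts the pages most likely to be wrong at the top of your pre-vacation review.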
Finally, have your team maintain a page to capture anything that happens while you are gone that they want to tell you about when you get back. If you are the main maintainer of a particular system and the team had to perform emergency maintenance on it while you were gone, that's the kind of thing you'd like to know about when you return. If there's a central place for the team to capture these notes, they will be more likely to write things down as they happen and less likely to forget them by the time you get back.
Stabilize Your Infrastructure
The more stable your infrastructure is before you leave, and the more stable it stays while you are gone, the less likely you are to be disturbed on your vacation. Right before a vacation is a terrible time to make a major change to critical systems. If you can, freeze changes in the weeks leading up to your vacation, and encourage other teams to postpone any major changes until after you get back.
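If your deploys go through a script, one way to make a freeze stick is a small guard that refuses changes inside the freeze window unless someone explicitly flags an emergency. The function names, the inclusive date window and the `emergency` override below are hypothetical; this is a minimal Python sketch.

```python
"""Sketch: a guard a deploy script could call to honor a pre-vacation change freeze.

The freeze window and the emergency override are hypothetical examples.
"""
from datetime import date


def in_freeze(day: date, freeze_start: date, freeze_end: date) -> bool:
    """True if day falls inside the freeze window, both ends inclusive."""
    return freeze_start <= day <= freeze_end


def allow_change(day: date, freeze_start: date, freeze_end: date, emergency: bool = False) -> bool:
    """Permit changes outside the freeze, or inside it only when flagged as an emergency."""
    return emergency or not in_freeze(day, freeze_start, freeze_end)
```

Requiring an explicit emergency flag keeps urgent fixes possible while making a routine change during the freeze a deliberate decision.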
Before a vacation is also a great time to perform any preventative maintenance on your systems. Check for any systems about to hit a disk warning threshold and clear out space. In general, if you collect trending data, skim it for any resources trending upward that might cross a threshold while you are gone. If you have any tasks that might add extra load to your systems while you are gone, pause or postpone them if you can. Make sure all of your backup scripts are working and all of your backups are up to date.
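These checks can be partly scripted. The Python sketch below reports current disk usage with `shutil.disk_usage`, fits a simple least-squares line to (day, percent used) samples to estimate how many days remain before a threshold is crossed, and reports the age of the newest file in a backup directory. The sample format, threshold and backup directory layout are assumptions, and a straight-line fit is a crude model that will miss bursty or seasonal growth.

```python
"""Sketch of three pre-vacation checks: disk usage, trend projection, backup age.

Thresholds, sample data and backup paths are assumptions; real trending data
would come from your monitoring or metrics system.
"""
import shutil
import time
from pathlib import Path
from typing import Optional


def percent_used(path: str = "/") -> float:
    """Current disk usage of the filesystem holding path, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def days_until_threshold(samples: list[tuple[float, float]], threshold: float) -> Optional[float]:
    """Fit a least-squares line to (day, percent_used) samples and return how many
    days after the last sample it crosses threshold (0.0 if already past it).
    Returns None when there is too little data or usage is flat or shrinking."""
    n = len(samples)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        return None
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    last_day = max(x for x, _ in samples)
    return max(0.0, crossing_day - last_day)


def newest_backup_age_hours(backup_dir: Path, pattern: str = "*") -> Optional[float]:
    """Hours since the most recently modified file in backup_dir, or None if there is none."""
    files = [p for p in backup_dir.glob(pattern) if p.is_file()]
    if not files:
        return None
    newest = max(p.stat().st_mtime for p in files)
    return (time.time() - newest) / 3600
```

For example, samples of 60, 61, 62 and 63 percent on days 0 through 3 project an 80 percent threshold being crossed 17 days after the last sample, so a three-week trip would see that alert fire while you're away.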
Emergency Contact Methods
Although it would be great to unplug completely while on vacation, there's a chance that someone from work might want to reach you in an emergency. Depending on where you plan to travel, some contact options may work better than others. For instance, some cell-phone plans that work while traveling might charge high rates for calls but bill text messages and data at the same rates as at home. If you plan to get a local SIM card, text messages sent to it over the cell network from home might cost more than messages sent over a data connection. If you do use a local SIM card, you will also need to work out a way to communicate that new number to your team.
Discuss with your team what escalation path they should use to contact you in an emergency. For instance, I knew my cell-phone plan would provide me with unlimited text messages and the same data plan as at home, but I also didn't want work email to distract me. This presented a problem, as email is the primary way I'm paged. So I disabled email syncing while I was on vacation and instructed everyone to contact me via text message in an emergency. I also needed to be on the secondary escalation path for any alerts that weren't resolved within a certain amount of time, so I configured my monitoring tool to use an email-to-SMS gateway as my email address for alerts.
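An email-to-SMS gateway accepts an ordinary email and delivers it to a phone as a text message, so a long alert may be split or cut off. This Python sketch builds an alert email trimmed to fit a single text and hands it to an SMTP relay. The gateway address format varies by carrier, and the 160-character limit, the `localhost` relay and the helper names are assumptions to check against your own carrier and mail setup.

```python
"""Sketch: route alert emails to an email-to-SMS gateway, trimmed to fit one text.

The gateway address, SMTP host and 160-character limit are assumptions.
"""
import smtplib
from email.message import EmailMessage

SMS_LIMIT = 160  # assumed single-SMS length; gateways may split or truncate differently


def build_sms_alert(sender: str, gateway_address: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text alert whose combined subject and body fit in SMS_LIMIT characters."""
    text = f"{subject}: {body}"
    if len(text) > SMS_LIMIT:
        text = text[: SMS_LIMIT - 3] + "..."
    message = EmailMessage()
    message["From"] = sender
    message["To"] = gateway_address
    # Keep the whole alert in the body rather than the Subject header, since how
    # gateways render the subject varies (an assumption worth testing with your carrier).
    message.set_content(text)
    return message


def send(message: EmailMessage, smtp_host: str = "localhost") -> None:
    """Hand the message to an SMTP relay, assumed reachable and requiring no auth."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(message)
```

Sending yourself a test alert this way before you leave confirms both the gateway address and what actually arrives on your phone.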
If there are certain days when you know you (or your on-call counterpart at home) might be in areas with limited cell coverage, work out those dates ahead of time and put them in your calendar. If nothing else, it might encourage others to hold off on a risky change if they know they absolutely will not be able to reach you for the next two days. In general, set expectations about your availability, and make sure everyone takes any time-zone differences into account.
Ideally, a vacation should be a time for you to be completely removed from your work's on-call process. Whether or not that's possible, the more you prepare ahead of time, the less likely your vacation will be interrupted. Finally, when you get back, hold a postmortem with your team about anything that went wrong and any documentation that was confusing or incomplete, so you can make improvements before your next vacation.