More on Contingency Plans
A couple of weeks ago, I tangentially mentioned the need for contingency plans. Today, I want to look at them a little more closely. My current job is about as far away from Continuity of Operations (COOP) and disaster recovery (DR) as you can get, yet I still deal in the issues of disasters, and preventing them, both professionally and personally. As some of you know, I am an amateur radio operator, specializing in emergency communications. As such I spend a lot of personal time sitting in meetings with other department heads, discussing the very real issues of continuity of government, risk assessment and mitigation and recovery in ways that many in Information Technology will never have to consider. Most of the topics revolve reducing life threatening risk (you will note I said reduce, not eliminate).
In Information Technology, in most cases, if there is a system failure, someone will not die (yes, yes, those of you who work in health care have a completely different level of requirements) but that does not mean that we should not make sure that our plans do not take into account that there is risk in everything we do, more so if we are managing systems responsible for running other systems. Last Tuesday, this was hammered home to me in a very personal manner. I commute to my job by train. The train shares the tracks with freights and Amtrak, which means that a breakdown in any one of the intricate mesh of systems that is the rail system in the United States can cause havoc. Last Tuesday, as anyone who rides the rails knows, was one of those fateful days.
The CSX railroad, which controls rails from Boston to Florida, lost the use of their dispatch system. This is a teletype system that every engineer relies on to get their "traffic orders" that list equipment function level, signal outages, rail conditions, etc (as it was explained to us lay people) and even on a short run like the train I take, it can run up to 10 pages that EACH train must have. Of course, the first question is why is this on paper, but ignoring that for the moment, the loss of this system's capabilities is catastrophic. It essentially froze many trains in place. In my corner of the world, that means the impact was several thousand people. Most of these people ended up having to take to the roads, which in the Washington, DC area only means chaos and traffic in an area where chaos and traffic are routine. When these sorts of delays occur, it costs money. The train company has to pay for people to take the Subway, people have to pay to drive and park in addition to what they have paid for their train ticket. And I am only aware of some of the local costs. I would hazard to guess that this outage was very expensive, not just for my local train system but for CSX.
The system was up and operation in a fairly short period of time. Did they implement a disaster recovery plan? Roll back a patch? I do not know, but they got the system back up and running.
When we as systems people design systems for disaster recover or continuity of operations, we are usually looking at downtime. How long will the system be down? We really should be looking at cost and effect. Sadly, many companies cannot tell you the cost of downtime. This is especially true for service companies. I expect that CSX could tell me to the penny how much each minute of downtime cost them, but does that include the costs borne by secondary and tertiary systems impacted? Was the cost high enough that the outage could have been engineered to prevent it? This, of course, is the other side of the coin. Most of us, given the time and the equipment could build a system that was as close to bullet-proof as humanly possible, but the costs would probably be so high as to not be within the realm of reality. So the trade-offs are made. We have all been there and argued for the extra cluster, only to have the accountants tell us that for that 1% chance, it is not worth spending the extra half-million dollars, despite a return on investment that would prevent the company from losing two million dollars.
So we build contingency plans, write disaster recovery procedures (that are not exercised enough, and are not known by enough people) and cross our fingers that tomorrow we will not find our company on the front page of the Wall Street Journal.
Fast/Flexible Linux OS Recovery
On Demand Now
In this live one-hour webinar, learn how to enhance your existing backup strategies for complete disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible full-system recovery solution for UNIX and Linux systems.
Join Linux Journal's Shawn Powers and David Huffman, President/CEO, Storix, Inc.
Free to Linux Journal readers.Register Now!
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- Chris Birchall's Re-Engineering Legacy Software (Manning Publications)
- Linux Mint 18
- The Italian Army Switches to LibreOffice
- Petros Koutoupis' RapidDisk
- ServersCheck's Thermal Imaging Camera Sensor
- Oracle vs. Google: Round 2
- The FBI and the Mozilla Foundation Lock Horns over Known Security Hole
- Privacy and the New Math
Until recently, IBM’s Power Platform was looked upon as being the system that hosted IBM’s flavor of UNIX and proprietary operating system called IBM i. These servers often are found in medium-size businesses running ERP, CRM and financials for on-premise customers. By enabling the Power platform to run the Linux OS, IBM now has positioned Power to be the platform of choice for those already running Linux that are facing scalability issues, especially customers looking at analytics, big data or cloud computing.
￼Running Linux on IBM’s Power hardware offers some obvious benefits, including improved processing speed and memory bandwidth, inherent security, and simpler deployment and management. But if you look beyond the impressive architecture, you’ll also find an open ecosystem that has given rise to a strong, innovative community, as well as an inventory of system and network management applications that really help leverage the benefits offered by running Linux on Power.Get the Guide