The Network is Down, and other things you don't want to hear in a disaster.

If you are a Systems Administrator, the refrain "the network is down" is routine. Depending on the sophistication of your users, it might be a more specific lament, such as "the Internet is broken" or "email is unavailable." If you have really sophisticated users, you might actually get a real error message to work with. And if you work in a large organization, the level of sophistication could be all over the map.

Computers and the associated networks have become so much a part of our daily life that when they are not available, everything grinds to a halt. But they have also become so user-friendly (to a point) that user training has been eliminated in most companies for all but the most sophisticated applications. End-users are expected to know how to use the standard suite of office automation applications, but rarely are they expected to know more than that. In fact, if you gathered ten random people in a room and asked how the information got from point A to point B, nine of them would look at you blankly and wonder why you were asking them such a technical question.

This point was driven home to me in some reading I have been doing. Cisco Press has released the third edition of Priscilla Oppenheimer's Top-Down Network Design. When it was first released, it was really the only reference for good network design, and the information in it has been updated to reflect the change in direction and the need to rethink the preconceptions we have been building networks under as we move into the new paradigm of the glass cloud. But what got my attention was the section on Disaster Recovery. First, she asks this question:

Have you figured out what to do if the disaster involves a serious disease where the server and network administrators need to be quarantined?

If you are looking at this question and your first response is, "it will never happen," I would urge you to think again, especially if you are in line or senior management. As we consolidate, skill sets are being consolidated and jobs are being eliminated. If your IT shop is representative, more than half of the staff are gone. And that is after a period of downsizing in which more than a quarter of the staff have been let go since 2000. Or, to put it another way, a really bad cold could knock out your key people for several days even if they are not the ones who get sick! But the second point that really hit home was this:

Not only must the technology be tested, but employees must be drilled on the actions they should take in a disaster...people should practice working with the network in the configuration it will likely have after a disaster.... The drills should be taken seriously and should be designed to include time and stress pressures to simulate the real thing.

To put it another way, do your users know what their responsibilities are in the event that the network is down, and how to do a little troubleshooting to help out? Most corporate disaster plans have a lot of "magic happens here" sections where the magic is to be filled in by the responsible department. And if you are the responsible person in that department, I am willing to bet that disaster planning, or continuity of operations planning, is hardly the top item on your to-do list. In fact, I would be surprised if it was on your to-do list at all. Yet an emergency can strike at any moment, with or without a plan, and unless you really are a wizard, magic does not happen here. So I encourage you to do your part. You might not get the boss to agree to send everyone to network training, but every little bit helps. And at the end of the day, it might actually make your job easier.
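As one illustration of what "a little troubleshooting" might look like, here is a minimal Python sketch that turns "the network is down" into a more specific report. It is only a sketch under stated assumptions: the function names (can_resolve, can_connect, diagnose, run_checks), the order of the checks, and the wording of the messages are all hypothetical, and the gateway address and hostnames would have to come from your own site.

```python
import socket


def can_resolve(hostname):
    """Return True if the hostname resolves to an address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refusals, unreachable hosts, and timeouts
        return False


def diagnose(gateway_ok, dns_ok, outside_ok):
    """Map three yes/no results to a message a user could pass to IT."""
    if not gateway_ok:
        return "Cannot reach the local gateway: check the cable or Wi-Fi."
    if not dns_ok:
        return "Local network is up, but name lookups are failing."
    if not outside_ok:
        return "Local network and DNS work, but the outside host is unreachable."
    return "Basic connectivity is fine: report the application and its error."


def run_checks(gateway, gateway_port, outside_host, outside_port):
    """Run the checks in order; every argument is site-specific (assumed)."""
    return diagnose(
        can_connect(gateway, gateway_port),
        can_resolve(outside_host),
        can_connect(outside_host, outside_port),
    )
```

A user (or a desktop shortcut) could call run_checks with the address and port of a local device that accepts TCP connections and an outside web host on port 443, then read the result to the help desk. Separating diagnose from the socket calls is a deliberate choice in this sketch: the decision logic can be exercised in a drill without touching the real network.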

______________________

David Lane, KG4GIY is a member of Linux Journal's Editorial Advisory Panel and the Control Op for Linux Journal's Virtual Ham Shack

Comments

The most common complaint I get...

drokmed's picture

The most common complaint I get from users is:

User: The f*'ing computer is f*'ing f*'ed up! f*!

Tech: Can you be more specific?

User: It don't work!

Tech: What exactly doesn't work?

User: The f*'ing computer, man!

Tech sits at computer, tries everything he can think of. Internet is up, access to internal servers and databases is up, can print to local printer.

Tech: Everything works for me.

User: Bull sh*! Look-ee here!

User sits down, opens up the browser, goes to google, searches for "t*tties"

Computer: Dansguardian says "Access Denied"

Tech: Ah, I know how to fix that.

Tech goes to his office, pulls up the dansguardian log for that user, and sees tons of violations. Prints it out, gives to user's manager. Manager fires user. Problem fixed.

Another code ID10T

Monitoring

boylec's picture

Yes, I believe that monitoring is the best way to reach out to the users, and help support is a must. I am in favor of the idea that employees must be drilled and should know their responsibilities.

Our latest client lost all

Anonymous's picture

Our latest client lost all his CNC plans for his aluminium radiators because they threw out their server without backing it up. He thought everything was stored on the internet.

Agreed

Joshua Rasnier's picture

I totally agree with what you are saying. Unfortunately, users who are unaware of what to do during an outage tend to become a burden.

Very nice read!

Gene Liverman's picture

Very nice read!

Gene Liverman is a Systems Administrator of *nix and VMware at a university.

Active monitoring is a must

fturley80x's picture

Active monitoring is a must for networks that can't afford downtime, but then again it comes down to the costs involved in having it actively monitored.

Very nice!

click's picture

Very nice article! Another thing to do is to have really good monitoring. One thing about the enterprise business is that if your users know there is a problem, it is already too late. In my company, when there is a problem, 90% of the time is spent explaining "What is the issue" and 10% is spent actually fixing the issue. Employers, please, please educate your users. It will only help your company!

Net Down

Anonymous's picture

My favorite was one of our department managers calling to say the "net" was down. It turned out the printer he was trying to print to was out of paper.

I dropped an 'r'

David Lane's picture

Sorry it distracted from the overall message.

David Lane, KG4GIY is a member of Linux Journal's Editorial Advisory Panel and the Control Op for Linux Journal's Virtual Ham Shack

Not really...

tagMacher's picture

I apologise. My comment was more in jest and I did mark it OT. While I am not a computer professional in the same sense as most of your readers (I'm just one of many scientists at this research centre who use a variety of computers and software for materials research), I have done time as an unofficial (network + mail server admin + windows domain admin + client configurator + web master + Linux + ...) guy for our small lab intranet with ~70 users. So I do have appreciation for the overall message. BTW, you also dropped at least one other letter - an 'f' :-)

[Off-topic post] Proof read

tagMacher's picture

This article can be used to highlight the dangers of totally relying on an automated spell check!
