Linux System Administration: First Tasks

You might know how to write code or build applications, but do you know what is required of a good Linux sysadmin?

Linux system administration has a place of its own in the hierarchy of information technology specializations. Some people excel in particular areas of free software technology but haven't needed to learn system administration. For example, you may specialize in configuring e-mail or writing applications using Apache and MySQL. You may focus only on the Domain Name System and know esoteric ways of setting up servers on provider lines that frequently change IP addresses. But if I asked you to babysit a busy server or servers, you might lack the temperament, or might not yet have learned the many skills, that the job requires.

The above does not mean that good system administrators do not excel in areas such as configuring Apache, maintaining DNS zone files or writing Perl scripts. It simply means that if you want to work as a system administrator in the Linux world, you need to know how to do everything from installing a server to securing the filesystem from mischievous crackers on the Internet. In between, you need to prepare your system to recover from the myriad ways a server can fail.

Consider, for example, a case in which you find that one of the Web sites you manage has gone down; the server has locked up and nothing works. How do you recover in the fastest possible way? Such an event happened to me two weeks ago. One of my articles wound up on Slashdot.org, Digg.com, NewsForge and other sites at the same time. None of my colleagues had seen that much traffic on a Linux site before. Aside from the several million hits on our server, we had a quarter of a million unique visitors concentrated in a five-hour period.

When you see that kind of traffic, you don't want the server to go down or you'll miss new readers. In our situation, a reboot allowed the system to return to service for a few minutes, but then it locked up again. Normally, we used less than ten percent of our system resources, so we thought we had prepared for the hottest day of the year.

Because we knew the server and all its running processes, we could shut some of them down and focus on allowing a massive increase in simultaneous connections to our database. Although we have several thousand subscribers, we turned off processes such as those that restricted comments to registered readers. In the end, we made it through the day with only a short period of down time. But the surge of traffic rocked our boat.

Service outages such as the one described above can happen in the confines of a private network. Many services experience peak usage at specific times. For example, administrators know that one of the heaviest loads they'll have during the day occurs first thing in the morning, when people check their e-mail. People arrive at work about the same time, crank up their e-mail clients and read mail while drinking coffee.

The mail server might experience 75% of its use between 8 and 10 AM. Gateway traffic also increases and bandwidth on the network bogs down. Should you provide separate dedicated servers for mail, routing, proxy and gateway services? The majority of IT shops do that.

What if those computers averaged only 10% of CPU and memory capacity during the course of the day, but required 75% of resources for only a couple of hours a day, five days a week? Rather than buying an individual computer for each service, you can do what vendors have started recommending: buy a higher-capacity machine and create virtual servers on it.

You might want to configure slightly larger hardware to host virtual machines for e-mail and related applications. Then, using Xen for example, you could let each application run in its own space. In that case, you might find server capacity utilization running around 50%, which helps maximize your resources and reduces server sprawl.
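
To make the consolidation arithmetic concrete, here is a minimal Python sketch. The hourly load figures, the staggered peak hours and all of the helper names are illustrative assumptions, not measurements from any real server, so the percentages it prints are not meant to reproduce the figures quoted above.

```python
def profile(base, peak, peak_hours):
    """Build a 24-entry hourly load profile.

    Loads are expressed in units of one small server's capacity.
    """
    return [peak if hour in peak_hours else base for hour in range(24)]


def combine(profiles):
    """Add up the per-hour loads of several services sharing one host."""
    return [sum(loads) for loads in zip(*profiles)]


def average_utilization(hourly_load, capacity=1.0):
    """Mean load over the day as a fraction of the host's capacity."""
    return sum(hourly_load) / (len(hourly_load) * capacity)


def peak_utilization(hourly_load, capacity=1.0):
    """Highest hourly load as a fraction of the host's capacity."""
    return max(hourly_load) / capacity


# Hypothetical services: idle at 10%, busy at 75% for two hours a day.
mail = profile(0.10, 0.75, {8, 9})
proxy = profile(0.10, 0.75, {12, 13})
gateway = profile(0.10, 0.75, {16, 17})
shared = combine([mail, proxy, gateway])

print(f"mail alone:  average {average_utilization(mail):.0%}, "
      f"peak {peak_utilization(mail):.0%}")
print(f"shared host: average {average_utilization(shared):.0%}, "
      f"peak {peak_utilization(shared):.0%}")
```

The sketch assumes the busy periods do not overlap. If the peaks coincide, as they may for mail and gateway traffic first thing in the morning, the combined peak is the sum of the individual peaks, and the shared host needs correspondingly more capacity (pass a larger `capacity` value to see the effect).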

A system administrator should know how to climb a learning curve quickly. If a new technology arrives, such as virtualization, you need to master it before it masters you. You also need to know how to apply it in your environment.

What kinds of tasks occupy a system administrator's day? That depends on the environment in which he or she works. You may find yourself managing dozens or even hundreds of Web servers. In contrast, you might find yourself running a local area network that supports knowledge workers, developers or both.

Regardless of your environment, you will find that some tasks are common to all system administration functions. For example, monitoring system services and starting and stopping them takes on a role of its own. Your Linux box might appear to be running smoothly while one or more processes have stopped. A Linux server might seem happy on the outside, for example, while the database serving Web pages has failed.
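
One simple way to notice that a service has died while the machine itself still looks healthy is to try connecting to the port the service listens on. The following is a minimal Python sketch of that idea using only the standard library. The function names, and the host names and port numbers in the example, are assumptions chosen for illustration, and an open port shows only that something is accepting connections, not that the service is answering correctly.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def failed_services(services, timeout=2.0):
    """Return the names of services whose port does not accept connections.

    `services` maps a label to a (host, port) pair.
    """
    return [name for name, (host, port) in services.items()
            if not port_is_open(host, port, timeout)]


if __name__ == "__main__":
    # Hypothetical services to watch; adjust hosts and ports to your setup.
    watched = {
        "web": ("localhost", 80),
        "database": ("localhost", 3306),
    }
    for name in failed_services(watched):
        print(f"WARNING: {name} is not accepting connections")
```

Run from cron or a similar scheduler, a check like this can raise a warning before users start calling.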

When services to users become critical needs, you need to be prepared and stay ahead of problems. Imagine a failed printing job is locking up a queue, keeping users from getting their documents printed. Do you wait to do something until you hear from irate users, or do you have a way to stay ahead of the problem?

Most system administrators have to face the fact that something will happen at some point that causes down time. Such events usually occur outside of our control. Perhaps your system incurs a power outage or spike. Sometimes a system bug pops up due to a combination of factors that exist only on your server; it's something that never occurred during project testing. In reality, sysadmins never know when a problem will occur; they only know that eventually one will arise.

Administrators need to monitor their systems in an efficient and effective manner. To this end, many administrators have discovered a plethora of monitoring and alert tools within the Free Software community. Some require you to log into a remote system by SSH and run command-line tools such as pstree, lsof, dstat and chkconfig.

Another useful monitoring tool is Checkservice, which provides the status of services on (remote) hosts. It provides results by way of logs, a PHP status page or output to other tools. Some administrators like tiger, which performs a thorough check of a system and reports the results to a log file. You can find a list and explanation of tools for Debian here.

When you have to monitor a larger server farm and do not want to spend all your time logging into remote servers and running command-line tests, look for free software tools you can use with a browser. I like a tool called monit. This monitoring and alert system works on a number of Linux-type systems. Monit provides a system administrator with the ability to define, manage and monitor processes, the filesystem and even devices. You also can configure monit to restart processes if they fail.
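
The restart-on-failure idea can be sketched in a few lines of Python. This is not how monit is implemented, nor is it monit's configuration syntax; it is only an illustration of the general technique, and it assumes the watched daemon writes its process ID to a PID file. The function names and the paths in the usage note are hypothetical.

```python
import os
import subprocess


def pid_is_running(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # Signal 0 performs the existence check only.
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # The process exists but belongs to another user.
    return True


def read_pid(pidfile):
    """Return the PID stored in pidfile, or None if it is missing or invalid."""
    try:
        with open(pidfile) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def ensure_running(pidfile, restart_cmd):
    """Run restart_cmd if the process named in pidfile is not alive.

    Returns True if a restart was attempted, False if the process was alive.
    """
    pid = read_pid(pidfile)
    if pid is not None and pid_is_running(pid):
        return False
    subprocess.run(restart_cmd, check=False)
    return True
```

A call such as `ensure_running("/var/run/mydaemon.pid", ["/etc/init.d/mydaemon", "restart"])`, with a made-up daemon name, could then be scheduled every few minutes. A dedicated tool adds what this sketch lacks, such as alerting, resource checks and limits on repeated restarts.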

Stanford University keeps an updated list of network monitoring tools and sponsors a working group called the Internet End-to-End Performance Monitoring Group. Be sure to check out the latest tools at the top of the Stanford list. Cacti, for example, has become one of the more popular tools among system administrators.

Professional Linux system administration requires you to master a broad range of tasks associated with networking and providing services to users. It takes a special breed of person to work in this capacity. Fortunately, many people have both the character and the interest to do the job. Over the next few months, we will explore the tasks that make up Linux system administration. I hope you'll join me for the ride.

______________________

Comments

i didnt expect this

toto's picture

Wow, I didn't think the job of a Linux sysadmin was this complex. I'm still in college, but hell, it seems like what they teach in college is just 1% of what a real sysadmin does daily to protect their servers.

What is work linux administrator

Anonymous's picture

Hi Dear,

May I know what work a Linux administrator does (related to services and firewalls)?

dstat report file

Anonymous's picture

Dstat allows its data to be directly written to a CSV file to be imported and used by OpenOffice, Gnumeric or Excel to create graphs.

Can anyone help me with how to do it? It would be really helpful.

Learn till the end ...and teach till the end......

Arun Mane's picture

Hi people ,

I am working as a NOC engineer. I would like to become a sysadmin because I like the tasks they do. They seem to be the gods of our network, so I have decided to join a class for Red Hat Linux. If someone would like to guide me, please write to me at sam02324@gmail.com. I am hoping for some good study material and also links. Thanks for the help.

Regards
Arun.

Are you for real?

kneekoo's picture

Wow... an engineer asking for help in such a newbie way... Either that was a spambot or the people in HR (human resources) at that firm/company are doing a lame job.

Arun, if you are real and ever come back to this page, then my advice is to learn how to ask for help. If you have learnt this by now, then let other people know how to ask questions the smart way.

Wants to Learn

Anonymous's picture

I want to learn Linux, and sometime in the not-so-distant future work on Web pages. Could anyone give me some helpful advice on beginner's steps, such as Web pages, specific books or the best Linux distribution? I have heard a lot about Ubuntu. Your help is very much appreciated.

Switching to Linux

Anonymous's picture

If you're a Windows user, I'd suggest looking at this page, made for users switching to Ubuntu from Windows.

https://help.ubuntu.com/community/SwitchingToUbuntu/FromWindows

Ubuntu is a good first choice for Windows users.

Search Amazon.com for books on Linux or Ubuntu, and pay attention to the reviews.

Also, watching the Linux Journal tech tips can get you going quickly!

linux admin

shyamjidubey's picture

its all right for help

Very useful info

Rich's picture

I've been using Linux for about 1.5 years, considering myself a very experienced... Recently got a job to maintain and program a commercial web site: mostly PHP and MySQL. It turned out that the job is mostly to be a sysadmin, which is not taught in schools. And I am not an expert. Now I am digging in books... Found the info in the article very useful, not usual academic style discussions. Thanks.

sysadmin diversity

Phil49's picture

Hi,

I read all these comments, and I think we are forgetting sysadmins other than Web sysadmins: a sysadmin can administer many kinds of servers, not only Linux/UNIX Web servers.
In my case I have UNIX/Linux servers, but also Windows servers, firewalls such as PIX and Check Point, and switches. For each system I need tools for monitoring and for preventing unexpected crashes. You always need to know whether your server is good enough to accept a new application or a new database, and to be aware of the load on your box. I am not alone in this, and a lot of us work very hard to know these things, just in case, and to tell (or teach) anybody who has a question.
I looked at the different links you posted in your article, and I am very pleased to find them and to see that someone collects this information for us. Haaa! Sysadmins, my brothers! We must thank SLAC for their good site!

Phil

Learning to be a sysadmin

Anonymous's picture

Hi All,

I have just found this link and found it very interesting as I am
wondering just how to go about learning to be a Linux sysadmin. I
have been using Linux for a couple of years now and I don't consider
myself an expert. I would like to learn more and get more details
about exactly what skills I would need to learn. Any hints, and
tips would be appreciated.
Thanks...
Les...

Great article: includes a lot of encouragement for new sysadmins

Mark Rais's picture

This was a very good summation article that will serve well to encourage new system admins on a lot of the fundamentals.

We need more encouragement for new system admins, who often get slammed their first week on the job!

I appreciate the point regarding dealing with scale/server load issues. Although I would add one item that may also help others looking to address the same "sudden spike in load" issues.

In my time at an internet company, we found a kind of quasi dynamic-static compromise that allowed a lot of our sites to retain ridiculous peak loads for limited hardware costs.

We ran our dynamic website on a staging server, then, using basic scripts and rdist, flattened the dynamic content into plain HTML and pushed it across sites to other Web servers.

Because the Apache servers were simply delivering flat html, they could handle much better peak loads.

There's a lot more to this, of course, regarding latency issues and how to generate the static HTML frequently enough to retain the benefits of the site design while dealing with an anomalous day of super high loads.

Just a basic thought to add to your very useful article.

millions of hits?

farnsworth's picture

"One of my articles wound up on Slashdot.org, Digg.com, NewsForge and other sites at the same time. None of my colleagues had seen that much traffic on a Linux site before. Aside from the several million hits on our server, we had a quarter of a million unique visitors concentrated in a five-hour period."

I'm curious what this article is. Got a link?

One of my secret SA weapons

Skapare's picture

One of my secret SA weapons is a backup instance of sshd running set to a niceness level of -20, configured to only allow root to login, on a secret port number, on which I frequently have a login session sitting idle ready to go.

why?

Anonymous's picture

What would you need that for? Was sshd compromised? Was the system rooted? Don't you think that if your system is compromised, they might have already run nmap against it? And what do you do about your firewall or company firewall access?

re: why?

Anonymous's picture

I didn't write the post but I'll respond to your question.

First, it's a legitimate backdoor, especially if you reroute the daemon to respond on a different port. It should run as a separate instance of sshd in a chroot if possible. You can then enter a compromised system that's still running and clean it up. OR

You can also have it start in a power-down/power-up scenario where you have serial capabilities to see a safely booted system in single-user mode without having to be on site. You would set the boot sequence to start up in a different mode or have a special-purpose Live CD come on line.

Linux and UNIX are so dumbed down these days that we may have lost the basics.

> You can then enter a

Francis Litterio's picture

> You can then enter a compromised system that's still running and
> clean it up.

Umm. Wiping the disk and reinstalling is the only safe way to "clean up" a compromised system. You can't do that over SSH.

"clean up"

sixty4k's picture

I need to pull any user data off, I need to keep the compromised system limping along until I can throw up a replacement server, and I need to shut down the spambot that's been set up, all through means other than the usual ssh process, which the hacker has shut down or otherwise locked me out of.

As well, if my system gets a heavy load (a process goes out of control, it gets hit by more Web traffic than expected, etc.), I may need a terminal session to get in on and fix things, because a server under heavy enough load won't log you in anymore, or will be unable to run anything from a normal ssh session if you can log in.

We work in imperfect worlds, where wipe and reload are not acceptable solutions, at least in the short term. Things must work, now, and a fix that means downtime is no fix at all.

webmin

Anonymous's picture

Try www.webmin.com. It exposes all tools under a UNIX-based system and
has a modular design for extensions. Excellent tool.

Try webmin and what

Anonymous's picture

Sorry, but suggesting webmin to anyone but a complete hacker lacks smarts. Then if you have a complete hacker running it, he probably doesn't have enough skills to administer a server. Nah.
