From the Publisher
The end of March is quickly approaching as I write. Here at SSC, March is always an exciting time as it is the end of our accounting year. For the last two weeks Gena Shurtleff and I (with help from others) have been working on a budget for the next year of LJ--something that has made some of us very grouchy. [mainly, our publisher --Ed.]
To make this process even more fun, there have been a lot of computer-related problems—all related to our Linux systems. [Note: if you are humor-impaired you may want to skip a lot of this.] First, our main server failed. Linux apparently caused a pin to break off the cable to the external SCSI disk drive. Then, a week or so later, Linux broke a head on the disk drive in our firewall. Next, the fan in our editor's computer started growling at her. On top of all that, various systems in the office were mysteriously crashing or exhibiting very strange behavior in general. The end result was too many hours of downtime, lost e-mail and an unhappy working relationship with our computers.
We began to wonder if Linux was good enough for us, and if perhaps Windows NT might support “automatic pin re-soldering” and “disk-head replacement”.
Once things calmed down we took a closer look at the problems. By the way, we refers specifically to Jay Painter, our new systems administrator, Peter Struijk and me. First, we concluded that it probably wasn't the fault of Linux that hardware was breaking. As scary as it is that multitasking operating systems write to disks whenever they think it's a good time, this really isn't a reason for a cable to break.
To address the issue of being down for longer periods of time than we thought appropriate, let's look at some specific cases.
Note: at this time, our various systems ran a host of different software versions (kernels and C libraries), and we were in the midst of converting them to Debian in order to have consistency across all machines.
The failure of the cable caused the data on the server disks to be scrambled. Fortunately, we had back-up copies of the files, so it seemed like a good time to do an upgrade. We made the logical decision to reload a standard Slackware system (rather than try to change to Debian), restore the user files and be on our way. It turned out to be not so easy. The new load of Slackware had more differences from the old version than we expected. Libraries were different. NFS and NIS were different. Adobe fonts we use for doing reference cards had to be reloaded. Configuration files for groff had to be updated. A lot of work was done to get the new configuration talking to all the old systems and to get everything tuned.
Possibly inspired by the extra work the firewall was doing during the time the server was down, a head died on the disk in the firewall the next weekend resulting in the loss of a lot of mail. Why was it lost? Why wasn't it queued and then forwarded? Was this another Linux shortcoming?
On investigation we found that backup MX (mail exchanger) records were in place to take care of this very problem; however, they were pointing to the wrong machine. Again, the problem could not be pinned on Linux; it was an administrative error by a previous systems administrator. The mistake went undetected because this backup had never been exercised before, since Linux had been working flawlessly.
Let's move on to those strange software problems I mentioned. Surely we can find something to pin on Linux here.
One machine, used as our DNS name server, had been less than reliable. Two things in particular happened quite regularly. The first was that syslogd, the system log daemon, would hang in a loop eating up all available CPU time. While this problem appeared to be related to the location of the log file disappearing (caused by a reboot of the file server or a network problem), we haven't been able to fix it. However, it doesn't appear to happen in newer Linux releases (our problem machine is running 1.2.13) and, while it is irritating, it does not cause the machine to crash—just to run slower than normal.
The other problem on this same machine was stranger, although it turned out to be fixable. Multiple copies of crond (the cron daemon) kept appearing on the machine even though only one was initiated at boot time. One day, I found 13 crond jobs running, killed 12 and, a few hours later, found three still running.
At this point Jay jokingly said, “Maybe there is a cron job starting cron jobs.” Well, since there were processes being started by cron that didn't exist on the other machines, I started looking around for suspicious jobs. The first couple of extraneous jobs I found were benign, but then I found one that made both of us realize that Jay's attempt at a joke wasn't a joke at all. There was, in fact, a cron job that initiated another cron job. Or, more accurately, a cron job that grepped for everything but a cron job, attempted to kill all cron jobs and then started a new cron job. In other words, it looked like a partially written script to do “who knows what”--nothing that would actually work. It was signed and dated, so we could see both who wrote it and that the creation date was about the time when stability problems first appeared. Again, we had found “pilot error”, not a problem with Linux.
There are more of these stories than there is space to tell them. Basically what we found out was that even though various distributions may have some kinks in them like a wrong file permission at install time, they do all install. That's true for Caldera, Linux FT, Debian, Red Hat, Slackware, Yggdrasil and all the others. Software does not wear out. If the system is running it is not likely to stop aside from hardware errors. As a case in point, I still have a 0.99 kernel running on the main machine at fylz.com that was installed in August, 1993. It is an NFS server with three 38.8KB modems on it. The hardware is a 386DX40 with 8MB of RAM. Why haven't I upgraded it? It works, and it is extremely stable. The last reboot was in November 1996, when I turned off the machine to remove a zip drive from the SCSI bus.
Phil Hughes
Today’s modular x86 servers are compute-centric, designed as a least common denominator to support a wide range of IT workloads. Those generic, virtualized IT workloads have much different resource optimization requirements than hyperscale and cloud applications. They have resulted in a “one size fits all” enterprise IT architecture that is not optimized for a specific set of IT workloads, and especially not emerging hyperscale workloads, such as web applications, big data, and object storage. In this report, you will learn how shifting the focus from traditional compute-centric IT architectures to an innovative disaggregated fabric-based architecture can optimize and scale your data center.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
| Trying to Tame the Tablet | May 08, 2013 |
| Dart: a New Web Programming Experience | May 07, 2013 |
- New Products
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Home, My Backup Data Center
- RSS Feeds
- New Products
- Trying to Tame the Tablet
- What's the tweeting protocol?
- Dart: a New Web Programming Experience
Enter to Win an Adafruit Prototyping Pi Plate Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Prototyping Pi Plate Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- Next winner announced on 5-21-13!
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.




3 hours 4 min ago
5 hours 26 min ago
22 hours 15 min ago
1 day 47 min ago
1 day 2 hours ago
1 day 2 hours ago
1 day 3 hours ago
1 day 7 hours ago
1 day 8 hours ago
1 day 10 hours ago