From the Publisher

To make this process even more fun, there have been a lot of computer-related problems—all related to our Linux systems.
What We Are Doing Differently

We are now back up and running and continuing our conversion to Debian. We are fixing problems as we find them—not just hardware and software but also administrative. For example, if we had been able to find the cellular phone number of our ISP, we could have had him address some problems much more quickly than by communicating via e-mail from a system that was mostly broken.

Before I get into specifics, let me say, no matter what software you are running, it is hard to program around hardware malfunctions. Detecting and halting them is about the best you can do.

You can also do backups and document. At some point we had a configuration disk for our firewall; but when we needed to replace the hard disk, the configuration disk had vanished. This loss cost hours of work time and probably a day of uptime. Having a complete backup of everything, boot disks for all machines, spare cables and disk drives and other assorted parts can make a big difference in the elapsed time to deal with a problem.

Another essential element that we are missing is a monitoring system. If something breaks at 3AM on Sunday, it may not be noticed until 8AM Monday. Being able to detect a problem immediately is crucial to getting it fixed in a timely manner. Even if it is just one person's workstation, having it fixed or replaced before the user comes to work on Monday makes for much higher effective reliability for the network.

One other innovation that we are working on is a backup file server to perform daily backups. Scripts will copy everyone's home directories over to this machine and then its disks will be written to tape. Then, if the main server fails, we can just move this machine in to take over as main.

What's Missing?

One other thing that still seems to be a problem is a reliable Auto Mount Daemon. While we have run AMD semi-successfully, it has proved to be somewhat unstable in our configuration. AMD depends on NIS, and there have been failures caused by the physical network or the yp server.

To address this problem, Jay had an idea. Note that it is still an idea not a design specification, but it is worth mentioning. Here it is in his words:

Anyway, my idea for a new VFS-based automount would be to have a daemon which mounted a special VFS file system similar to PROC and then, through a series of system calls, populated it according to its automount maps; NIS or file or otherwise. Whenever someone descended into a directory, the kernel would send a standard Unix signal to the daemon, which would execute a system call to figure out what it had to do. This would prevent the daemon from having to use shared memory or other system V IPC, and it would allow the daemon to be ported to Berkeley OSes.

For those not familiar with it, VFS is an interface layer between the user and the actual file system being supported. This made implementation of the DOS, System V and other file systems much easier than on systems without VFS. Implementing an auto-mount file system should be relatively easy using VFS, and it could offer an integrated, reliable solution to a problem that plagues many operating systems.

One other thing that would be nice is a journaling file system (JFS). Unix System V, Release 4 has one and it is probably the one place where SVR4 is superior to Linux. As this is an editorial rather than a technical article, let's just say that JFS maintains enough information to reconstruct the exact file system structure after a crash without having to revert to the seek out and destroy sort of approach that fsck uses. This means a lower probability of losing anything and much quicker reboot times after a crash.

That's it from this end. If there are people out there interested in working on either of these last two projects, drop me a line at phil@ssc.com. Maybe we can be better than everyone after all.

______________________

Phil Hughes

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState