Linux in the Real World
When I came to SSC (publishers of Linux Journal), I was told the first thing I had to do was learn the computer system. Having never been exposed to Unix, I set out to discover as much as I could. Coming from an MS-Windows environment, I had a lot to learn. The more I learned about the system we use, the more questions I asked. Here is what I found out.
The first thing I noticed was the multi-tasking capabilities of Linux (I'm not even going to get into Win95). Everyone at SSC has a Linux system (workstation) at their desk, which they log into every morning. In addition, there are two non-Linux systems in the office: a Windows for Workgroups system used for graphics and magazine layout and a Unix System V, Release 4.2 system used to run the Progress database, which has not yet been ported to Linux.
Once logged onto their local system, users can perform tasks locally (such as reading electronic mail) or access the other computers via rlogin, telnet, ftp, and so on.
All of the workstations are linked via a thin Ethernet backbone, except for a few which are connected via twisted pair Ethernet to a twisted pair hub, which is then connected to the thinnet backbone.
The main backbone ends at the Orion Firewall System that sits between the internal network and a second, externally visible network that connects to the Internet through a Xyplex router and a CSU/DSU (also known as a “digital modem”) over a T1 Frame Relay connection to our Internet Service Provider, Alternate Access, Inc. Our Web server, www.ssc.com, is also on this externally visible network, outside our firewall.
There is a third network in the office. The Windows for Workgroups (WfW) machine is on this network with one Linux system also connected to the regular internal network. This setup keeps the large amount of traffic between the WfW system and the Linux system (which drives the Imagesetter) from bogging down the main network.
Often, in a multi-user environment like ours, every computer has a unique password file, local to that system. If someone wants to change their password, they have to log into every system individually to make the change office-wide. All of our systems instead use NIS (Network Information Service) to centrally manage all password and group files, access permissions, host address information, and data on a single server. NIS distributes a single master password file to all the systems transparently. Since the network is running NFS (the Network File System), files can be accessed between systems easily. This is easier for both the user and the administrator.
SSC uses sendmail as its mail daemon to monitor and manage the delivery of all electronic mail messages. Sendmail is the de facto standard Mail Transfer Agent (MTA) for most complicated networks. Although it is not easy to configure, it is the most configurable and most flexible of all the mail daemons available. It determines whether each e-mail address is local or remote, delivers local e-mail locally, and sends remote e-mail to remote systems via the Internet (using SMTP, the Simple Mail Transfer Protocol) or UUCP (described later).
All outgoing mail at SSC is routed through a single workstation for delivery via sendmail. This centralizes the e-mail system so there is only one log file, one daemon, one thing to break and be fixed. Incoming mail is queued on the Web server outside the Orion firewall system with smap, a secure mail queue program. Smap acts like a normal mail daemon and queues mail. Then smap calls sendmail to process the queued mail, sending it to the real internal hub for local delivery. The smap client implements a minimal version of SMTP, accepting messages from over the network and writing them to the disk for future delivery by smapd. Like anonymous ftp, smap is designed to run under chroot, except it also runs as a non-privileged process to overcome potential security risks presented by privileged mailers running where they can be accessed over a network. Sendmail still runs, but only when it's told to, instead of all the time.
UUCP (Unix-to-Unix copy) delivered mail is also forwarded to the sendmail hub via smap. Sendmail then sorts the regular SMTP mail from the UUCP delivered mail. The SMTP mail is delivered locally and the UUCP delivered mail is spooled in directories where the UUCP system can find them. Since the modems which deliver the spooled UUCP delivered mail to local recipients are on the Web server, which is on the externally-visible network, these files are transfered from the internal system to the Web server with tar and scp, a secure version of the rcp (remote copy) command.
tar (tape archive) and scp (secure copy) are used every three hours to transfer the mail automatically to the Web server. The mail is then deleted from the local workstation to avoid duplication.
By handling mail this way, only one machine, the Web server, needs to have a modem and access/exposure to the outside world, and the line doesn't need to remain open.
PPP (Point-to-Point Protocol) service allows employees remote dial-in access outside the firewall by the Web server. Users then access the internal network (and their own desktop workstations, if they wish) using ssh, a secure shell that encrypts all the data sent between the internal workstation being accessed and the Web server. Employees can also use ssh to get from their home computers to the Web server ensuring a completely secure line, or telnet if they don't have ssh on their home machines.we
All the user home directories at SSC, as well as the local binaries directory, are shared via NFS (Network File System) between all workstations. With the current system, every time users want to read files in their home directory, the files must be transferred across the network to their computers. Soon we will be moving all the user directories from the office file server to each user's own workstation for the following reasons:
Transferring file across the network is much slower than transfering them from the local hard drives, making file access slower.
Also the network has a limited amount of bandwidth (amount of information it can carry at one time), and eliminating unnecessary use will speed up the network.
SSC's Web server is an AMD 486DX4/100 machine running Linux and the Apache server software. The server contains SSC's catalog and product information, as well as information on Linux Journal, including covers and tables of contents from all issues, selected articles from some issues, and advertiser indices. We also offer space for sale to Linux Journal and WEBsmith advertisers if they need a home for their Web pages.
One of the big advantages of Apache is that it can appear to be different servers with different domain names and IP addresses—that is, it is “multi-homed”. This makes it possible for our single computer running one Web server to serve as the Web server for Zebu Systems, L.L.C. (http://www.zebu.com/) and Cucumber Information Systems (http://www.cucumber.com/) as well.
The server has been in operation since May and accesses continue to increase. We are currently receiving around 35,000 hits per day. Apache is very efficient and reliable. Even with only 16MB of RAM on the machine, we seldom see a load average of over 0.2. This means there is an average of one process waiting to run 20% of the time, and the rest of the time the machine is idle.
Keeping the Web server outside of the Orion Firewall System keeps the internal network safe in the event the Web server is compromised. The Web server is mirrored from the internal network so that if it is compromised, it can be restored easily.
Most of the printing we do at SSC is done using PostScript:
PostScript is a Page Description Language (PDL) developed by Adobe Systems, Inc. PostScript tells any printer which has PostScript built in, how to print a page that consists of text and/or graphics. The page must be generated by software that includes a driver which converts the page into PostScript code; the code, in turn, is translated by the printer. PostScript is the de facto PDL standard for high-end desktop publishing because, among other reasons, it can operate across a range of platforms, is very precise, and has color capabilities. Holt and Morgan—UNIX: An Open Systems Dictionary.
Text files, such as invoices, that we need to print out are done in ASCII. They are sent to one of the dot matrix printers, or the one laser printer reserved for printing plain text.
We have seven printers on the network, shared via lpd. Users can select their default printer by setting their PRINTER environment variable.
Many of the printers are selected based on their location relative to the person using them. Others may be selected because of their speed. One special printer, a Tektronix Phaser III PXi is a wax-transfer color PostScript printer. This is used for producing color proofs of SSC products, magazine covers and pages, and other graphics such as Web pages.
The database we use, Progress, isn't available for Linux. Therefore, the database is run on a Unix System V, release 4.2 system. Progress is run in character mode, and users access the database via rlogin or telnet from their Linux workstations.
This database is used to store all customer and vendor-related information. This includes customer contacts, subscription information, sales transactions, reader service leads, the Linux consultants directory and a whole lot more. We are in the process of moving advertising booking to this database as well. Other small databases (article tracking, for example) are written in Perl and run on Linux systems.
We have used Progress for years. (We used to run the whole office on a Xenix system.) If we had it to do all over again, we would select a database methodology that would make it possible to run everything on Linux and not have a foreign Unix system in the network. While it is not a reliability problem, it does mean we have to support another operating system.
When I tell people I work for a publishing company, many ask, “Are you using a Macintosh for your layout?” The answer is “no”. We use Quark Express and Corel Draw! on Windows for Workgroups and tie the process directly into our Linux network. In order to explain this, let's examine the magazine production process from start to finish.
First, Michael K. Johnson, the editor, finds people to write articles for Linux Journal on various topics. Note that Michael is located in North Carolina, while the remainder of SSC's staff is located in Seattle. This means that Michael uses his local Linux systems to do much of his work and uses his Internet connection to access machines in Seattle. When the articles are sent in (via e-mail), Michael edits them and sends them to our production editor in Seattle. At this point the articles that he sends are in Quark Tag Format—ASCII text with various escape sequences added to them to indicate paragraph types, font changes, and other formatting. We are currently developing a new language closely related to HTML (HyperText Markup Language), the language of the World Wide Web. Once this language is complete, we will be able to automatically translate articles into Quark Tag Format for layout and into HTML for addition to our Web site.
The production editor files the articles, runs some basic checks [like spelling; yours truly can't be trusted to spell anything right—ED] and prints them out in preparation for our copy editors. The print process is done using a shell script that uses sed and awk to translate the Quark tagged file into troff source and pipes the result through groff to produce a reasonable approximation of what the article will look like when it is imported into Quark and printed.
Articles returned from copy editors are then reviewed by the production editor and changes are made as needed. This step sometimes involves contacting the author for clarification—a step that is generally carried out via e-mail.
The final version of the articles are slightly modified by another shell script which like the first, uses sed and awk to do its work. This is necessary because Quark requires that paragraphs be one continuous line. Also there are some particularly awful Quark escape sequences that we have aliased to simpler escape sequences which need to be converted to the real sequences in order to be interpreted correctly by Quark. These modified files are written to a location on the Linux filesystem that can be accessed directly by the WfW system.
Our layout is done using Quark Express on our WfW system. Once the ads are placed, the articles are put in. The result of the first layout process is a semi-complete magazine. A copy is printed locally on the Tektronix printer for review, and a PostScript image is written to a Linux filesystem so Michael can download it and print it in North Carolina. The locally-printed copy is printed from the WfW machine on a Linux-connected printer; this is much faster than directly connecting the printer to the WfW system, as was originally done.
The production editor merges the changes from Michael and the other reviewers and sends them back to layout for final changes. If the changes are extensive, the article may be re-edited as text using vi or emacs on a Linux system and re-submitted to layout.
After the second layout step, a paper copy of the magazine is printed for review by our proofreader. Changes from this cycle are made, and the layout system outputs a PostScript image of the magazine (in 2 to 8-page chunks) to a Linux filesystem connected to the imagesetter.
Although the layout system is running WfW and typesetting is running Linux, the file transfer is completely transparent to both departments, just like printing. This is made possible by Samba.
In the most basic sense, everything in the office uses TCP/IP as the underlying protocol. Windows for Workgroups speaks SMB (Session Message Block protocol), a protocol which uses TCP/IP. In order for Linux to speak SMB, we run Samba. Samba is a free implementation of the SMB protocol for Unix. It was developed by Andrew Tridgell in 1992 in an attempt to mount disk space from a Sun to a PC running Pathworks. [See Linux Journal Issue 7 for an account of Samba's development—ED] The Linux filesystem appears as network drives to WfW.
So Samba is used to seamlessly transfer the PostScript files from the WfW system to the Linux machine that drives the AGFA imagesetter. This is essentially a camera with a laser instead of a lens; it uses the laser to expose a large sheet of photographic film. There is one sheet of film for each black and white page and four sheets of film for each color page. This film is what is mailed to the printing company.
Once a magazine is finished, the files are archived on an Iomega Zip drive via Samba. The Zip drive is a removable 100 MB magnetic disk drive, much like a monster floppy, which is connected through a SCSI interface to the same Linux workstation that connects to the imagesetter. It connects to any SCSI-2 controller and acts like any other SCSI device. It is slightly slower than a typical hard drive but faster than some old MFM/RLL drives.
The ZIP disks are mounted on the Linux system like any other Linux filesystem. Linux views the new drive as a mounted partition of an existing drive. It can then be added to the Windows for Workgroups system, where the ZIP drive is seen as another network drive.
Besides Linux Journal, SSC publishes a series of books and references, primarily on Linux and Unix. Most of these products are done using troff and/or groff. The exceptions are the LDP (Linux Documentation Project) books, which are done in LaTeX, and the Internet Public Access Guide, which was done in Quark Express.
Again, we produce film directly at SSC to ship to the printer. One of the more interesting recent innovations was the ability to produce spot color separations using groff. This came about when we were updating the Korn Shell Reference. This card is in four colors. Besides the additional cost of having the printer produce the separations it was going to be very hard to proof a multi-color card if only black and white output was available.
Work on the part of Arnold Robbins and SSC staff produced an easy way to write the color changes directly in the groff source. With two targets in the Makefile, one for printing the color output and one for producing the color-separated film, we were able to accomplish the desired task. These changes require groff and won't work with troff, since they are done by including raw PostScript commands in the groff source.
Many people may be asking why we continue to produce products using what many consider outdated tools. For those who really use the Unix environment, groff offers some advantages. For example, many of our recent products have been written by authors located far away from our Seattle offices. By using groff, we can send small ASCII files and use tools like diff to only send changes between locations. It also means we don't need multiple copies of expensive layout programs to accomplish the task.
Our office manager uses QuickBooks to do our accounting. Why? Because it was available and inexpensive. She uses DOSemu on her Linux workstation for this task, allowing her to quickly switch back and forth between QuickBooks and, for example, her e-mail. In the future we hope to convert this task over to some software that runs on Linux, but for the moment, this offers a reasonable solution.
We have much planned for Linux in the future. For example, we currently do credit card verification off-line. We hope to write the necessary software to do this directly from a Linux system. We also hope that Progress will be ported to Linux. If this doesn't happen we will probably re-write our database system using a database that runs on Linux.
In conclusion, no, we don't use Linux for everything. But we come pretty close. Linux has proven inexpensive, easily expandable and reliable. For those who thought we didn't use Linux to produce the magazine, now you know. With all but one of our employees sitting at a Linux console every day, I think we practice what we preach rather well.
Some of the tools mentioned in this article may not have been included with your Linux distribution.
NIS: Documentation is available from sunsite.unc.edu/mdw/HOWTO/NIS-HOWTO.html and binaries are available via ftp from sunsite.unc.edu in /pub/Linux/system/Network/admin/yp-clients.tar.gz.
Orion: This firewall was available for purchase from Zebu Systems, http://www.zebu.com or (206)-781-9566 and is based on the Mazama Packet Filter software, www.mazama.com.
smap: smap is part of the Trusted Information System's Firewall Toolkit at ftp://ftp.tis.com/pub/firewalls/toolkit/.
ssh: ssh has a home page at www.cs.hut.fi/ssh/.
Samba: Samba has a home page at lake.canberra.edu.au/pub/samba/samba.html.
Apache: Apache has a home page at www.apache.org.
xv: xv is available with some Linux distributions, and is available via ftp from ftp.cis.upenn.edu in /pub/xv/.
LaTeX, groff, sendmail, tar, diff, and other utilities mentioned in this article should be included with your Linux distribution.
Kevin Pierce is a Marketing Assistant at SSC. He grew up in New Hampshire, went to school at Florida State University, and reserves most Saturday afternoons for college football.