Overview of Linux Printing Systems
This article presents a brief overview of the main printing systems in use on most Linux systems, with an introduction to the concepts and procedures at the core of UNIX printing. We will finish by approaching the future of Linux printing, and how it is quickly improving.
It is important to understand that printing in the Unix world revolves almost entirely around the PostScript page description language, developed by Adobe Corp. as a full-fledged programming language used to describe the contents of each page of a document. Many printers nowadays have an embedded PostScript interpreter, which is in charge of rendering the pages to paper using their PostScript description. All modern Linux desktop applications that have a print option will produce PostScript data to print full-page documents.
This approach is widely different from other desktop-oriented operating systems, and from it stems most of the problems that made Unix printing such a daunting task. Operating systems like Windows or MacOS have much more tightly integrated APIs made available to applications, often exposing the capabilities of the printers and providing an abstraction layer so that applications don't have to worry about device-specific details. Moreover, the printing API is usually integrated with the graphics API used for displaying on the screen, something that has yet to happen with X11.
On most Unix systems, the only available interface pretty much boils down to "submit a job to a queue and hope that it prints correctly". There is no unified way of gathering printer or job status, which seriously impairs the possibilities offered by Linux applications with regards to printing.
While PostScript is the de facto standard for producing documents to be printed on Linux, the printer itself doesn't have to understand PostScript, which stays a relatively expensive technology. In many cases, especially with lower-end printers, the PostScript data will have to be translated to the native page description language of the printer. This is done through the use of a special conversion filter. Generally speaking, a filter is a special program that will process its input and produce processed data on its output. There are different types of filters that are used in the context of Linux printing : conversion filters, I/O filters (responsible for transferring data to the device), processing filters (that transform the document data).
The basis of a printing system is the spooler. The spooler manages queues of print jobs. A queue is usually associated with a single printer, and jobs submitted by users are processed on a first come, first serve basis. When a job gets to be processed, its data is usually passed through a certain number of filters before it gets to the printer itself. UNIX print spoolers come in many different forms. We will focus here on the most popular variants that are widely present in most Linux distributions.
As its name implies, this print system spawned from the Berkeley distribution of UNIX. The Line Printer Daemon (LPD) is still the basis for many other printing systems and spoolers that borrowed its interface and configuration file format, the printcap files. While LPD was initially developed for use with line printers that could only print a line of text at a time, it can be used for full page printers as well.
This was the printing system that made it in the first complete Linux distributions, like the early versions of Slackware. Nowadays, many distributions still ship this print spooler (Debian, Slackware), often alongside other more modern print systems like the other ones discussed in this article. There are many variants of the original BSD spooler still in use today.
The BSD printing system is really just a spooler - that is, its core functionality is limited to queuing jobs. It consists of a daemon (lpd), a couple of configuration files in /etc where queues and their properties are defined, a spooling directory where pending jobs will be held (usually /var/spool/lpd), and a set of basic commands to submit, delete and manipulate jobs (lpq, lprm, lpc).
Queues are defined in the /etc/printcap file, which follows the same format as termcap files, used to describe the capabilities of UNIX terminals. A typical printer entry would look like this :
# Sample queue definition for BSD LPD lp|printer1: :sd=/var/spool/lpd/lp: :lp=/dev/lp0: :if=/usr/sbin/somefilter: :mx#0: :sh:
Each entry defines a queue. There can be several queues referring to the same physical printer (for instance to distinguish certain options). A queue can also have several aliases. In the example above, the queue lp has an alias 'printer1'. Jobs can be sent to either of these printer names, and will be dropped in the same queue. As a side note, 'lp' is usually considered the default queue in the BSD world.
Jobs are submitted to the spooler via the lpr command. A specific queue can be specified with the -P argument. For instance :
lpr -Pprinter1 /path/to/some/file
Jobs that have been submitted but have not yet been processed can be removed from the queue, using the lprm command. The job ID number, as well as various status information, can be obtained by running the lpq command.
BSD LPR is significant because it also defined the LPD network protocol, which is used to submit jobs to remote LPD daemons, and allows UNIX workstations to function as print servers. This protocol is nowadays natively supported by virtually all networked printers. Because of its widespread usage, all other printing systems have had the requirement to at least be able to talk to other LPD daemons and thus implement this protocol.
Here is an example of how to define a remote queue in a printcap file. The jobs will be immediately transferred to the remote queue on the remote LPD daemon, and won't be processed on the original host.
# Sample queue definition for a remote LPD queue on a client remote: :sd=/var/spool/lpd/remote: :rm=printserver.domain.tld: :rp=queue: :mx#0:
The rm attribute indicates the address of the remote LPD server. The rp attribute is the name of the queue on this server where jobs will be sent.
The /etc/lpd.hosts file is used to define which hosts are allowed to forward jobs to the local LPD daemon.
The LPD protocol sends data in two different pieces. First, a control file describing the job will be constructed and sent. This control file includes information about the originating user, the name of the files, and any options attached to the job. Then, the data file follows - it is the document itself and its format is entirely dependent on the printing language in use at the time.
While BSD provided the basis of Unix print spoolers, its functionality is limited. Several projects were started to improve on it and add better configurability and more flexibility. The most wide-spread BSD-based printing system on Linux nowadays is LPRng (LPR Next Generation), written by Patrick Powell. It is essentially a rewrite of the original BSD LPR system, but all of the previous concepts still apply.
While LPRng keeps the printcap file format, it introduces a number of new attributes that make its configuration much more flexible. Filter definitions can be separated and true I/O filters can be defined. Users can also define their own queues, by writing a .printcap file in their home directory.
LPRng also provides commands that emulate the UNIX System V-style printing commands (lp, lpstat, etc).
LPRng comes with the IFHP filter, which can be used with queues to automatically perform some data format conversions (for example to print ASCII text or images).
CUPS is a relatively recent project started by Easy Software. Designed from the ground up, it aims at replacing the BSD-derived printing systems, and integrates a number of emerging standards and technologies making it a very cutting-edge printing system. Recently, CUPS became even more popular as Apple selected it to become the new standard printing system in MacOS X 10.2 (Jaguar).
CUPS is based around the Internet Printing Protocol standard (IPP), which is an IETF protocol derived from HTTP. The CUPS daemon understands IPP requests and it is the primary means of communication with its client applications. As an Internet protocol, IPP makes it easy to deploy print servers on wide-area networks. CUPS also supports other popular protocols used to communicate with printers, and thus can be used to act as a bridge for networked printers that don't have native IPP support. Just like HTTP on which it is based, IPP can be secured by using authentication and SSL connections. CUPS provides native support for this, allowing for secure printing.
Another standard that CUPS embraces is the PostScript Printer Definition file format (PPD), which is another Adobe standard used to describe the capabilities of Postscript printers. CUPS extends their usage to non-Postscript printers, making it one of the cornerstones of modular drivers in this architecture.
CUPS also uses many filters as a means of translating and transporting data to the printers. However, unlike BSD-like print spoolers, this is done in a much more intelligent way. There are different classes of filters available for CUPS.
Back-end filters provide the endpoint where the final data will be delivered. There are filters for parallel ports, TCP/IP socket connections, LPD, and other endpoints.
Document conversion filters ship in standard with CUPS, and are available to convert images, ASCII text, PDF files and HP-GL/2 vector documents to Postscript.
Interface or raster filters are referred to from the PPDs and are called to convert the documents from Postscript to an intermediary file format.
As with other printing systems, a translation filter is needed in order to print to non-PostScript printers. CUPS allows PPD files to describe the filter used to translate to the device's native language, as such :
*cupsFilter: "application/vnd.cups-raster 0 rastertohp"This example is from a PPD file for HP Deskjet printers. What this line means, is that the "rastertohp" program, a filter usually located in /usr/lib/cups/filter, will take data of the MIME type "application/vnd.cups-raster" as its input and convert it to a format suitable to be sent directly to the printer, in this case HP PCL data. This MIME type is a special CUPS type meaning rasterized data, which is basically a raw bitmap format in a format that CUPS filters understand. CUPS ships with a modified version of Ghostscript that is able to translate PostScript into CUPS raster data: the pstoraster filter. The number 0 in the line above indicates the "cost" of the filter, and is a value used by CUPS to prioritize the filters in the chain.
Printer classes are also implemented by CUPS. Originally a feature implemented by some System V printing systems, printers can be grouped into 'classes', to implement automatic load balancing. A class can be sent jobs to just like a regular queue. A job submitted to a class will be dispatched to the first available printer in that class.
Another neat feature of CUPS is its automatic network configuration. Using a broadcasting protocol, all CUPS daemons on the same LAN communicate with each other, and queues configured on a server are automatically browsable and made available on other systems. CUPS also provides "implicit classes" for several printers that have the same name on different servers, providing automatic load balancing. CUPS also has support for SLP (Service Location Protocol), that some devices may implement to broadcast their presence.
On the client side, CUPS has both LPD-like and System V-like interfaces, meaning it provides both lpr, lpq... as well as lp, lpstat, et al. commands to the system. All of these commands are essentially IPP clients communicating with the CUPS daemon through IPP requests.
Additionally, CUPS comes with a Web-based administration interface to directly let administrators and users configure queues from a Web browser, or just check their status. This is generally a lot more user-friendly than using cryptic commands, or manually defining queues by editing the printers.conf file in /etc/cups.
One last feat of the CUPS printing system is that it is not necessarily limited to PostScript as its input data. While it could be argued that other printing systems do not necessarily have this limitation either, CUPS makes it a lot easier. We've seen previously how a conversion filter could be specified in a PPD file for use with CUPS, and how this involved specifying a MIME type. CUPS makes extensive use of the MIME types to determine the flow of data between job submission and the final data on the printer. Filters can be defined in *.convs files (usually in /etc/cups), that describe each of the filter programs : the type of data they accept as input, their "cost", and the type of data they output. Given a job of a certain type, CUPS will then intelligently decide on the chain of filters to be called in order to obtain the final type accepted by the printer: PostScript, or the type specified in the PPD on the cupsFilter line as we saw previously.
Today, the PostScript language stays the primary interface for printing in the UNIX world. All major applications will output at least generic PostScript that will then be processed by the printing system until it gets printed. This is obviously very limited, because applications have no unified way of querying printing features, or even know if a job printed correctly. Very few applications are able to use PPD files to access printer features, although StarOffice and OpenOffice are notable exceptions.
But the situation is improving. For instance, CUPS provides a basic C API that allows applications to be integrated more easily with their printing system. This API includes functions to communicate with a CUPS daemon through IPP, as well as functions to read and parse PPD files, and thus gather detailed information about printers and their capabilities. This still stays quite limited for the application developer, as this only works with CUPS and similar IPP servers.
On the free software side, the Gnome and KDE desktop projects now both include middle-level layers to facilitate printing : KDEPrint and Gnome-Print. These frameworks propose to provide a unified APIs to the applications, by abstracting the underlying printing system.
Things are much better than they were just a few years ago with the emergence of more advanced printing systems. As this is a subject essential to enterprises, we are beginning to see support from big name vendors like HP or IBM that strive to improve on this infrastructure.
Moreover, the Free Standards Group is working on the OpenPrinting project, whose stated goal is to define the next generation of the printing infrastructure for the Linux operating system. Gathering many experts from the industry, this workgroup is defining APIs and standards that will bring Linux up to speed with its competitors.
Stephane Peter is a senior software engineer working for Codehost, Inc in Culver City, CA. When not playing with printing systems, he can be found playing his guitar or biking around in Southern California.