Graphics Formats for Linux
If there is a single unifying feature of the computer world, it is non-conformity. In no place is this better illustrated than computer graphics. Literally dozens of different formats exist to do the same thing: store graphic images. Proprietary formats, independent free standards, and international standards all vie for attention. Fortunately, the computer community has settled on just a handful of formats. Some, like TIFF, are the jacks-of-all-trades and are widely used. Others, like GIF, have a niche that they dominate. But whatever the context, there is always more than one way to store images.
This article first provides a brief introduction to graphics on Linux, including format and type, compression, file structure, and various features. It then continues with descriptions of the common graphics formats found in the Linux world and programs with which to manipulate them.
Graphics formats come in two basic varieties: raster and vector. Raster formats store the entire image pixel by pixel and are the most commonly used formats. Vector formats, also called descriptive formats, do not contain pixel image data; instead, they contain a series of commands telling a display program how to draw the image.
The most common use of vector formats is in computer aided drafting (CAD). Describing a line, for example, as a starting point, an ending point, and a color, as opposed to many colored points in a field of points, allows a program to manipulate the line as an object, making it easy to change color, length, and position. Another common use is in raytracing programs. These programs take geometric objects and trace the path of light rays to find shadows and highlights. Since the rendered objects are geometric shapes, it is much easier to store an image as “dodecahedron at (3,4,3) with size 2 and color 27” than as an array of pixel values, especially since the scene to be rendered is three-dimensional. Of course, once the scene is raytraced, the resulting image is stored in a raster format.
Images come in four basic types: monochrome (black and white), grayscale, color palette, and true color.
Grayscale and color palette images typically have 2, 4, or 8 bits per pixel. They use lookup tables that must be included in the image file; each entry in the color lookup table gives the red, green, and blue values for a color. Grayscale images have a similar lookup table for gray values. Each color or gray value is in a range of 0-3 for 2-bit images, 0-15 for 4-bit images, and 0-255 for 8-bit images. The actual image pixel data is a series of color or gray values corresponding to table entries which give the actual color of each pixel.
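As a minimal sketch of the lookup-table idea, the following Python fragment expands a tiny 2-bit palette image into red, green, and blue triplets. The palette contents, the pixel values, and the function name are made up for illustration and do not come from any particular file format.

```python
# Hypothetical 2-bit palette: each index (0-3) maps to (red, green, blue).
palette = [
    (0, 0, 0),        # 0: black
    (255, 0, 0),      # 1: red
    (0, 255, 0),      # 2: green
    (255, 255, 255),  # 3: white
]

# The pixel data holds table indices, not colors: 2 rows of 3 pixels.
indexed_pixels = [
    [0, 1, 3],
    [2, 2, 1],
]

def expand_to_rgb(rows, table):
    """Replace every palette index with the RGB triplet it refers to."""
    return [[table[index] for index in row] for row in rows]

rgb_pixels = expand_to_rgb(indexed_pixels, palette)
```

Changing one palette entry recolors every pixel that refers to it, which is why palette images are compact but limited to as many colors as the table holds.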
True color images are 24-bit images and are usually stored as 8 bits of red, 8 bits of green, and 8 bits of blue per pixel. Some formats may use different color models than RGB. The JPEG format, for example, uses a colorspace based on the luminance and two chrominance values (YCbCr). True color images do not need a color lookup table or palette.
Compression is integral to most graphics formats. An uncompressed 640x400x256 color image requires just over 256,000 bytes, while a true color image of the same size requires three-quarters of a megabyte. Although many compression schemes exist, only a few are common in graphics applications. LZW, LZ77, Huffman, and run-length encoding are lossless, meaning that the image is reproduced exactly when uncompressed. The cosine transform is the basis for the “lossy” JPEG format. We aren't concerned with the details of the compression schemes here.
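The size figures quoted above can be checked with a few lines of arithmetic. This sketch assumes a palette of 3 bytes per entry and ignores any header or row padding that a real format might add; the function name is only illustrative.

```python
def uncompressed_size(width, height, bits_per_pixel, palette_entries=0):
    """Bytes for raw pixel data plus an RGB palette of 3 bytes per entry."""
    pixel_bytes = (width * height * bits_per_pixel + 7) // 8
    return pixel_bytes + 3 * palette_entries

palette_image = uncompressed_size(640, 400, 8, palette_entries=256)
true_color_image = uncompressed_size(640, 400, 24)

print(palette_image)     # 256768
print(true_color_image)  # 768000
```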
JPEG gives by far the best compression, but at the expense of exact image reproduction and speed. Acceptable results are common with compression ratios of 25 to 1, that is, the compressed file is 25 times smaller than the original. Ratios of 100 to 1 may be possible if image quality is not a primary concern.
[FOOTNOTE:A better algorithm does exist, however: fractal image compression. This method gives good results at 100 to 1 and is finding many applications, though it has not yet been widely implemented as a general purpose graphics format.]
LZW and LZ77 are the best lossless schemes and produce similar results: usually around 2 to 1 compression. The modified LZ77 used in the PNG graphics format is slightly slower at compressing images than LZW, but it creates smaller files and is faster at decompressing. LZ77 is also used by the general purpose compression programs ZIP, GZIP, and PKZIP. Until recently, the use of LZW was much more common than LZ77. Its use is falling, however, because it is a patented algorithm, and Unisys requires royalties for its use. Nonetheless, both GIF and TIFF, two of the three most common formats, use it.
Run length encoding (RLE) is the most common scheme and the least effective. It is, however, very fast compared to other compression algorithms. Its basic mechanism is very simple: when a string of pixels with the same value is encountered, RLE outputs the pixel value and the number of consecutive pixels with that value. Complex images do not compress well with this scheme, although monochrome images often compress very well.
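The mechanism is simple enough to sketch in a few lines of Python. This is an illustration of the idea only: real formats such as PCX or BMP pack the value and count into bytes in their own ways and cap the run length, details this sketch leaves out.

```python
def rle_encode(pixels):
    """Collapse each run of equal values into a (value, count) pair."""
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs

def rle_decode(runs):
    """Expand (value, count) pairs back into the original pixel list."""
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels

row = [0, 0, 0, 0, 1, 1, 0, 0, 0]
encoded = rle_encode(row)
print(encoded)  # [(0, 4), (1, 2), (0, 3)]
```

A row in which every pixel differs from its neighbor produces one pair per pixel, which is why complex images can actually grow under this scheme.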
Simple formats consist of nothing more than the image data and a short header giving the minimum amount of information necessary to display the image: size and number of colors. More robust formats often employ a tag or chunk structure. After an initial header, the image is stored in a series of chunks, each prefixed by a code telling the decoding software what the chunk contains. The biggest chunk is the actual image data. Other chunks may include comment blocks, palette information, or other special image information. Software may only support some chunks of a given format: if an unknown chunk is encountered, it is simply skipped.
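The way a decoder skips chunks it does not understand can be sketched as follows. The layout used here (a 4-byte tag followed by a 4-byte big-endian length and then the data) and the tag names are hypothetical, chosen only to illustrate the idea; each real format defines its own chunk layout.

```python
import struct

# Hypothetical tags this decoder understands.
KNOWN_TAGS = {b"PLTE", b"DATA", b"TEXT"}

def make_chunk(tag, payload):
    """Build one chunk: 4-byte tag, 4-byte big-endian length, then the data."""
    return struct.pack(">4sI", tag, len(payload)) + payload

def read_chunks(blob):
    """Yield (tag, payload) for known chunks and silently skip the rest."""
    offset = 0
    while offset + 8 <= len(blob):
        tag, length = struct.unpack_from(">4sI", blob, offset)
        offset += 8
        payload = blob[offset:offset + length]
        offset += length  # the length field lets us step over any chunk
        if tag in KNOWN_TAGS:
            yield tag, payload

sample = (make_chunk(b"TEXT", b"hello")
          + make_chunk(b"XTRA", b"\x00\x01")   # unknown to this decoder
          + make_chunk(b"DATA", b"\x07\x07\x07"))
chunks = list(read_chunks(sample))
```

Because every chunk carries its own length, the decoder never needs to understand a chunk's contents in order to find the next one.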
Finally, three special features bear closer examination: magic numbers for format identification, gamma correction, and alpha transparency masks. In some cases it is possible to identify the format of a file by looking at the file itself. Many formats use magic numbers, special codes (unique, it is hoped!), to identify the format. GIF files, for example, have “GIF87a” or “GIF89a” at the start of the file. Because these magic numbers are ASCII coded, displaying them with less, strings, or even cat (though cat can accidentally change your terminal's character set) is good enough to identify them. For example, to check whether a file is a GIF, you can use strings filename | grep GIF.
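A program can make the same check by reading the first few bytes of a file. The sketch below uses the magic strings mentioned in this article for GIF, BMP, the PBM/PGM/PPM family, and JFIF. The function name is invented, and looking for “JFIF” at byte offset 6 is an assumption based on the usual JFIF header layout rather than something stated here.

```python
PNM_MAGIC = {
    b"P1": "PBM", b"P4": "PBM",
    b"P2": "PGM", b"P5": "PGM",
    b"P3": "PPM", b"P6": "PPM",
}

def identify_format(header):
    """Guess a format from the leading bytes of a file; None if unrecognized."""
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "GIF"
    if header[:2] == b"BM":
        return "BMP"
    if header[:2] in PNM_MAGIC:
        return PNM_MAGIC[header[:2]]
    if header[6:10] == b"JFIF":  # assumed offset of the JFIF identifier
        return "JPEG/JFIF"
    return None

def identify_file(path):
    """Read just enough of the file to inspect its magic number."""
    with open(path, "rb") as handle:
        return identify_format(handle.read(16))
```

A two-character magic number such as “BM” or “P5” can easily occur by accident in other data, so a match is a strong hint rather than proof.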
Gamma correction is an important part of the proper display of an image. Whenever an image is displayed on a monitor, the monitor's characteristics affect the image. How the monitor handles varying degrees of brightness is especially important. In general, the brightness on the screen is related to the brightness levels of the original image by a simple formula: screen brightness equals image brightness raised to the gamma power. The value of gamma for most monitors is around 2.5. To compensate for this, image capture hardware and software may apply an “opposite” operation to the image. The result is an image with a gamma of 0.45. When this image is displayed, the gammas of 0.45 and 2.5 roughly cancel each other out, producing an apparent gamma of about 1 (called a linear brightness scale).
But what does gamma mean visually? An image displayed with a gamma less than one uses more of its pixel codes in the darker areas, that is, darker areas have better color resolution. A gamma value greater than one uses more of its pixel codes in the lighter regions. A murky image, or an image with too much contrast, may need to have its gamma corrected. Correcting the gamma of an image is inherently lossy because of round-off error during the taking of powers; hence, gamma correction should be performed when the image is displayed, keeping the original image file intact without loss.
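The round-off loss is easy to demonstrate. The following sketch applies the power formula to 8-bit pixel codes, first with a gamma of 0.45 and then with the inverse exponent, and counts how many distinct codes survive the round trip. The function name and the choice of 8-bit values are assumptions made for the example.

```python
def apply_gamma(value, gamma, max_value=255):
    """Map a pixel code through output = input ** gamma on a 0..1 scale."""
    normalized = value / max_value
    return round((normalized ** gamma) * max_value)

codes = range(256)

# Store the image with a gamma of 0.45, then undo it with the inverse power.
encoded = [apply_gamma(value, 0.45) for value in codes]
restored = [apply_gamma(value, 1 / 0.45) for value in encoded]

# Several bright input codes round to the same stored code, so fewer than
# 256 distinct values come back out.
distinct_after = len(set(restored))
print(distinct_after < 256)  # True
```

Since the merged codes cannot be separated again, it is safer to leave the stored pixel values alone and apply the correction only on the way to the screen.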
Alpha transparency masks are a way of hiding, or masking, parts of an image. In addition to image data, a value is included in the image file for every pixel. A value of 2^bitdepth - 1 is opaque and the pixel is displayed normally. A value of 0 is transparent, allowing the background color of the screen to show through, so the image is not visible at all at that pixel. The simplest application is to make a non-rectangular image. Define an array the size of the image and then draw a line between two opposite corners. Set the value of each point above the line to one, and the values below to zero. The result will be a triangular image.
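The triangular example can be sketched with a 1-bit mask, where 1 is opaque and 0 is transparent. Treating points on the diagonal itself as opaque is a choice made for this illustration, and the function names are invented.

```python
def triangular_mask(size):
    """Square 1-bit mask: 1 on or above the corner-to-corner diagonal, else 0."""
    return [[1 if col >= row else 0 for col in range(size)]
            for row in range(size)]

def composite(image, mask, background):
    """Show the image where the mask is opaque, the background elsewhere."""
    return [[pixel if opaque else background
             for pixel, opaque in zip(image_row, mask_row)]
            for image_row, mask_row in zip(image, mask)]

mask = triangular_mask(3)
image = [[9, 9, 9],
         [9, 9, 9],
         [9, 9, 9]]
visible = composite(image, mask, background=0)
```

With more than one bit per mask value, intermediate values would blend the image with the background instead of switching between them; this sketch covers only the on/off case.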
BMP is the native bitmapped format used by Microsoft Windows. It is a minimal format that has few features and uses simple run length encoding for data compression. With the widespread use of Windows, BMP is the most common format. In many ways BMP is very similar to the PCX format, and has assumed the role that PCX once held.
BMP encodes 1, 2, 4, 8, and 24 bit images. OS/2 uses a similar BMP format, the only difference being a slightly simpler header. The first two characters of the BMP header are always “BM”.
Another format native to Microsoft Windows is the Windows MetaFile (WMF). WMF uses Windows' Graphics Device Interface (GDI) function calls to store images that appear repeatedly in applications. The GDI calls provide support for setting up the screen, defining regions, colors, text, pixels, lines, polygons and bitmaps. WMF supports uncompressed monochrome, color palette, and true color images.
The Graphics Interchange Format (GIF, pronounced “jiff”) was originally developed by the Compuserve Information Service as a color replacement for the earlier RLE format. Since the 1987 release, GIF has become the standard for graphics interchange, especially on networks. A second release in 1989 added several new features to the format. GIF is a lossless format based on the LZW compression scheme. It uses a tag system to identify extension blocks in the file, although only a few tags have been defined. The biggest assets of the format are its relative simplicity, excellent compression, and widespread availability.
However, the format has two major drawbacks. First, it is based on a patented compression scheme, and commercial software using it must pay royalties to Unisys, the patent holder. Second, GIF can be used only for images with 256 or fewer colors. (Strictly speaking, that is. Since GIF supports local color maps, the palette can be changed in the middle of an image, allowing for more than 256 colors. Unfortunately, much of the available software supports this feature poorly or not at all, and it is, at best, clumsy to use.) GIF files can be identified by the string GIF87a or GIF89a at the start of the file.
JPEG (pronounced “jay-peg”) is a standard for lossy compression of images. JPEG achieves compression by breaking the image into 8x8 boxes of pixels, performing a mathematical operation called a cosine transform on each, and throwing out the high frequency/small detail components; the more components thrown out, the greater the compression and the poorer the image quality. The remaining frequency components are then run length encoded and compressed with the Huffman algorithm. To view an image, a JPEG file must be uncompressed and decoded, and an inverse cosine transform must then be performed. This three-step process makes displaying JPEGs very slow. Fortunately, JPEG produces excellent compression with little or no visible image loss.
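The transform step can be sketched directly from its definition. The code below computes a two-dimensional cosine transform of one 8x8 block, zeroes the higher frequency components, and transforms back. It is a slow, unoptimized illustration: real JPEG coders quantize the components with tables rather than simply zeroing them, and the helper names and the cutoff rule used here are assumptions made for the example.

```python
import math

N = 8  # JPEG works on 8x8 blocks of pixels

def _scale(k):
    """Normalization factor that makes the transform orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def _basis(position, frequency):
    """Cosine basis function sampled at one pixel position."""
    return math.cos((2 * position + 1) * frequency * math.pi / (2 * N))

def dct_2d(block):
    """Forward cosine transform: pixel values to frequency components."""
    return [[_scale(u) * _scale(v)
             * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                   for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct_2d(coeffs):
    """Inverse cosine transform: frequency components back to pixel values."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v]
                 * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

def drop_high_frequencies(coeffs, keep):
    """Zero every component whose frequency indices sum to keep or more."""
    return [[value if u + v < keep else 0.0 for v, value in enumerate(row)]
            for u, row in enumerate(coeffs)]

# A smooth horizontal ramp keeps most of its energy in a few low frequencies.
block = [[16 * y for y in range(N)] for _ in range(N)]
coeffs = dct_2d(block)
approx = idct_2d(drop_high_frequencies(coeffs, keep=4))
worst_error = max(abs(a - b)
                  for approx_row, block_row in zip(approx, block)
                  for a, b in zip(approx_row, block_row))
```

For a smooth block like this one, the reconstruction stays close to the original even though most of the 64 components were discarded, which is the effect JPEG relies on for photographic images.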
JPEG itself is not actually an image format but a set of standards for compressing image data. JFIF, the JPEG File Interchange Format, is the format commonly referred to as JPEG. JFIF does not support all of the features of JPEG, but is intended as a minimal implementation for image transfer. A full featured implementation of JPEG is included in version 6.0 of the TIFF format. Designers hope the two implementations of JPEG, one minimal and one full featured, will deter software vendors from defining proprietary formats based on JPEG (early Macintosh versions were especially notorious for incompatibilities).
JPEG works best on real world images; line drawings and cartoons will not compress nearly as well as scanned images. Unlike most other formats, JPEG does not store a pixel as red, green, and blue values. Instead it uses a format called YCbCr. Since most display hardware uses RGB values, the YCbCr values must be converted—yet another step slowing decompression. The JFIF header includes the ASCII characters “JFIF” for format identification.
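The conversion itself is a small linear formula applied to every pixel. The coefficients below are the ones commonly given for JFIF with 8-bit samples; they are not stated in this article, so treat them as an assumption of the sketch.

```python
def _clamp(value):
    """Round to the nearest integer and keep the result in 0..255."""
    return min(255, max(0, round(value)))

def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to luminance (Y) and two chrominance values (Cb, Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return _clamp(y), _clamp(cb), _clamp(cr)

def ycbcr_to_rgb(y, cb, cr):
    """Convert YCbCr back to 8-bit RGB for display hardware."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return _clamp(r), _clamp(g), _clamp(b)

print(rgb_to_ycbcr(255, 255, 255))  # (255, 128, 128)
print(rgb_to_ycbcr(0, 0, 0))        # (0, 128, 128)
```

Gray pixels always come out with both chrominance values at 128, which shows how the brightness information is separated from the color information.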
PBM, PGM, and PPM are intermediate formats used by the PBMPLUS utilities. The acronyms stand for Portable BitMap, Portable GrayMap, and Portable PixMap. PBM is for monochrome images, PGM for grayscale images with up to 256 shades of gray, and PPM for color images using up to 24 bits of true color. A fourth “format” is the Portable AnyMap, PNM. PNM is not actually a format itself; rather, a program that uses PNM can read and write PBM, PGM, and PPM files. PNM is used for utility programs that support multiple image types. For instance, since the image type of a TIFF file may not be known in advance, a PNM utility reads the TIFF file and writes whichever of the three formats is appropriate.
Utilities for each of the four formats can also read the formats that carry less information. That is, a PGM utility reads PGM and PBM, and a PPM utility reads PPM, PGM, and PBM. PBM, PGM, and PPM utilities always write in their own format, while PNM utilities generally write whatever format they have read. The formats store data either as ASCII or binary data and are otherwise basic formats consisting of a header and image data. The header consists of a magic number to identify the format, image size, and (except for PBM) the number of colors/gray shades. The magic strings are PBM P1(P4), PGM P2(P5), and PPM P3(P6), where the first code is the code for ASCII data, and the code in parentheses is for binary data. True color images store pixel data as a triplet of numbers for RGB data. For more information on the PBMPLUS package, see the section on software below.
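Because the ASCII variants are plain text, they are easy to produce and inspect by hand. The sketch below writes a small grayscale image as an ASCII (P2) PGM and reads the header back. It assumes whitespace-separated header fields and does not handle the comment lines that real files may contain; the function names are invented for the example.

```python
def write_ascii_pgm(rows, max_gray=255):
    """Return the text of an ASCII (P2) PGM file for a list of pixel rows."""
    height = len(rows)
    width = len(rows[0])
    lines = ["P2", f"{width} {height}", str(max_gray)]
    lines += [" ".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"

def read_pnm_header(text):
    """Parse the magic number, width, height, and (for PGM/PPM) maximum value."""
    tokens = text.split()
    magic, width, height = tokens[0], int(tokens[1]), int(tokens[2])
    # PBM images are strictly black and white, so they carry no maximum value.
    max_value = None if magic in ("P1", "P4") else int(tokens[3])
    return magic, width, height, max_value

pgm_text = write_ascii_pgm([[0, 128, 255],
                            [255, 128, 0]])
header = read_pnm_header(pgm_text)
print(header)  # ('P2', 3, 2, 255)
```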
The PCX format is the native format of ZSoft's PC Paintbrush program and uses run length encoding. As one of the first general purpose formats, its use has been very widespread: few programs exist that do not recognize it. PCX is a basic format consisting of a 128 byte header followed by image data. Monochrome as well as 4, 8, and 24-bit color images are supported. The palette for 4-bit images is included in the header while the 8-bit palette is appended after the image data. This non-uniformity is the result of an older format being updated for newer hardware. For this reason, the use of PCX has dwindled in favor of the more coherent and unified BMP.
Amazingly, with all the other formats available, there has been an important niche left unfilled: a portable, relatively simple, free standard for the lossless exchange of true color images—in short, a 24-bit version of GIF with a non-patented compression algorithm. There are existing formats that can be used, but only TIFF has the necessary features, and it suffers from over-completeness: very few implementations make use of the entire TIFF specification. What is needed is a format that is simple enough that any image saved under it can be read by any viewer supporting it.
Compuserve called their development specification GIF24, but when a group of Internet graphics experts developed PNG (Portable Network Graphics, pronounced “ping”), Compuserve adopted it as the successor to GIF. PNG is a natural extension to GIF, although it is not backwards compatible because of a change in compression scheme. In addition to the original GIF features, PNG supports true color images up to 48 bits and grayscale images to 16 bits, as well as full alpha-channel, gamma correction, and detection of file corruption. On 8 bit and larger data, PNG can use a preprocessor on the image data prior to compressing. In many cases this processing improves the compression efficiency and results in smaller file sizes. Expect to see PNG files appearing at an archive near you soon.
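The effect of preprocessing can be seen with a simple difference filter: each byte is replaced by its difference from the byte to its left, in the manner of PNG's “Sub” filter for one byte per pixel. The sketch uses Python's zlib module for the compression step; the function names and the test data are illustrative.

```python
import zlib

def sub_filter(row):
    """Replace each byte with its difference from the left neighbor (mod 256)."""
    return bytes((row[i] - (row[i - 1] if i else 0)) % 256
                 for i in range(len(row)))

def sub_unfilter(filtered):
    """Undo sub_filter by adding each difference back to the rebuilt neighbor."""
    out = bytearray()
    for i, delta in enumerate(filtered):
        left = out[i - 1] if i else 0
        out.append((delta + left) % 256)
    return bytes(out)

# A smooth ramp has no repeated bytes, but its differences are all the same.
ramp = bytes(range(256))
filtered = sub_filter(ramp)

raw_size = len(zlib.compress(ramp))
filtered_size = len(zlib.compress(filtered))
print(filtered_size < raw_size)  # True
```

The filter is fully reversible, so the image is still stored without loss; it only rearranges the data into a form the compressor handles better. On noisy images the gain may be small or absent.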
PostScript (PS) is a page description language developed by Adobe Systems that is both a descriptive and raster format and has become one of the most common printer languages. PostScript files are created by many application programs as a device-independent output format. Encapsulated PostScript (EPS) is a limited version of PostScript for single pictures and is used for images to be included, or “encapsulated”, in programs or PostScript files. The first line of a PostScript file is a line of the form:
%!PS-Adobe-3.0
where the number refers to the PostScript version the file was created under. EPS files append EPSF-3.0 to this line. PostScript is a large and versatile language. Display PostScript is an implementation of PostScript for controlling video hardware and is used by NeXT computers and software.
The Targa Truevision graphics format was developed for use in Truevision's product line. TGA uses run length encoding to store grayscale, color table, and true color images, and can include comments, gamma, alpha, color corrections, and a “postage stamp” version of the image. TGA was one of the first true color formats and is still used by some applications like the Persistence of Vision Ray Tracer. TGA files may contain a block at the end of the file that includes the text TRUEVISION-XFILE.
The Tag Image File Format (TIFF) is the most robust format. As its name suggests, TIFF makes liberal use of the tag concept. TIFF files are typically read using random access because the tag fields can come in any order; an image file directory provides offsets to the location of data in the file. (Even the directory can be anywhere in the file! A short header gives the directory location in the file.) The TIFF format defines several classes of image data: bilevel (monochrome), 4 to 8-bit grayscale or color palette, 24-bit RGB, and 24-bit YCbCr. It supports run length, Huffman, LZW, and JPEG compression. Most implementations do not support all TIFF features, making TIFF a potentially aggravating format. However, TIFF has a huge number of features (implemented in tags), making it unique. Originally intended for desktop publishing, TIFF has spread to video, fax, and document storage, as well as medical and scientific imaging. Because of its complexity, it is not commonly used for home applications.
The X Window System defines several formats for internal use. X Bitmap (XBM) is an ASCII format for including 1-bit images in the C source of a program: XBM images are integral to the code and included at compile time, not run time. XBM data is usually placed in header files and consists of two define statements for the width and height of the image followed by a static unsigned character array. XBM images are often used for icons and cursor bitmaps in X. X Pixmap (XPM) is the equivalent format for color palette images. Its use is identical to XBM except for the addition of the color table and three extra define statements for version number, color table length, and the number of bytes per pixel. X also defines a general format for images, the X Window Dump (XWD or WD) format. XWD supports uncompressed raster data of all types.
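To show what XBM data looks like, the following sketch generates the text of an XBM header file from rows of 0 and 1 pixels. It assumes the usual XBM conventions that each row is padded to a whole number of bytes and that the leftmost pixel of each byte is stored in the lowest bit; the function name and the image name are made up.

```python
def to_xbm(name, rows):
    """Render rows of 0/1 pixels as the text of an XBM header file."""
    height = len(rows)
    width = len(rows[0])
    values = []
    for row in rows:
        # Pack 8 pixels per byte; a short final group is padded with zeros.
        for start in range(0, width, 8):
            byte = 0
            for bit, pixel in enumerate(row[start:start + 8]):
                if pixel:
                    byte |= 1 << bit  # leftmost pixel goes in the lowest bit
            values.append(f"0x{byte:02x}")
    return (f"#define {name}_width {width}\n"
            f"#define {name}_height {height}\n"
            f"static unsigned char {name}_bits[] = {{\n"
            f"   {', '.join(values)} }};\n")

xbm_text = to_xbm("dot", [[1, 0, 0, 0, 0, 0, 0, 0],
                          [0, 0, 0, 0, 0, 0, 0, 1]])
print(xbm_text)
```

The generated text can be saved in a header file and pulled into a C program with an ordinary include statement.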
Having looked at some of the common graphics formats, we finish up with a quick look at some available programs and libraries for converting images.
xv and ImageMagick
These two X programs display, convert, and manipulate images. xv can read and write JPEG, GIF, TIFF, PBM/PGM/PPM, and X formats, although it cannot display 24-bit images as 24-bit. In addition to format conversions, xv can perform simple image manipulations, e.g., rotate, flip, crop, magnify, and gamma correction. ImageMagick supports the formats xv does, plus TGA. It can send output directly to a PostScript printer and has more image manipulation capabilities than xv: cut, copy, paste, resize, flip, flop, rotate, invert, emboss, and more, on the whole or on part of the image.
The canonical way to view and convert from PostScript (including EPS) is ghostscript. Ghostscript can be run both with command line options and interactively. Input is always a PostScript file and output may be any of several different file formats, printer languages, or screen types. By default, ghostscript sends its output to an X terminal, but it can also save images to a file. Ghostview is a popular front end for ghostscript that improves the screen handling.
The PBMPLUS utilities are a set of 120 programs for image conversion and manipulation. PBMPLUS defines three intermediate formats: PBM, PGM, and PPM (described above). The basic philosophy is that if there are twenty formats, you need twenty squared, or 400, programs/subroutines to convert between them. The use of intermediate formats reduces that number to two times twenty, or forty. In addition to the conversion programs, there are a number of simple image processing programs: scaling, rotating, smoothing, convolution, gamma, cropping and more. NetPBM is a newer version of PBMPLUS, but some versions suffer from serious bugs.
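The saving from an intermediate format is simple arithmetic, shown here using the article's own way of counting (every pair of formats for direct conversion, one reader and one writer per format otherwise). The function names are only illustrative.

```python
def direct_converters(formats):
    """Converters needed when every format converts straight to every other."""
    return formats * formats

def intermediate_converters(formats):
    """Converters needed with one reader and one writer per format."""
    return 2 * formats

print(direct_converters(20))        # 400
print(intermediate_converters(20))  # 40
```

The gap widens quickly: each new format adds only two programs to the intermediate scheme, while the direct scheme grows with the square of the number of formats.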
C Library support
Substantial support exists for C programmers working with graphics. Libraries for JPEG/JFIF, TIFF, and PNG are now available. In addition, code fragments for many of the other formats are readily available from Internet archive sites.
IJG JPEG/JFIF library
The easiest way to do JPEG programming is to use the Independent JPEG Group's (IJG) library. This library is based on the JFIF specification and includes two common programs found on many Unix systems for storage and retrieval of JFIF images: cjpeg and djpeg. The library is available as source and compiled code.
SGI TIFF library
Like the JPEG library, a full set of routines to implement TIFF is available. Written by Sam Leffler and SGI, this library is also available as source and compiled code. The library includes programs for converting, dithering, splitting, and displaying information on TIFF files.
With the recent announcement of support for the PNG format, Compuserve also announced a PNG toolbox. The toolbox uses the zlib library for the LZ77 code and is intended to speed the acceptance of the new format. Its use will be free of royalties. A beta version is available on Compuserve.
There are several archive sites on the Internet where programs and further information can be found:
anonymous ftp sites:
ftp://sgi.com/graphics/tiff  TIFF version 6.0 specification and the source for the SGI TIFF library
ftp://sunsite.unc.edu/pub/Linux/apps/graphics/convert  PBMPLUS
ftp://sunsite.unc.edu/pub/Linux/X11/xapps/graphics  ImageMagick
ftp://sunsite.unc.edu/pub/Linux/libs/graphics  Compiled version of JPEG and TIFF libraries for Linux
ftp://sunsite.unc.edu/pub/Linux/apps/graphics/viewers  Compiled version of ghostscript for Linux
ftp://ftp.wuarchive.edu/  Numerous code fragments for image formats are scattered throughout this large archive
compuserve.com  GRAPHSUPPORT forum, library 20, LP071.zip, the beta version of the PNG toolbox
www.uwm.edu/~ggraef.  List of links and other direct links to format information