Open Source in MPEG

Dr. Chiariglione, Convenor of MPEG, gives the history of the Moving Picture Experts Group and explains the characteristics of the MPEG open-source software process.

For centuries my ancestors, who lived in the lower parts of the Alps near the city of Turin, had applied a simple idea: it was more comfortable for everybody if the paths criss-crossing the mountains were cobblestoned instead of left in the state in which the steps of millions of walkers had created them. It is not known whether that work was undertaken by the free decision of those mountain dwellers or by the local communal authority that imposed corvées on them during winter when work in the fields was minimal. After all, farmers are not known to be inclined to share anything with anybody and those were years in which despotism, enlightened or otherwise, ruled.

A few years ago computer people discovered that it was in (nearly) everybody's interest if the virtual equivalent of mountain paths—the raw CPU—could be “cobblestoned” with an operating system that was the result of a collective effort that could be used by all.

Traditionally, computer people have worked with data that was already represented in, or could easily be converted to, a form that lent itself to processing by automatic computers. Other types of data, those that reach human ears and eyes, have a very different nature: they are intrinsically analogue. To add to the difficulty, they are also “broadband”, a sliding definition that depends on the state of technology.

Processing and communication of audio and video data have been around for a long time, but invariably as ad hoc solutions. As part of the movement instigated by the Moving Picture Experts Group, or MPEG, audio and video have been reduced to a form that allows the necessary processing to be carried out by integrated circuits. The number of bits has been reduced to such a level that transmission is possible over today's communication channels.

In parallel with the development of the MPEG-1, MPEG-2 and MPEG-4 standards, MPEG has developed reference software using a process similar to that of open-source software (OSS), even though the details may be frowned upon by the purists of the OSS community. It must be recognized, however, that this process had to be adapted to the rules governing the International Organization for Standardization (ISO), a traditional standards-setting organization under which MPEG operates.

The purpose of this article is to recall how the digitization of audio and video started, explain the motivations that led to the establishment of the Moving Picture Experts Group, summarize the elements of MPEG standards in use today, and explain the characteristics of the MPEG open-source software process and the work currently under way.

The Digitization of Audio and Video

It took about 400 years after the invention of movable type, the first example of a technology for large-scale use of information processing not requiring direct human intervention, to see the invention of a technology of a similar impact. Starting in the 1830s, a long string of audio-visual information processing and communication technologies has been made available to humankind: photography, telegraphy, facsimile, telephony, phonography, cinematography, radio, television and magnetic recording.

One drawback of these technologies is that each of them has, in general, little in common with the others. Every time one of these types of information is processed, a special device has to be used. How different from the computer world, where all kinds of information are processed using the same basic technology!

The theoretical groundwork to achieve the goal of unifying all types of audio-visual information started some 15 years before the first electronic computer was built. It was discovered that a band-limited signal (of bandwidth B) could be sampled with a frequency of 2B and reconstructed without error. The second step of the groundwork was achieved some 20 years later with the definition of bounds to quantization errors depending on the number of bits used and the signal statistics.

Even though Bell Laboratories, where the theoretical groundwork had been done, made the first step of converting this groundwork into something practical with the invention of the transistor, there was still a long way to go for practical applications. Even a “narrowband” signal like speech that occupies the 0.3-3.4KHz band on the telephone wire, if sampled at 8KHz with 8 bits per sample, produced the staggering (for that time) value of 64Kbps.
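The arithmetic behind the 64Kbps figure can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above; the function name pcm_bitrate is a hypothetical helper, not something taken from any standard.

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Raw (uncompressed) bitrate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

# Telephone speech occupies the band up to 3.4 kHz, so sampling at
# twice the bandwidth requires at least 6.8 kHz; 8 kHz leaves some margin.
upper_band_edge_hz = 3_400
min_sample_rate_hz = 2 * upper_band_edge_hz

telephone_bps = pcm_bitrate(8_000, 8)

print(min_sample_rate_hz)  # 6800
print(telephone_bps)       # 64000
```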

After 15 years of experiments, bits were ready to play a role in speech communication. In the 1960s the then CCITT (now ITU-T) adopted a recommendation for the digital representation of speech. (This actually defined two such representations, called µ-law and A-law.) Both had a sampling frequency of 8 KHz, but the quantization law was 7 bits per sample for µ-law and 8 bits per sample for A-law, both nonlinear to take into account the logarithmic nature of human ear perception. One should not, however, attach too much meaning to this digitization of speech. The scope of application was the trunk network where multiplexing of telephone channels was more conveniently done in digital than in analogue. Nothing changed for the end users.
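The idea of a nonlinear, logarithmic quantization law can be sketched with the textbook continuous µ-law curve. This is a minimal illustration, not the segmented tables of the actual recommendation; the value µ = 255 and the function names are assumptions introduced here.

```python
import math

MU = 255.0  # assumed value; the text above does not state one

def mu_law_compress(x: float, mu: float = MU) -> float:
    """Map a sample in [-1, 1] onto a logarithmic scale in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y: float, mu: float = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

# A quiet sample is pushed up the scale before uniform quantization,
# so it receives proportionally finer steps than a loud one.
quiet = 0.01
compressed = mu_law_compress(quiet)
restored = mu_law_expand(compressed)
```

Compressing before a uniform quantizer and expanding afterwards spends most of the available code words on low-level signals, which matches the roughly logarithmic behaviour of hearing mentioned above.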

More interesting was Group 3 facsimile (Gr. 3 fax). An A4 page scanned by the 1728-sensor CCD of Gr. 3 fax in fine resolution mode (same resolution horizontally and vertically) holds about 4 Mbits. With the “high speed” modems of that time (9.6 Kbps) it would have taken about 20 minutes to transmit a page, but a simple compression scheme (sending “run lengths” encoded with variable-length code words instead of all blacks and whites, plus some bidimensional extensions) brought transmission time down to two minutes six seconds.
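The run-length idea can be sketched as follows. The snippet only extracts the runs from one scan line; the variable-length code tables and the bidimensional extensions used by the real fax standard are not reproduced, and the function name run_lengths is hypothetical.

```python
from itertools import groupby

def run_lengths(scan_line):
    """Return (pixel_value, run_length) pairs for one scan line of 0/1 pixels."""
    return [(value, len(list(group))) for value, group in groupby(scan_line)]

# 0 = white, 1 = black (an arbitrary convention chosen for this sketch).
line = [0] * 10 + [1] * 3 + [0] * 7
print(run_lengths(line))  # [(0, 10), (1, 3), (0, 7)]
```

A typical page consists mostly of long white runs, so a handful of run lengths replaces hundreds of individual pixels even before any variable-length coding is applied.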

Digitized speech was an effective transmission method for the trunk network, but the local access remained hopelessly analogue. The advent of ISDN in the 1980s prompted the development of standards for compression of speech with a bandwidth of 7 KHz, sampled at 16 KHz with a higher number of bits per sample (e.g. 14) than µ-law and A-law. Compression was needed because this kind of speech would generate in excess of 200 Kbps. Reduction to 64 Kbps and below (a compression ratio of about four) was possible while preserving high speech quality. These devices used DSPs (digital signal processors) but never gave rise to a mass market.

Video presented a bigger challenge if one considers that its bandwidth is three orders of magnitude greater than that of speech and that it involves more than one signal. Digital television is obtained by sampling the video luminance Y at 13.5 MHz and the two chrominance differences R-Y and B-Y at 6.75 MHz with 8 bits/sample. The total bitrate of 216 Mbps could be reduced to about 166 Mbps by removing the samples that carry no visible picture information. Such high bitrates were unsuitable for any practical transmission medium and were used only for digital tape (the so-called D1) and transmission in the studio.
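The bitrates quoted above follow directly from the sampling parameters, as this short Python sketch shows. The active-picture dimensions used for the 166 Mbps figure (720 luminance samples per line, 576 lines, 25 frames per second) are an assumption based on the 625-line system and are not stated in the text.

```python
# Wideband speech: 16 kHz sampling, 14 bits per sample.
wideband_speech_bps = 16_000 * 14
speech_ratio = wideband_speech_bps / 64_000

# Digital television: one luminance and two chrominance-difference signals.
BITS_PER_SAMPLE = 8
luma_rate_hz = 13_500_000
chroma_rate_hz = 6_750_000
video_total_bps = (luma_rate_hz + 2 * chroma_rate_hz) * BITS_PER_SAMPLE

# Active picture only (assumed 625-line parameters, see the lead-in).
# Two chrominance signals at half the luminance rate add as many bits
# as the luminance itself, hence 16 bits per luminance sample position.
active_bps = 720 * 576 * 25 * 2 * BITS_PER_SAMPLE

print(wideband_speech_bps)    # 224000
print(speech_ratio)           # 3.5
print(video_total_bps / 1e6)  # 216.0
print(active_bps)             # 165888000
```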

In the first attempts at bitrate reduction, bringing this high bitrate directly down to 1.5-2 Mbps, to fit the American and European speech multiplexers of 24 and 32 digital speech channels, respectively, was (and still largely is) considered too challenging. Therefore, the input bitrate was first reduced by 2:1, subsampling the video signal in the horizontal and vertical (actually temporal, as the video signal is interlaced) directions, and by further subsampling the chroma differences. Then, two simple techniques called DPCM and conditional replenishment were used. A second generation of codecs, using more sophisticated algorithms (DCT [Discrete Cosine Transform] and motion compensation), provided acceptable quality at 384 Kbps and, after a further 2:1 subsampling of the video signal in the horizontal and vertical directions, at 64/128 Kbps, the bitrate of ISDN.
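As a rough illustration of DPCM, the sketch below replaces each sample with its difference from the previous one. It is a lossless toy version with the simplest possible predictor; the codecs described above also quantized the differences, which is not shown, and the function names are hypothetical.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    previous = 0
    residuals = []
    for s in samples:
        residuals.append(s - previous)
        previous = s
    return residuals

def dpcm_decode(residuals):
    """Rebuild the samples by accumulating the differences."""
    previous = 0
    samples = []
    for r in residuals:
        previous += r
        samples.append(previous)
    return samples

row = [100, 102, 103, 103, 101, 98]
print(dpcm_encode(row))  # [100, 2, 1, 0, -2, -3]
```

Because neighbouring picture samples tend to be similar, the differences cluster around zero and can be coded with fewer bits than the samples themselves.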

Going back to audio, in the early 1980s Philips and Sony developed the Compact Disc, a read-only digital storage device that employed laser technologies (a comparable system was developed at about the same time by RCA, but was short-lived). This was designed with stereo music in mind: two audio channels sampled at 44.1KHz with 16 bits per sample for a total bitrate of 1.41Mbps.

Lastly, in the US (through the Advanced Television initiative) and in Europe (through the development of an industrial company) steps were taken toward the development of a market for digital high-definition television.