Open Source in MPEG
For centuries my ancestors, who lived in the lower parts of the Alps near the city of Turin, had applied a simple idea: it was more comfortable for everybody if the paths criss-crossing the mountains were cobblestoned instead of left in the state in which the steps of millions of walkers had created them. It is not known whether that work was undertaken by the free decision of those mountain dwellers or imposed by the local communal authority as corvées during winter, when work in the fields was minimal. After all, farmers are not known to be inclined to share anything with anybody, and those were years in which despotism, enlightened or otherwise, ruled.
A few years ago computer people discovered that it was in (nearly) everybody's interest if the virtual equivalent of mountain paths, the raw CPU, could be “cobblestoned” with an operating system that was the result of a collective effort and could be used by all.
Traditionally, computer people have worked with data that was already represented in, or could easily be converted to, a form that lent itself to processing by automatic computers. Other types of data, those that reach human ears and eyes, have a very different nature: they are intrinsically analogue. To add to the difficulty, they are also “broadband”, a sliding definition that depends on the state of technology.
Processing and communication of audio and video data have been around for a long time, but invariably as ad-hoc solutions. As part of the movement instigated by the Moving Picture Experts Group (MPEG), audio and video have been reduced to a form that allows the necessary processing to be performed by integrated circuits. The number of bits has been reduced to such a level that transmission is possible over today's communication channels.
In parallel to the development of the MPEG-1, MPEG-2 and MPEG-4 standards, MPEG has developed reference software using a process similar to that of open-source software (OSS), even though the details may be frowned upon by the purists of the OSS community. It must be recognized, however, that this process had to be adapted to the rules governing the International Organization for Standardization (ISO), the traditional standards-setting organization under which MPEG operates.
The purpose of this article is to recall how the digitization of audio and video started, explain the motivations that led to the establishment of the Moving Picture Experts Group, summarize the elements of the MPEG standards being used today, and explain the characteristics of the MPEG open-source software process and the work currently under way.
It took about 400 years after the invention of movable type, the first example of a technology for large-scale information processing not requiring direct human intervention, to see the invention of a technology of similar impact. Starting in the 1830s, a long string of audio-visual information processing and communication technologies has been made available to humankind: photography, telegraphy, facsimile, telephony, phonography, cinematography, radio, television and magnetic recording.
One drawback of these technologies is that each of them has, in general, little in common with the others. Every time one of these types of information is processed, a special device has to be used. How different from the computer world, where all kinds of information are processed using the same basic technology!
The theoretical groundwork for unifying all types of audio-visual information was laid starting some 15 years before the first electronic computer was built. It was discovered that a band-limited signal (of bandwidth B) could be sampled at a frequency of at least 2B and reconstructed without error. The second step came some 20 years later with the definition of bounds on quantization error as a function of the number of bits used and the signal statistics.
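As a concrete instance of these two results, the textbook sampling condition and the signal-to-quantization-noise ratio (SQNR) of an N-bit uniform quantizer can be written as follows. The SQNR part assumes a full-scale sinusoid of amplitude A and a quantization error uniformly distributed over one step Δ; it is one worked case of the dependence on signal statistics, not the general bound.

```latex
f_s \ge 2B
\qquad
\Delta = \frac{2A}{2^{N}}, \qquad
\mathrm{SQNR} = \frac{A^{2}/2}{\Delta^{2}/12} = \frac{3}{2}\,2^{2N}
\;\Longrightarrow\;
10\log_{10}\mathrm{SQNR} \approx 6.02\,N + 1.76\ \text{dB}
```

Under these assumptions each extra bit buys about 6 dB; with N = 8 the formula gives roughly 50 dB.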
Even though Bell Laboratories, where the theoretical groundwork had been done, took the first step toward turning it into something practical with the invention of the transistor, there was still a long way to go before practical applications. Even a “narrowband” signal like speech, which occupies the 0.3-3.4 kHz band on the telephone wire, when sampled at 8 kHz with 8 bits per sample produced what was then a staggering 64 Kbps.
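The 64 Kbps figure is simple arithmetic on the sampling parameters. The sketch below uses a hypothetical helper, `pcm_bitrate`, and also checks the 8 kHz rate against the sampling theorem; the guard-band remark reflects the usual engineering reason for sampling somewhat above the 6.8 kHz minimum.

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed PCM bit rate: samples per second x bits per sample x channels."""
    return sample_rate_hz * bits_per_sample * channels


TELEPHONE_BAND_TOP_HZ = 3_400  # upper edge of the 0.3-3.4 kHz telephone band
SAMPLE_RATE_HZ = 8_000

# Sampling theorem: sample at no less than twice the highest frequency present.
nyquist_rate_hz = 2 * TELEPHONE_BAND_TOP_HZ  # 6 800 Hz; 8 kHz leaves a guard band

speech_bitrate = pcm_bitrate(SAMPLE_RATE_HZ, 8)  # 64 000 bit/s

print(nyquist_rate_hz, speech_bitrate)  # 6800 64000
```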
After 15 years of experiments, bits were ready to play a role in speech communication. In the 1960s the then CCITT (now ITU-T) adopted a recommendation for the digital representation of speech. (This actually defined two such representations, called µ-law and A-law.) Both had a sampling frequency of 8 kHz, but quantization used 7 bits per sample for µ-law and 8 bits per sample for A-law, both with a nonlinear law to take into account the logarithmic nature of human hearing. One should not, however, attach too much meaning to this digitization of speech. Its scope of application was the trunk network, where multiplexing telephone channels was more conveniently done in the digital than in the analogue domain. Nothing changed for end users.
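The benefit of a nonlinear law can be illustrated with the continuous µ-law characteristic. This is a sketch under stated assumptions: µ = 255 is the value used in ITU-T G.711, but G.711 itself approximates the curve with piecewise-linear segments, which this code does not reproduce, and the uniform quantizer and its 8-bit depth are illustrative choices. For a quiet sample, compressing before quantizing (and expanding afterwards) gives a much smaller error than quantizing uniformly.

```python
import math

MU = 255  # value of mu used in ITU-T G.711


def mu_law_compress(x: float, mu: float = MU) -> float:
    """Continuous mu-law curve: maps a sample in [-1, 1] to [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: float = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def quantize(v: float, bits: int) -> float:
    """Uniform mid-rise quantizer on [-1, 1] with 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(int((v + 1.0) / step), levels - 1)
    return -1.0 + (index + 0.5) * step


# A quiet sample: compare plain uniform quantization with companded quantization.
x = 0.01
uniform_error = abs(quantize(x, 8) - x)
companded_error = abs(mu_law_expand(quantize(mu_law_compress(x), 8)) - x)
print(uniform_error > companded_error)  # True
```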
More interesting was Group 3 facsimile (Gr. 3 fax). An A4 page scanned by the 1728-sensor CCD of Gr. 3 fax in fine resolution mode (same resolution horizontally and vertically) amounts to about 4 Mbit. With the “high speed” modems of that time (9.6 Kbps) it would have taken about 20 minutes to transmit a page, but a simple compression scheme (sending “run lengths” of consecutive white or black pixels, encoded with variable-length code words, instead of every individual pixel, plus some two-dimensional extensions) brought transmission time down to two minutes six seconds.
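The one-dimensional part of the scheme can be sketched as follows: a scan line of black and white pixels becomes a short list of alternating run lengths. This is an illustration, not the Gr. 3 coder: the variable-length code words that ITU-T T.4 assigns to each run length (the Modified Huffman tables) and the two-dimensional extensions are not reproduced here.

```python
def run_lengths(scan_line: list[int]) -> list[int]:
    """Alternating white/black run lengths for one scan line (0 = white, 1 = black).

    Every line is taken to start with a white run, so a line that begins with a
    black pixel produces a leading white run of length 0.
    """
    runs: list[int] = []
    colour = 0  # colour of the run currently being counted
    count = 0
    for pixel in scan_line:
        if pixel == colour:
            count += 1
        else:
            runs.append(count)
            colour = pixel
            count = 1
    runs.append(count)
    return runs


# A 1728-pixel line that is white except for a short black mark.
line = [0] * 100 + [1] * 5 + [0] * 1623
print(run_lengths(line))  # [100, 5, 1623]
```

Three numbers replace 1728 pixels; a table of short code words for the most common run lengths then turns those numbers into bits.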
Digitized speech was an effective transmission method for the trunk network, but the local access remained hopelessly analogue. The advent of ISDN in the 1980s prompted the development of standards for compressing wideband speech with a bandwidth of 7 kHz, sampled at 16 kHz with more bits per sample (e.g. 14) than µ-law and A-law. Compression was needed because this kind of speech would otherwise generate in excess of 200 Kbps. Reduction to 64 Kbps and below (a compression ratio of about four) was possible while preserving high speech quality. The resulting codecs used DSPs (Digital Signal Processors) but never gave rise to a mass market.
Video presented a bigger challenge, considering that its bandwidth is three orders of magnitude greater than that of speech and that it involves more than one signal. Digital television is obtained by sampling the video luminance Y at 13.5 MHz and the two colour-difference signals R-Y and B-Y at 6.75 MHz, with 8 bits/sample. The total bitrate of 216 Mbps could be reduced to about 166 Mbps by removing the samples that fall in the blanking intervals and carry no visible picture. Such high bitrates were unsuitable for any practical transmission medium and were used only for digital tape (the so-called D1 format) and for transmission within the studio.
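The bitrates quoted above follow directly from the sampling parameters. In this sketch `bitrate` is a hypothetical helper name, and the active-picture figure assumes the 625-line, 25 frames/s system, with 720 luminance and 360 samples per colour-difference signal on each of 576 active lines; the 525-line system gives nearly the same result.

```python
def bitrate(sample_rate_hz: float, bits_per_sample: int, signals: int = 1) -> float:
    """Bit rate of a sampled signal: samples/s x bits/sample x number of signals."""
    return sample_rate_hz * bits_per_sample * signals


# Wideband (7 kHz) speech sampled at 16 kHz with 14 bits per sample.
wideband_speech = bitrate(16_000, 14)          # 224 000 bit/s
compression_ratio = wideband_speech / 64_000   # 3.5, i.e. "about four"

# Digital television: Y at 13.5 MHz plus R-Y and B-Y at 6.75 MHz each, 8 bits/sample.
tv_total = bitrate(13.5e6, 8) + bitrate(6.75e6, 8, signals=2)  # 216 Mbit/s

# Active picture only (assumed 625-line system): luminance plus two half-width
# colour-difference signals, 576 active lines, 25 frames per second.
active_samples_per_frame = 720 * 576 + 2 * 360 * 576
tv_active = active_samples_per_frame * 8 * 25  # about 166 Mbit/s

print(wideband_speech, compression_ratio, tv_total, tv_active)
# 224000 3.5 216000000.0 165888000
```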
Reducing this high bitrate directly to 1.5-2 Mbps, so as to fit the American and European multiplexers of 24 and 32 digital speech channels respectively, was (and still largely is) considered too challenging. Therefore, the input bitrate was first reduced 2:1 by subsampling the video signal in the horizontal and vertical (actually temporal, as the video signal is interlaced) directions, and further by subsampling the colour-difference signals. Then two simple techniques, DPCM (Differential Pulse Code Modulation) and conditional replenishment, were applied. A second generation of codecs, using more sophisticated algorithms (the DCT [Discrete Cosine Transform] and motion compensation), provided acceptable quality at 384 Kbps and, with a further 2:1 subsampling of the video signal in the horizontal and vertical directions, at 64/128 Kbps, the bitrates of ISDN.
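Why the DCT helps can be seen on a single row of pixels. MPEG-1 and MPEG-2 apply an 8×8 two-dimensional DCT (a 1-D transform along rows, then along columns); the sketch below shows only the 1-D, 8-point orthonormal DCT-II, evaluated directly rather than with a fast algorithm, on an illustrative smooth ramp of pixel values.

```python
import math


def dct_ii(x: list[float]) -> list[float]:
    """Orthonormal 1-D DCT-II, evaluated directly (O(n^2), no fast algorithm)."""
    n = len(x)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        coeffs.append(scale * total)
    return coeffs


def idct_ii(c: list[float]) -> list[float]:
    """Inverse of dct_ii (a DCT-III with the same orthonormal scaling)."""
    n = len(c)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out


# Eight neighbouring pixels of a smooth ramp.
row = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0, 112.0, 114.0]
coeffs = dct_ii(row)
print([round(c, 2) for c in coeffs])
```

Almost all of the energy lands in the first two coefficients and the rest are small or zero, so a coder can quantize them coarsely or drop them; the inverse transform recovers the row exactly when nothing is discarded.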
Going back to audio, in the early 1980s Philips and Sony developed the Compact Disc, a read-only digital storage medium that employed laser technology (a comparable system was developed at about the same time by RCA, but was short-lived). It was designed with stereo music in mind: two audio channels sampled at 44.1 kHz with 16 bits per sample, for a total bitrate of 1.41 Mbps.
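The CD bitrate is the product of the three parameters just given:

```latex
R_{\mathrm{CD}} = 2\ \text{channels} \times 44\,100\ \tfrac{\text{samples}}{\text{s}} \times 16\ \text{bits}
= 1\,411\,200\ \text{bit/s} \approx 1.41\ \text{Mbit/s}
```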
Lastly, in the US (through the Advanced Television initiative) and in Europe (through the development of an industrial company) steps were taken toward the development of a market for digital high-definition television.
My work experience has been in a telecommunications research establishment. The telecommunications industry used to be characterized by considerable innovation in the network infrastructure, where no investment was spared, and by reluctance to invest in terminal equipment. This was partly because terminals were alien to its culture (even though the more enlightened individuals were aware that without new digital terminals there would not be much need for network innovation), and partly because the terminal was technically and legally outside its competence. The attitude was “Let the manufacturing industry do the job of developing terminals.” Unfortunately, the telecommunications manufacturing industry, accustomed to being pampered with orders from the telcos based on solid CCITT standards, had no desire to invest in something that depended on the whims of end users it did not understand. The consumer electronics industry, which knew end users better and was accustomed to making business decisions based on its own judgment of the validity of products, still considered telecommunications terminals outside its interests. This explains why, at the end of the 1980s, there was virtually no end-user equipment based on compression technologies, with the exception of facsimile. To make cheap and small terminals one would have needed ASICs (Application-Specific Integrated Circuits) capable of performing the sophisticated signal processing functions required by compression algorithms.
I saw the attempts being made by both Philips and RCA in those years to store digital video on CDs for interactive applications (called CD-i and DVI, respectively) as an opportunity to ride on a mass market of video compression chips that could also be used for video communication devices. What was required was to replace the laborious and unpredictable “survival-of-the-fittest” market approach of the consumer electronics world with a regular standardization process.
So MPEG started in January 1988, and a few months later its mandate was extended to audio compression and to the functions needed to multiplex and synchronize the two streams (called “systems”). In four years the first standard, MPEG-1, was developed. Interestingly, neither of the two original target applications, interactive CD and digital audio broadcasting, is currently a large user of the standard (video communication has not become very popular either). On the other hand, MPEG-1 is used by tens of millions of video CDs and MP3 players. One remarkable feature of MPEG-1 is that it was the first audio-visual standard to make full use of computer simulation in its development. By comparison, the laboratory where I worked had taken part in the development of the 1.5-2 Mbps videoconference codec, which occupied three 12U racks, with minimal support from computer simulation. Even more significant for future implications was the fact that MPEG-1, a standard in five parts, has a software implementation that appears as “part 5” of the standard (ISO/IEC 11172-5).
In July 1990, MPEG started its second project, MPEG-2. While MPEG-1 was a very focused standard for well-identified products, MPEG-2 addressed a problem everybody had an interest in: how to convert the 50-year-old analogue television system to a digital compressed form in such a way that the needs of all possible application domains were supported. This was achieved by developing two system layers. One, called the MPEG-2 Transport Stream (TS), was designed for the error-prone environments of the transmission application domains (such as cable, satellite and terrestrial broadcasting). The other, called the MPEG-2 Program Stream (PS), was designed to be software-friendly and was used for DVD. The idea was that MPEG-2 would become the common infrastructure for digital television, a goal that has been successfully achieved, considering that at any given moment more bits are carried by MPEG-2 TS than by IP. The title of the standard, “Generic Coding of Moving Pictures and Associated Audio”, formally conveyed this intention. By the time MPEG-2 was approved (November 1994), the first examples of real-time MPEG-1 decoding on popular programmable machines had been demonstrated. This provided, had one been needed, an incentive to continue the practice of providing reference software for the new standard (ISO/IEC 13818-5).
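Part of what suits the Transport Stream to error-prone channels is its fixed packet structure: 188-byte packets, each opening with a 4-byte header that begins with the sync byte 0x47, so a receiver can find packet boundaries again after errors. The sketch below reads that header as laid out in ISO/IEC 13818-1 (MPEG-2 Systems). It is a minimal illustration: adaptation fields, program tables and reassembly of the packetized elementary streams are not handled, and the class and function names are illustrative.

```python
from dataclasses import dataclass

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


@dataclass
class TSHeader:
    transport_error: bool
    payload_unit_start: bool
    transport_priority: bool
    pid: int                       # 13-bit packet identifier
    scrambling_control: int        # 2 bits
    adaptation_field_control: int  # 2 bits: 1 = payload only, 2 = adaptation only, 3 = both
    continuity_counter: int        # 4 bits, counts payload-carrying packets of one PID


def parse_ts_header(packet: bytes) -> TSHeader:
    """Decode the fixed 4-byte header of one Transport Stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError(f"expected {TS_PACKET_SIZE} bytes, got {len(packet)}")
    if packet[0] != SYNC_BYTE:
        raise ValueError("lost sync: first byte is not 0x47")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return TSHeader(
        transport_error=bool(b1 & 0x80),
        payload_unit_start=bool(b1 & 0x40),
        transport_priority=bool(b1 & 0x20),
        pid=((b1 & 0x1F) << 8) | b2,
        scrambling_control=(b3 >> 6) & 0x03,
        adaptation_field_control=(b3 >> 4) & 0x03,
        continuity_counter=b3 & 0x0F,
    )


# A packet carrying the start of a table on PID 0 (the Program Association Table).
pat_packet = bytes([0x47, 0x40, 0x00, 0x10]) + bytes(184)
print(parse_ts_header(pat_packet))
```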
In July 1993, MPEG started its third project, MPEG-4. Its original goal is reflected in the project's first title, “very low bitrate audio-visual coding”. Even though no specific mass-market applications were in sight, many sensed that the digitization of narrowband analogue channels, such as the telephone access network (the Internet was not yet a mass phenomenon), would provide interesting opportunities to carry video and audio at bitrates well below 1 Mbps, roughly the lowest bitrate supported by MPEG-1 and MPEG-2. It was clear that in that bitrate range, unlike with the earlier MPEG standards, a decoder could very well be implemented on a programmable device, so there might eventually be more software-based than hardware-based implementations of the standard. This is why the reference software, part 5 of MPEG-4 (ISO/IEC 14496-5), has the same normative status as the traditional text-based descriptions of the other parts of MPEG-4.
MPEG-4 became a comprehensive standard, as signaled by its current title, “coding of audio-visual objects”. The standard supports the coded representation of individual audio-visual objects, whose composition in space and time is signaled to the receiver. The objects making up a scene can even be of different kinds: natural and synthetic.
This does not mean, however, that a particular implementation of the standard is necessarily “complex”. An application developer may choose among the many profiles (dedicated subsets of the full set of MPEG-4 tools) and select the one suited to the application. For all these reasons, MPEG-4 is expected to become the infrastructure on top of which the currently disjointed world of multimedia will flourish.
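The profile idea amounts to a subset relation: a profile is a named subset of the toolbox, and an application can use any profile whose tools cover the ones it needs. The sketch below makes that concrete with invented tool and profile names; they are placeholders, not MPEG-4's actual profile definitions.

```python
from typing import Optional

# Hypothetical toolbox and profiles: names and contents are placeholders,
# not the normative tool lists of any real MPEG-4 profile.
PROFILES: dict[str, frozenset[str]] = {
    "basic": frozenset({"intra_coding", "inter_prediction"}),
    "extended": frozenset({"intra_coding", "inter_prediction", "bidirectional_prediction"}),
    "objects": frozenset({"intra_coding", "inter_prediction", "arbitrary_shape", "scene_composition"}),
}


def profiles_covering(required: set[str]) -> list[str]:
    """Profiles whose tool set includes every tool the application needs."""
    return sorted(name for name, tools in PROFILES.items() if required <= tools)


def smallest_profile(required: set[str]) -> Optional[str]:
    """The covering profile with the fewest tools to implement, or None."""
    candidates = profiles_covering(required)
    if not candidates:
        return None
    return min(candidates, key=lambda name: len(PROFILES[name]))


print(smallest_profile({"inter_prediction"}))                            # basic
print(smallest_profile({"arbitrary_shape"}))                              # objects
print(smallest_profile({"arbitrary_shape", "bidirectional_prediction"}))  # None
```

An implementer who needs only the "basic" tools never has to build the others, which is the sense in which a conforming implementation need not be complex.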
Readers may wonder why, if the coding algorithm is implemented in software, there was a need to develop a standard. Shouldn't it suffice to download the code that decodes the particular algorithm used to produce the bitstream of interest?
In the early days of MPEG-4's development this question used to be asked very often, but today, with the ever-expanding use of MP3, the benefits of having a standard are easier to understand. A playback device is not necessarily connected to the network: it may receive content over a broadcast channel, or it may be a stand-alone or portable device; devices may use many different CPUs, for each of which it could be too costly to develop playback code; the hardware may use an ASIC for audio-visual decoding that cannot be upgraded; or it may have been designed with just the amount of RAM that the standard algorithm requires. In other words, it is simpler to have a common standard on which business opportunities can multiply than to struggle with incompatibilities all over the place.
Lastly, it should be kept in mind that compression coding is not a transparent operation. In general, the lower the bitrate used, the more the quality is affected negatively. Transcoding from one algorithm to another may simply produce garbage. Also, the idea that compression technology keeps improving is a myth. Only now, after many years, is MPEG re-issuing a call for proposals for video compression technologies because of the feeling that there may be something worth considering. For audio compression MPEG is still at the level of issuing a call for evidence because the group is not convinced this is an area currently worth pursuing.
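The cost of cascading lossy stages can be illustrated with the simplest possible “codecs”: two uniform quantizers with different step sizes, standing in (hypothetically) for two coding algorithms. Real transcoding involves different transforms, block structures and bit allocations and is far more complex; this sketch only shows that re-encoding an already-coded signal adds a second error on top of the first instead of reproducing the direct result.

```python
def quantize(x: int, step: int) -> int:
    """Round a non-negative integer to the nearest multiple of `step` (ties round up)."""
    return (2 * x + step) // (2 * step) * step


def mse(a: list[int], b: list[int]) -> float:
    """Mean squared error between two equal-length signals."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


signal = list(range(60))  # hypothetical test signal: a ramp 0..59

# Hypothetical "codec A" uses step 10; hypothetical "codec B" uses step 12.
direct = [quantize(x, 12) for x in signal]                # encode once with B
tandem = [quantize(quantize(x, 10), 12) for x in signal]  # encode with A, transcode to B

direct_mse = mse(signal, direct)  # 730 / 60, about 12.2
tandem_mse = mse(signal, tandem)  # 1330 / 60, about 22.2
print(direct_mse, tandem_mse)
```

Even in this toy case the transcoded signal has almost twice the error of a single encoding at the final quality.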
The very size of the standard has transformed the development of the reference software into a huge undertaking. It is therefore interesting to see how such a project was managed. These are the most important features:
The condition was set that any component of the standard, both normative (decoder) and informative (encoder), had to be implemented in software. For any proposal to be accepted and adopted, it was a condition that source code be made available and the copyright released to ISO.
For each portion of the standard, a manager of the code was appointed: representatives of Microsoft and MoMuSys for video (in C++ and C, respectively), Fraunhofer for natural audio, MIT for Structured Audio, ETRI for the Text-to-Speech interface, Optibase for the so-called “Core” (the code portion into which all media decoders and other components plug), Apple for the so-called MPEG-4 File Format, etc.
A manager of experiments was appointed for each portion of the standard to integrate the code of the accepted tools into the existing code base.
Unlike in traditional open-source software projects, only MPEG members could participate in the project. However, discussions were usually held (and the practice still continues) on e-mail reflectors that are open to non-MPEG members.
MPEG is a place where new ideas are continuously forged. One idea was generated by the fact that while the reference code is intended to be “reference” (normative or informative as the case may be), it is not intended to be efficient. Therefore, since December 1999, MPEG has been working on a new part of MPEG-4 that will contain optimized code (e.g., optimized ways to search for motion vectors, a computationally expensive part of the encoder). Any implementer can take this code and use it free of copyright. The condition has been set, however, that such optimized code should not require patents. A second idea, launched in October 2000, led to the decision to develop an MPEG-4 “reference hardware description”. It is expected that this will further promote the use of MPEG-4 as the basic multimedia infrastructure in both software and hardware.
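Motion-vector search is a good example of code where a reference implementation and an optimized one can differ while producing equally valid bitstreams, because the standards specify how motion vectors are decoded, not how an encoder finds them. The sketch below is the exhaustive (full-search) baseline, using the sum of absolute differences (SAD) as the matching cost. The 8×8 block size and ±4 search range are arbitrary choices for brevity (MPEG-1 and MPEG-2 estimate motion on 16×16 macroblocks), and the function names are illustrative. Its cost, (2R+1)² candidate positions times n² pixel differences per block, is what optimized searches try to cut.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, bx, by, n=8, search_range=4):
    """Try every displacement within +/- search_range and return (dx, dy, cost)
    for the lowest SAD; candidates falling outside the reference frame are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Synthetic 32x32 frames: the 8x8 block at (12, 12) in `cur` comes from (14, 13) in `ref`.
rng = random.Random(1)
ref = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
cur = [row[:] for row in ref]
for j in range(8):
    for i in range(8):
        cur[12 + j][12 + i] = ref[13 + j][14 + i]

print(full_search(cur, ref, 12, 12))  # (2, 1, 0)
```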
The text of the so-called “copyright disclaimer” that is found on all MPEG-4 software modules is given below.
This software module was originally developed by <First Name 1> <Last Name 1> (<Company Name 1>) and edited by <First Name 2> <Last Name 2> (<Company Name 2>), <First Name 3> <Last Name 3> (<Company Name 3>), in the course of development of the <MPEG standard>. This software module is an implementation of a part of one or more <MPEG standard> tools as specified by the <MPEG standard>. ISO/IEC gives users of the <MPEG standard> free license to this software module or modifications thereof for use in hardware or software products claiming conformance to the <MPEG standard>. Those intending to use this software module in hardware or software products are advised that its use may infringe existing patents. The original developer of this software module and his/her company, the subsequent editors and their companies, and ISO/IEC have no liability for use of this software module or modifications thereof. Copyright is not released for non-<MPEG standard>-conforming products. <Company Name 1> retains full right to use the code for its own purpose, assign or donate the code to a third party and to inhibit third parties from using the code for non-<MPEG standard>-conforming products. This copyright notice must be included in all copies or derivative works. Copyright (c) 199_.
Currently, MPEG is engaged in the final stages of development of MPEG-7, “Multimedia Content Description Interface”, a standard to describe audio and video information, be it at the level of a complete movie or of a single object in a picture. The standard will be approved in July 2001. For this standard, too, a huge body of reference code has been developed according to rules similar to those of MPEG-4.
In June 2000, MPEG started a new project called MPEG-21, “Multimedia Framework”. In this context, MPEG will develop and integrate, in collaboration with other bodies, all the technologies that are needed for electronic commerce of digital content on the network.
The key technologies that are needed by this project are:
Digital Item Declaration: a uniform and flexible abstraction and interoperable schema for declaring Digital Items.
Content Representation: how the data is represented as different media.
Digital Item Identification and Description: a framework for identification and description of any entity regardless of its nature, type or granularity.
Content Management and Usage: the provision of interfaces and protocols that enable creation, manipulation, search, access, storage, delivery and (re)use of content across the content distribution and consumption value chain.
Intellectual Property Management and Protection: the means to enable content to be persistently and reliably managed and protected across a wide range of networks and devices.
Terminals and Networks: the ability to provide interoperable and transparent access to content across networks and terminal installations.
Event Reporting: the metrics and interfaces that enable users to understand precisely the performance of all reportable events within the framework.
Of particular interest for this article is item five, Intellectual Property Management and Protection. Since MPEG-2 times, MPEG has been mindful of the need to provide solutions for those content and service providers who attach monetary value to content. So far, the solutions provided by MPEG have been at the level of enabling the use of proprietary protection technologies. However, these have the disadvantage that consumption of protected content is no longer transparent to the user, even in the case where users are willing to adhere to the conditions set by the rights holder. This is the reason MPEG is now developing a solution to provide “interoperability at the level of protected content”.
In the 15th century, letters patent were already in use in Venice and Florence but unknown in Mainz. Therefore, the only way for Johannes Gutenberg to protect his invention was to hide its secrets from everybody, including his financial backers, and this eventually led him to ruin. In the 19th century, all audio- and video-related inventions were protected by patents. This continued in the 20th century, although the centre of gravity progressively shifted from individuals to the companies employing them. When the prospects of using digital technologies became clear, all companies and organizations started conducting or funding research in audio and video coding. Today, the number of patents is counted in the thousands.
When MPEG started its work in audio-visual coding, it was immediately evident that either MPEG played by the existing rules of the audio-visual world (that standards usually require patents for their implementation) or it would be impossible to produce any standard of practical value. This was in addition to the difficulty MPEG, with no funds of its own, faced in becoming aware of the patents required to implement its standards.
The problem of patents in standards is, of course, well known to the three main international standards organizations: IEC, ISO and ITU. They have developed the following general policy:
No patent should be required to implement a standard; or
The rights holder should release the rights; or
The rights holder should make a statement undertaking to license his or her patents “on fair and reasonable terms and nondiscriminatory conditions”.
MPEG has, therefore, developed a policy for the development of its standards that deliberately disregards patents and seeks only to achieve optimum performance. As a result, MPEG standards usually require a large number of patents.
As many as 100 different patents are reportedly needed to implement an MPEG-2 decoder. Because of the high interest in a “one-stop shop” for MPEG-2 patents, a private organization licensing most MPEG-2 patents has been set up. Interestingly, the amount to be paid for the patents in an MPEG-2 decoder has remained constant, while the number of relevant patents has increased.
The same is happening with MPEG-4. The MPEG-4 Industry Forum (http://www.m4if.org/) has been established with the goal of kicking off patent pools for MPEG-4 profiles. Of course, the MPEG-4 case is much more complex, since many business models require decoder download. A similar organization for MPEG-7 is likely to be set up soon.
Through a completely different process, MPEG, as representative of the world of audio and video, has come to a conclusion similar to that of the data-processing world regarding the need to provide open solutions, expressed in software (or, as the case may be, hardware), for technologies that are considered part of the “infrastructure”. The main difference is that while the data-processing world likes to define fully open technologies, MPEG bows to the reality of the world of digital audio and video, where patents are found all over the place. Therefore, reference software (and the reference hardware description) is copyright-free but, in general, not patent-free.
MPEG-21, a project to define an ecosystem of content on the network, places standardization of the infrastructure one level higher than what has been done so far. As the provision of reference software, be it normative or informative, is now an integral part of MPEG standards, considerable challenges can be expected when MPEG needs to reconcile libertarian spirits with other, more mundane considerations. But I believe it is better to deal with this problem in a group of technical experts than in a court of law or in a parliament.
The cooperation of all parties is sought.