DVD Transcoding with Linux Metacomputing
DVD is based on a subset of the ISO/IEC 11172 (MPEG-1) and ISO/IEC 13818 (MPEG-2) standards. A DVD movie is divided into video object (VOB) files with a maximum size of 1GB each, which multiplex the video and audio sources.
Three types of MPEG-2 frames exist: I (Intra), P (Predictive) and B (Bidirectionally-predictive). I frames represent full images, while P and B frames encode differences with respect to previous and/or future frames. Because only I frames can be decoded on their own, it seems obvious that video stream cuts must be located at the beginning of I frames. This is almost right, but not quite. The decoder also needs some parameters, such as frame rate and frame size, which are part of the Sequence Header. For this reason, packets chosen as cut points must contain a Sequence Header. Fortunately, there is a Sequence Header before every I frame.
Another important issue is frame reordering due to the existence of P and B frames. After an I frame, B frames may follow that depend on P frames that came prior to the I frame. If the video stream is partitioned at the start of that I frame, it is not possible to maintain video transcoding consistency. The solution consists of assigning the late B frames to the previous chunk. As a consequence, a little extra complexity is added to video preprocessing.
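The reassignment of late B frames can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's preprocessing code: frames are given as a list of type letters in stream order, a cut is made at every I frame, and any B frames that immediately follow an I frame are treated as the late ones. The function name `split_at_i_frames` is hypothetical, and the sketch does not address how the previous chunk gets access to the I frame itself.

```python
def split_at_i_frames(frame_types):
    """Split a stream into chunks at I frames.

    frame_types: list of "I", "P" or "B" in stream order (an assumption).
    Returns a list of chunks, each a list of frame indices.
    B frames that directly follow an I frame go to the previous chunk.
    """
    chunks = []
    current = []
    i = 0
    n = len(frame_types)
    while i < n:
        if frame_types[i] == "I" and current:
            # Collect the run of B frames right after this I frame.
            j = i + 1
            late_b = []
            while j < n and frame_types[j] == "B":
                late_b.append(j)
                j += 1
            # Late B frames close the previous chunk.
            current.extend(late_b)
            chunks.append(current)
            # The new chunk starts at the I frame itself.
            current = [i]
            i = j
        else:
            current.append(i)
            i += 1
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    stream = list("IPBBPBBIBBPBBIBBP")
    for chunk in split_at_i_frames(stream):
        print(chunk, "".join(stream[k] for k in chunk))
```

In practice a chunk would span many I frames rather than one; the sketch cuts at every I frame only to keep the boundary handling visible.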
Cutting the video at every possible point is not worthwhile, because the resulting chunks would be too small. Typically, about 300KB separate two consecutive I frames, although this length depends on several parameters, such as bit rate or image size.
We considered two basic load balancing strategies for our project. In the first, called Small-Chunks, the DVD movie is divided into small chunks of a fixed size. Condor assigns a chunk to every available computer. When a computer finishes transcoding one chunk, it requests another one. This process is repeated until there are no more chunks left on the server. In the other strategy, called Master-Worker, the master processor decides the share of the movie that each computer receives, and the remaining computers act as workers. This strategy often is used for high-throughput computations. For this project, the chunk size for each particular computer is assigned according to a training stage, as explained in the next section.
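To make the Small-Chunks behavior concrete, the following sketch simulates workers that each pull the next fixed-size chunk as soon as they become free. It is only a model: the function name, the rates and the chunk sizes are hypothetical, and it ignores file transfer time and Condor's own scheduling overhead.

```python
import heapq


def simulate_small_chunks(rates_fps, n_chunks, frames_per_chunk):
    """Simulate pull-based scheduling of fixed-size chunks.

    rates_fps: transcoding rate of each worker in frames per second.
    Returns (makespan_seconds, chunks_done_per_worker).
    """
    # Min-heap of (time at which the worker is free, worker index).
    free_at = [(0.0, w) for w in range(len(rates_fps))]
    heapq.heapify(free_at)
    done = [0] * len(rates_fps)
    makespan = 0.0
    for _ in range(n_chunks):
        # The worker that becomes free first takes the next chunk.
        start, w = heapq.heappop(free_at)
        finish = start + frames_per_chunk / rates_fps[w]
        done[w] += 1
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, w))
    return makespan, done


if __name__ == "__main__":
    # Hypothetical rates: one worker twice as fast as the other.
    print(simulate_small_chunks([20.0, 10.0], n_chunks=6, frames_per_chunk=100))
```

With these made-up numbers the faster worker ends up processing more chunks without any prior knowledge of its speed, which is the property that lets Small-Chunks do without a training stage.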
We deliberately do not consider the possibility of machine failures or user interference. If those events took place, the performance of the simple Master-Worker implementation in this project would drop. Nevertheless, our two approaches are illustrative because they are extreme cases: pure Master-Worker on one hand and the fine granularity of Small-Chunks on the other. Fault- and interference-tolerant Master-Worker strategies lie in the middle. Our aim is to evaluate whether our application behaves similarly at the two extremes, in terms of processing time and transcoded file size. As the results described in this article suggest, Small-Chunks may be more advantageous because of its simplicity (it does not need a training stage) and because it adapts naturally to Condor's management of machine unavailability.
To provide information to the Master-Worker coordinator, it is necessary to evaluate all computers beforehand. Evaluation is performed in a training stage, which estimates the transcoding rate of each computer in frames per second. The training stage of our prototype consists of transcoding a variety of small video sequences in the target computer set and estimating the average frames per second delivered by each computer. This result then is used to set the sizes of the data chunks, which are proportional to the estimated performance of each computer. Ideally, this approach minimizes DVD transcoding time, because all computers should finish their jobs at the same time.
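The proportional sizing described above can be sketched in a few lines. This is an illustration under assumptions rather than the prototype's code: the function name `proportional_shares`, the worker names and the rates are hypothetical, and shares are counted in frames, whereas a real split also would have to respect the valid cut points in the stream.

```python
def proportional_shares(total_frames, rates_fps):
    """Split total_frames among workers in proportion to their rates.

    rates_fps: dict mapping worker name to estimated frames per second
    (as would come from a training stage).
    Returns a dict mapping worker name to its number of frames.
    """
    total_rate = sum(rates_fps.values())
    shares = {
        name: int(total_frames * rate / total_rate)
        for name, rate in rates_fps.items()
    }
    # Rounding down may leave a few frames over; give them to the
    # fastest worker so that every frame is assigned.
    remainder = total_frames - sum(shares.values())
    fastest = max(rates_fps, key=rates_fps.get)
    shares[fastest] += remainder
    return shares


if __name__ == "__main__":
    # Hypothetical training results, in frames per second.
    rates = {"fast": 30.0, "mid": 15.0, "slow": 5.0}
    shares = proportional_shares(1000, rates)
    for name, frames in shares.items():
        # Estimated time is roughly equal for every worker.
        print(name, frames, frames / rates[name])
```

If the estimated rates are accurate, each worker's share divided by its rate is about the same, which is why all computers would finish at nearly the same time.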
Our testbed emulated a typical heterogeneous computing environment, including machines at the end of their useful lives. It was composed of five computers (see Table 1), classified into three groups according to their processing capabilities. Two machines were in the first group (gigabyte and kilobyte), a single computer was in the second group (nazgul) and the two machines with the worst performance (titan and brio) were in the third group.
Table 1. Test Bed Computers
| gigabyte | Intel Pentium 4 | 1,700 | 256 DDR | 528,205 | 1,388 |
| kilobyte | Intel Pentium 4 | 1,700 | 256 DDR | 624,242 | 1,355 |
| titan | Intel Pentium II | 350 | 320 | 67,987 | 398 |
| brio | Intel Pentium II | 350 | 192 | 72,281 | 398 |
In addition, all computers were linked to a 100Mbps Ethernet network, and all of them ran Red Hat Linux 8.0. All computers shared the same user space, defined by an NIS server, and the same filesystem (NFS server on gigabyte). Finally, we installed Condor v. 6.4.7, with gigabyte as the central manager. Condor was configured to keep all jobs on their respective processors regardless of user activity. Thus, the timings in this section are best-case results, as mentioned above.
The DVD-to-DivX parallel transcoder was implemented with the following libraries:
libmpeg2 0.3.1: DVD MPEG-2 stream demultiplexing and decoding.
liba52 0.7.5-cvs: DVD AC3 audio decoding.
DivX 5.0.1alpha: MPEG-4 video encoding.
lame 3.93.1: MP3 audio encoding.