DVD Transcoding with Linux Metacomputing
DVD is based on subsets of the ISO/IEC 11172 (MPEG-1) and ISO/IEC 13818 (MPEG-2) standards. A DVD movie is divided into three parts; the one that matters here is the video object (VOB) files, each with a maximum size of 1GB, which multiplex the video and audio sources.
Three types of MPEG-2 frames exist: I (Intra), P (Predictive) and B (Bidirectionally-predictive). I frames represent full images, while P and B frames encode differences with respect to previous and/or future frames. In principle, it seems obvious that the video stream should be cut at the beginning of I frames. This is almost right, but not quite: some parameters, such as frame rate and frame size, must be taken into account, and this information is part of the Sequence Header. For this reason, the packets chosen as cut points must contain a Sequence Header. Fortunately, there is a Sequence Header before every I frame.
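The cut-point rule above can be sketched as a scan over start codes: remember the most recent Sequence Header, and accept it as a cut point only if the next picture header is an I frame. This is a minimal sketch, assuming a raw MPEG-2 video elementary stream that has already been demultiplexed from the VOB (the pack/PES layer is not handled); the function names are hypothetical. The start-code values and the position of `picture_coding_type` in the picture header follow ISO/IEC 13818-2.

```python
# Start codes from ISO/IEC 13818-2; each is prefixed by the bytes 00 00 01.
SEQUENCE_HEADER = 0xB3
PICTURE_START = 0x00
I_FRAME = 1  # picture_coding_type: 1 = I, 2 = P, 3 = B

def find_start_codes(data: bytes):
    """Yield (offset, code) for every 00 00 01 xx start code in data."""
    pos = data.find(b"\x00\x00\x01")
    while pos != -1 and pos + 3 < len(data):
        yield pos, data[pos + 3]
        pos = data.find(b"\x00\x00\x01", pos + 4)

def picture_type(data: bytes, offset: int) -> int:
    """Read picture_coding_type from the picture header at offset."""
    # After the 4-byte start code: temporal_reference (10 bits),
    # then picture_coding_type (3 bits), so the type sits in bits 5..3 of byte 5.
    return (data[offset + 5] >> 3) & 0x07

def find_cut_points(data: bytes):
    """Return offsets of Sequence Headers whose next picture is an I frame."""
    cuts = []
    pending = None  # most recent Sequence Header not yet matched to a picture
    for offset, code in find_start_codes(data):
        if code == SEQUENCE_HEADER:
            pending = offset
        elif code == PICTURE_START and pending is not None:
            if offset + 5 < len(data) and picture_type(data, offset) == I_FRAME:
                cuts.append(pending)
            pending = None  # only the first picture after the header matters
    return cuts
```

Because GOP and extension start codes between the Sequence Header and the picture header are simply skipped, the header offset itself is returned, so each chunk begins with the frame rate and size information it needs.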
Another important issue is frame reordering caused by B frames. In the stream, an I frame may be followed by B frames that depend on P frames that came before that I frame. If the video stream is cut at the start of that I frame, transcoding consistency cannot be maintained. The solution is to assign these trailing B frames to the previous chunk. As a consequence, a little extra complexity is added to video preprocessing.
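The reassignment rule can be illustrated on a sequence of frame types in stream order. This is a minimal sketch, assuming the types are given as a string of `I`, `P` and `B` characters; `assign_chunks` is a hypothetical helper. It only decides which chunk encodes each frame and does not model which reference pictures each chunk's decoder must read.

```python
def assign_chunks(frame_types: str):
    """Map each frame (in stream order) to a chunk index, splitting at I frames.

    B frames that immediately follow a cut I frame in the stream are kept
    in the previous chunk, as described in the text.
    """
    chunks = []
    current = 0
    seen_i = False       # no split before the first I frame
    after_cut = False    # True while reading the B frames right after a cut
    for t in frame_types:
        if t == "I":
            if seen_i:
                current += 1     # every later I frame starts a new chunk
                after_cut = True
            seen_i = True
            chunks.append(current)
        elif t == "B" and after_cut:
            chunks.append(current - 1)   # trailing B frame: previous chunk
        else:
            after_cut = False            # first P (or B outside a cut) ends the run
            chunks.append(current)
    return chunks
```

For example, `assign_chunks("IBBPBBIBBPBB")` returns `[0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1]`: the two B frames right after the second I frame stay in chunk 0.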
Of course, there is no benefit in fragmenting the video as finely as possible, because the chunks would become too small. Typically, about 300KB of data separate two consecutive I frames, although this figure depends on several parameters, such as bit rate and image size.
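Since a single GOP of roughly 300KB is too small to be a useful unit of work, consecutive cut points can be merged until each chunk reaches a minimum size. This is a minimal sketch, assuming the cut offsets come from a scan like the one for Sequence Headers; `group_cuts` and the target size are hypothetical, not values from this project.

```python
def group_cuts(cut_offsets, stream_size, target_bytes):
    """Merge consecutive cut points into chunks of at least target_bytes.

    Returns (start, end) byte ranges. Only offsets from cut_offsets are used
    as boundaries, so every chunk still starts at a valid cut point.
    The last chunk takes whatever remains and may be shorter.
    """
    chunks = []
    start = cut_offsets[0]
    for cut in cut_offsets[1:]:
        if cut - start >= target_bytes:
            chunks.append((start, cut))
            start = cut
    chunks.append((start, stream_size))
    return chunks
```

With cut points every 300,000 bytes in a 1,400,000-byte stream and a 500,000-byte target, this yields `[(0, 600000), (600000, 1200000), (1200000, 1400000)]`.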
We considered two basic load-balancing strategies for our project. In the first, called Small-Chunks, the DVD movie is divided into small chunks of a fixed size. Condor assigns a chunk to every available computer; when a computer finishes transcoding one chunk, it requests another one, and this repeats until no chunks are left on the server. In the other strategy, called Master-Worker, the load is balanced according to shares determined by the master processor, and the remaining computers act as workers. This strategy often is used for high-throughput computations. In this project, the chunk size for each computer is set according to a training stage, as explained in the next section.
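The Small-Chunks policy behaves like a shared queue from which each idle machine pulls the next chunk. The following simulation sketches that behavior, assuming each machine transcodes at a constant, known rate in frames per second and ignoring transfer and scheduling overhead; it does not use Condor, and `small_chunks_makespan` is a hypothetical name.

```python
import heapq

def small_chunks_makespan(speeds, chunk_frames, total_frames):
    """Simulate Small-Chunks: each idle machine takes the next fixed-size chunk.

    speeds: frames per second of each machine (assumed constant).
    Returns the time at which the last chunk finishes.
    """
    # Split the movie into fixed-size chunks; the last one may be shorter.
    chunks = [min(chunk_frames, total_frames - s)
              for s in range(0, total_frames, chunk_frames)]
    # Min-heap of (time the machine becomes idle, machine index).
    idle = [(0.0, m) for m in range(len(speeds))]
    heapq.heapify(idle)
    finish = 0.0
    for frames in chunks:
        t, m = heapq.heappop(idle)        # earliest idle machine takes the chunk
        done = t + frames / speeds[m]
        finish = max(finish, done)
        heapq.heappush(idle, (done, m))
    return finish
```

Faster machines return to the queue sooner and therefore process more chunks, which is how Small-Chunks balances load without a training stage. Smaller chunks make the balance finer, at the price of more scheduling overhead than this model includes.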
It should be understood that we deliberately do not consider machine failures or user interference. If such events occurred, the performance of the simple Master-Worker implementation in this project would drop. Nevertheless, our two approaches are illustrative because they are extreme cases: pure Master-Worker on one hand and the fine granularity of Small-Chunks on the other. Fault- and interference-tolerant Master-Worker strategies lie between them. Our aim is to evaluate whether our application behaves similarly at both extremes, in terms of processing time and transcoded file size. As the results described in this article suggest, Small-Chunks may be more advantageous, both because of its simplicity (it needs no training stage) and because it adapts naturally to Condor's handling of machine unavailability.
To provide information to the Master-Worker coordinator, all computers must be evaluated beforehand. This is done in a training stage, which estimates the transcoding rate of each computer in frames per second. The training stage of our prototype transcodes a variety of short video sequences on the target computer set and estimates the average frames per second delivered by each computer. This result then is used to set the sizes of the data chunks, which are proportional to the estimated performance of each computer. Ideally, this approach minimizes DVD transcoding time, because all computers should finish their jobs at the same time.
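The proportional split can be sketched as follows, assuming shares are measured in frames and that the training stage has produced one frames-per-second figure per computer; `proportional_shares` is a hypothetical helper. If machine i receives total_frames · fps_i / Σfps frames, it needs about total_frames / Σfps seconds, the same for every machine, which is why all computers should finish together.

```python
def proportional_shares(fps, total_frames):
    """Split total_frames among machines in proportion to their measured fps.

    Uses largest-remainder rounding so the integer shares sum exactly
    to total_frames.
    """
    total_fps = sum(fps)
    exact = [total_frames * f / total_fps for f in fps]
    shares = [int(x) for x in exact]
    leftover = total_frames - sum(shares)
    # Give the remaining frames to the machines with the largest fractional parts.
    order = sorted(range(len(fps)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares
```

For example, `proportional_shares([20, 10, 10], 1000)` returns `[500, 250, 250]`. In practice each share would then be rounded to the nearest valid cut point, so the finishing times are only approximately equal.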
Our testbed emulated a typical heterogeneous computing environment, including machines near the end of their service lives. It was composed of five computers (see Table 1), classified into three groups according to their processing capabilities: two machines in the first group (gigabyte and kilobyte), a single computer in the second group (nazgul) and the two worst-performing machines (titan and brio) in the third group.
Table 1. Test Bed Computers
|gigabyte||Intel Pentium 4||1,700||256 DDR||528,205||1,388|
|kilobyte||Intel Pentium 4||1,700||256 DDR||624,242||1,355|
|titan||Intel Pentium II||350||320||67,987||398|
|brio||Intel Pentium II||350||192||72,281||398|
In addition, all computers were connected to a 100Mbps Ethernet network, and all of them ran Red Hat Linux 8.0. All computers shared the same user space, defined by an NIS server, and the same filesystem (served over NFS from gigabyte). Finally, we installed Condor 6.4.7, with gigabyte as the central manager. Condor was configured to keep all jobs on their respective processors regardless of user activity. Thus, the timings in this section are best-case results, as mentioned above.
The DVD-to-DivX parallel transcoder was implemented with the following libraries:
libmpeg2 0.3.1: DVD MPEG-2 stream demultiplexing and decoding.
liba52 0.7.5-cvs: DVD AC3 audio decoding.
DivX 5.0.1alpha: MPEG-4 video encoding.
lame 3.93.1: MP3 audio encoding.