DVD Transcoding with Linux Metacomputing

A Condor high-throughput DVD transcoding system for Linux.

As a consequence of the many recent advances in video and audio encoding, the MPEG-2 format now is used for digital video broadcasting (DVB) transmission and DVD storage and is supported by a wide range of hardware devices. MPEG-2 movie files typically range in size from 3 to 6GB, sizes that are suitable for DVDs but not for CD-Rs. Similarly, high-quality MPEG-2 videos are suitable for DVB-S or DVB-T networks, but not for IEEE 802.11b or domestic HomePlug transmission. To solve these kinds of problems, improved encoding techniques have been developed, and as a result, MPEG-4 has been standardized. The MPEG-4 format can reduce movie sizes to 700MB or so while maintaining reasonably good quality.

Because much multimedia content is available as DVD MPEG-2 files, it is necessary to transcode them to obtain the MPEG-4 equivalents. In this article, we propose a Linux framework based on the Condor metacomputing platform to achieve high-throughput DVD transcoding. Although some LAN parallel transcoding tools for fixed sets of machines exist, we are not aware of any metacomputing system for parallel transcoding. Metacomputing refers to architectures that hide physical resources and instead offer a simplified virtual machine view. For example, the Condor tool we use “steals” cycles of available machines when neither users nor high-priority processes are using them.

Background

The DVD movie market has boomed thanks to the availability of cheap DVD players, the robustness of DVD as a storage medium compared to VHS cassettes and so on. The DVD recording media market, however, is incipient. Because CD-R technology has been around for a while and CD-R disks are much cheaper than DVD disks, domestic users have found ways to store DVD movies on CDs with similar subjective quality. This kind of storage is possible thanks to the latest generation of video and audio codecs, which are based on the MPEG-4 standard and offer high compression ratios. Transcoding a DVD to make its contents fit on a CD, however, still is computationally expensive for many desktop PCs.

Parallelization is a promising solution to accelerate DVD transcoding. The most obvious approach is manual parallelization: dividing the input files into chunks by hand, transcoding the chunks on different machines and joining the results into a single file. Manual parallelization may be adequate for users who wish to keep track of the whole process. However, it may be advantageous to use metacomputing to implement high-throughput, submit-and-forget DVD transcoding.
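The chunking step described above can be sketched as follows. This is a minimal illustration, not the method used in the article: the function name plan_chunks and the idea of splitting by frame index are assumptions made for the example. A real splitter also would have to cut at points the decoder can start from, which this sketch ignores.

```python
from typing import List, Tuple


def plan_chunks(total_frames: int, n_chunks: int) -> List[Tuple[int, int]]:
    """Split the frame range [0, total_frames) into contiguous half-open ranges.

    Hypothetical helper for illustration only. Each (start, end) pair could be
    handed to a different machine; joining the outputs in list order restores
    the original sequence.
    """
    if total_frames <= 0 or n_chunks <= 0:
        raise ValueError("total_frames and n_chunks must be positive")
    # Never create more chunks than there are frames.
    n_chunks = min(n_chunks, total_frames)
    base, extra = divmod(total_frames, n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional frame each,
        # so chunk sizes differ by at most one.
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    for start, end in plan_chunks(10, 3):
        print(f"frames {start}..{end - 1}")
```

Because the ranges are contiguous and ordered, the joining step reduces to concatenating the transcoded chunks in the order in which they were planned.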

Parallelizing a process requires breaking it into elementary tasks, scheduling those tasks and collecting their results. Consequently, a resource management tool is necessary. Tools such as Condor and Globus provide basic metacomputing and parallelization software. In our case, we have chosen Condor because it does not add extra complexity, it is easy to install and configure and it works properly on Linux. Finally, Condor does not require a dedicated cluster.

Condor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job queuing mechanism, a scheduling policy, a priority scheme, resource monitoring and resource management. Users submit their serial or parallel jobs to Condor, and Condor places them into a queue, chooses when and where to run the jobs based on a policy, monitors their progress and ultimately informs the user of a job's completion.
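To make the submission step concrete, the sketch below builds the text of a Condor submit description file that queues one job per chunk. It is an illustration under stated assumptions, not the configuration used in the article: the executable name transcode_chunk.sh, the output file names and the choice of the vanilla universe are all hypothetical. The submit commands themselves (universe, executable, arguments, output, error, log, queue) and the $(Process) macro are standard Condor syntax.

```python
def make_submit_description(executable: str, n_chunks: int) -> str:
    """Return the text of a Condor submit description for n_chunks jobs.

    Hypothetical helper for illustration. Each queued job receives its own
    index through the $(Process) macro, which Condor expands to
    0, 1, ..., n_chunks - 1.
    """
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    lines = [
        "universe = vanilla",              # assumption: plain jobs, no checkpointing
        f"executable = {executable}",      # hypothetical per-chunk transcoding script
        "arguments = $(Process)",          # chunk index passed to the script
        "output = chunk_$(Process).out",   # per-job standard output
        "error = chunk_$(Process).err",    # per-job standard error
        "log = transcode.log",             # shared Condor event log
        f"queue {n_chunks}",               # one job per chunk
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_submit_description("transcode_chunk.sh", 4), end="")
```

The resulting text would be saved to a file and passed to condor_submit, after which Condor takes care of queuing, placement and monitoring as described above.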

While providing functionality similar to that of any traditional batch queuing system, Condor's architecture allows it to succeed in areas where traditional scheduling systems fail. Unique mechanisms enable Condor to harness wasted CPU power from otherwise idle desktop computers. For instance, Condor can be configured to use desktop machines only when the keyboard and mouse are idle. Should Condor detect that a machine is no longer available (say, a key press is detected), it is able to produce a transparent checkpoint and migrate a job to a different machine that would otherwise be idle. Condor also is able to redirect transparently all the job's I/O requests back to the submitting machine. As a result, Condor can be used to combine seamlessly all the computational power in a community.

The apparent lack of commercial metacomputing transcoding systems may be explained by the fact that metacomputing mostly has been linked with the UNIX scientific community. Entertainment software designers, on the other hand, still give maximum priority to the metacomputing-unfriendly Microsoft Windows world. For example, the most recent version of the DivX codec (v.5.0.5 at the time this article was written) is a key tool for Linux transcoding development, but it did not work properly on Pentium 4 Linux boxes. The previous release was v.5.0.1alpha, an unstable version that had been released the previous year. This example gives an idea of the problems one may encounter when trying to port entertainment applications to metacomputing-friendly Linux platforms.

Although diverse transcoding applications are available, we outline the three that we found most interesting:

  • FlaskMpeg: one of the first transcoding applications to appear. Currently, it is one of the most popular in the Windows world. It does not support parallelization.

  • Mencoder: one of the top Linux applications for DVD transcoding. Its efficiency (output-to-input size ratio) is, in general, slightly worse than FlaskMpeg's. Like FlaskMpeg, it does not support parallelization.

  • Dvd::rip: a high-level Linux transcoder based on another program, Transcode. Its results are comparable to those of Mencoder. Dvd::rip does support parallelization, but it is difficult to configure: every computer involved in the transcoding process must be set up manually. This configuration is static and does not react to environmental changes (a major difference from a Condor-oriented system like ours). In addition, Dvd::rip's parallel mode does not admit audio streams. The audio stream must be processed sequentially because of technical problems that Dvd::rip points out but does not solve; see Dvd::rip's Web page. This is a minor problem, though, because total transcoding time is dominated by video transcoding, whether the audio is transcoded in parallel or sequentially.

