DVD Transcoding with Linux Metacomputing
As a consequence of the many recent advances in video and audio encoding, the MPEG-2 format now is used for digital video broadcasting (DVB) transmission and DVD storage and is supported by a wide range of hardware devices. MPEG-2 movie files typically range in size from 3–6GB, sizes that are suitable for DVDs but not for CD-Rs. Similarly, high-quality MPEG-2 videos are suitable for DVB-S or DVB-T networks, but not for IEEE 802.11b or domestic HomePlug transmission. To solve these kinds of problems, improved encoding techniques have been developed, and as a result, MPEG-4 has been standardized. The MPEG-4 format can reduce movie sizes down to 700MB or so and maintain reasonably good quality.
Because much multimedia content is available as DVD MPEG-2 files, it is necessary to transcode them to obtain the MPEG-4 equivalents. In this article, we propose a Linux framework based on the Condor metacomputing platform to achieve high-throughput DVD transcoding. Although some LAN parallel transcoding tools for fixed sets of machines exist, we are not aware of any metacomputing system for parallel transcoding. Metacomputing refers to architectures that hide physical resources and instead offer a simplified virtual machine view. For example, the Condor tool we use “steals” cycles of available machines when neither users nor high-priority processes are using them.
The DVD movie market has boomed thanks to the availability of cheap DVD players and the robustness of DVD as a storage medium compared to VHS cassettes, among other factors. The DVD recording media market, however, is incipient. Because CD-R technology has been around for a while and CD-R disks are much cheaper than DVD disks, domestic users have found ways to store DVD movies on CDs with similar subjective quality. This kind of storage is possible due to the latest generation of video and audio codecs, which are based on the MPEG-4 standard and offer high compression ratios. Transcoding a DVD to make its contents fit on a CD, however, still is computationally expensive for many desktop PCs.
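To make the "fit on a CD" constraint concrete, the following minimal sketch estimates the average video bit rate that a target file size leaves available. The function name, the 128kbit/s audio rate and the two-hour running time are illustrative assumptions, not figures from this article, and container overhead is ignored.

```python
def video_bitrate_kbps(target_mb, duration_s, audio_kbps):
    """Average video bit rate (kbit/s) that fits a target file size.

    target_mb  -- target size in megabytes (taken here as 2**20 bytes)
    duration_s -- movie length in seconds
    audio_kbps -- bit rate reserved for the audio stream (assumed value)
    Container overhead is ignored, so real budgets are slightly lower.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    total_bits = target_mb * 1024 * 1024 * 8
    total_kbps = total_bits / duration_s / 1000
    return total_kbps - audio_kbps


# Hypothetical example: a two-hour movie on a 700MB CD with 128kbit/s audio.
budget = video_bitrate_kbps(700, 2 * 60 * 60, 128)
```

A budget of this order is far below typical DVD MPEG-2 bit rates, which is why a more efficient codec is needed rather than simple re-muxing.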
Parallelization is a promising solution to accelerate DVD transcoding. The most obvious approach is manual parallelization: dividing the input files into chunks by hand, transcoding the chunks on different machines and joining the results into a single file. Manual parallelization may be adequate for users who wish to keep track of the whole process. However, it may be advantageous to use metacomputing to implement high-throughput, submit-and-forget DVD transcoding.
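The chunking step described above can be sketched as follows. This is only an illustration under simplifying assumptions: the function name is hypothetical, the input is treated as a plain frame count, and no attempt is made to align chunk boundaries with the structure of the MPEG-2 stream, which a real splitter would have to respect.

```python
def split_frames(total_frames, n_chunks):
    """Split frames [0, total_frames) into n_chunks contiguous ranges.

    Returns a list of half-open (start, end) pairs whose sizes differ
    by at most one frame, so every machine gets a similar workload.
    """
    if total_frames < 0 or n_chunks <= 0:
        raise ValueError("need total_frames >= 0 and n_chunks > 0")
    base, extra = divmod(total_frames, n_chunks)
    ranges = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional frame each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Example: 10 frames over 3 machines.
chunks = split_frames(10, 3)
```

Each range would then be handed to one transcoding job, and the outputs concatenated in range order.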
Parallelizing a process requires breaking it into elementary tasks, scheduling those tasks and collecting their results. Consequently, a resource management tool is necessary. Tools such as Condor and Globus provide basic metacomputing and parallelization software. In our case, we have chosen Condor because it does not add extra complexity, it is easy to install and configure and it works properly on Linux. Finally, Condor does not require a dedicated cluster.
Condor is a specialized workload management system for computation-intense jobs. Like other full-featured batch systems, Condor provides a job queuing mechanism, a scheduling policy, a priority scheme, resource monitoring and resource management. Users submit their serial or parallel jobs to Condor, and Condor places them into a queue, chooses when and where to run the jobs based on a policy, monitors their progress and ultimately informs the user of a job's completion.
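As a rough illustration of what submitting jobs looks like, the sketch below builds the text of a Condor submit description file that queues one job per chunk. The wrapper script name `transcode_chunk.sh`, the output file names and the choice of the vanilla universe are assumptions made for this example; they are not the configuration used by the framework described in this article.

```python
def make_submit_description(executable, n_chunks, universe="vanilla"):
    """Return the text of a Condor submit description for n_chunks jobs.

    Each queued job receives its index through the $(Process) macro,
    which Condor expands to 0, 1, ..., n_chunks - 1.
    """
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    lines = [
        f"universe = {universe}",
        f"executable = {executable}",          # hypothetical wrapper script
        "arguments = $(Process)",              # chunk index for this job
        "output = chunk_$(Process).out",       # per-job stdout
        "error = chunk_$(Process).err",        # per-job stderr
        "log = transcode.log",                 # shared job event log
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        f"queue {n_chunks}",
    ]
    return "\n".join(lines) + "\n"


# Example: describe four chunk jobs.
submit_text = make_submit_description("transcode_chunk.sh", 4)
```

The resulting text would be saved to a file and passed to `condor_submit`, after which Condor takes care of queuing, placement and completion notification.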
While providing functionality similar to that of any traditional batch queuing system, Condor's architecture allows it to succeed in areas where traditional scheduling systems fail. Unique mechanisms enable Condor to harness wasted CPU power from otherwise idle desktop computers. For instance, Condor can be configured to use desktop machines only when the keyboard and mouse are idle. Should Condor detect that a machine is no longer available (say, a key press is detected), it is able to produce a transparent checkpoint and migrate a job to a different machine that would otherwise be idle. Condor also is able to redirect transparently all the job's I/O requests back to the submitting machine. As a result, Condor can be used to combine seamlessly all the computational power in a community.
The apparent lack of commercial metacomputing transcoding systems may be explained by the fact that metacomputing has been linked mostly with the UNIX scientific community, whereas entertainment software designers still give maximum priority to the metacomputing-unfriendly Microsoft Windows world. For example, the most recent version of the DivX codec (v.5.0.5 at the time this article was written) is a key tool for Linux transcoding development, but it does not work properly on Pentium 4 Linux boxes. The previous release was v.5.0.1alpha, an unstable version that had been released the previous year. This example gives an idea of the problems one may encounter when trying to port entertainment applications to metacomputing-friendly Linux platforms.
Although diverse transcoding applications are available, we outline the three that we found most interesting:
FlaskMpeg: one of the first transcoding applications to appear. Currently, it is one of the most popular in the Windows world. It does not support parallelization.
Mencoder: one of the top Linux applications for DVD transcoding. In general, its efficiency (output-to-input size ratio) is slightly worse than FlaskMpeg's. Like FlaskMpeg, it does not support parallelization.
Dvd::rip: a high-level Linux transcoder based on another program, Transcode. Its results are comparable to those of Mencoder. Dvd::rip does support parallelization, but it is difficult to configure: all computers involved in the transcoding process must be configured manually. This configuration is static and does not react to environmental changes (a major difference from a Condor-oriented system like ours). In addition, Dvd::rip does not parallelize audio streams. The audio stream must be processed sequentially due to technical problems that Dvd::rip points out but does not solve; see Dvd::rip's Web page. This is a minor problem, though, because transcoding time is dominated by video transcoding, regardless of whether the audio is transcoded in parallel or sequentially.