DVD Transcoding with Linux Metacomputing
As a consequence of the many recent advances in video and audio encoding, the MPEG-2 format now is used for digital video broadcasting (DVB) transmission and DVD storage and is supported by a wide range of hardware devices. MPEG-2 movie files typically range in size from 3–6GB, sizes that are suitable for DVDs but not for CD-Rs. Similarly, high-quality MPEG-2 videos are suitable for DVB-S or DVB-T networks, but not for IEEE 802.11b or domestic HomePlug transmission. To solve these kinds of problems, improved encoding techniques have been developed, and as a result, MPEG-4 has been standardized. The MPEG-4 format can reduce movie sizes down to 700MB or so and maintain reasonably good quality.
Because much multimedia content is available as DVD MPEG-2 files, it is necessary to transcode them to obtain the MPEG-4 equivalents. In this article, we propose a Linux framework based on the Condor metacomputing platform to achieve high-throughput DVD transcoding. Although some LAN parallel transcoding tools for fixed sets of machines exist, we are not aware of any metacomputing system for parallel transcoding. Metacomputing refers to architectures that hide physical resources and instead offer a simplified virtual machine view. For example, the Condor tool we use “steals” cycles of available machines when neither users nor high-priority processes are using them.
The DVD movie market has boomed thanks to the availability of cheap DVD players, the robustness of DVD as a storage media as compared to VHS cassettes and so on. The DVD recording media market, however, is incipient. Because CD-R technology has been around for a while and CD-R disks are much cheaper than DVD disks, domestic users have found ways to store DVD movies on CDs with similar subjective qualities. This kind of storage is possible due to the last generation of video and audio codecs. They are based on the MPEG-4 standard and offer high compression ratios. Transcoding a DVD to make its contents fit in a CD, however, still is expensive computationally for many desktop PCs.
Parallelization is a promising solution to accelerate DVD transcoding. The most obvious approach is manual parallelization, dividing input files in chunks manually, transcoding the chunks in different machines and joining the result in a single file. Manual parallelization may be adequate for users who wish to keep track of the whole process. However, it may be advantageous to use metacomputing to implement high-throughput, submit-and-forget DVD transcoding.
Parallelizing a process requires breaking it into elementary tasks, scheduling those tasks and collecting their results. Consequently, a resource management tool is necessary. Tools such as Condor and Globus provide basic metacomputing and parallelization software. In our case, we have chosen Condor because it does not add extra complexity, it is easy to install and configure and it works properly on Linux. Finally, Condor does not require a dedicated cluster.
Condor is a specialized workload management system for computation-intense jobs. Like other full-featured batch systems, Condor provides a job queuing mechanism, a scheduling policy, a priority scheme, resource monitoring and resource management. Users submit their serial or parallel jobs to Condor, and Condor places them into a queue, chooses when and where to run the jobs based on a policy, monitors their progress and ultimately informs the user of a job's completion.
While providing functionality similar to that of any traditional batch queuing system, Condor's architecture allows it to succeed in areas where traditional scheduling systems fail. Unique mechanisms enable Condor to harness wasted CPU power from otherwise idle desktop computers. For instance, Condor can be configured to use desktop machines only when the keyboard and mouse are idle. Should Condor detect that a machine is no longer available (say, a key press is detected), it is able to produce a transparent checkpoint and migrate a job to a different machine that would otherwise be idle. Condor also is able to redirect transparently all the job's I/O requests back to the submitting machine. As a result, Condor can be used to combine seamlessly all the computational power in a community.
The apparent lack of commercial metacomputing transcoding systems may exist because metacomputing mostly has been linked with the UNIX scientific community. On the other hand, entertainment software designers still give maximum priority to the metacomputing-unfriendly Microsoft Windows world. For example, the most recent version of the DivX codec—v.5.0.5 at the time this article was written—is a key tool for Linux transcoding development, but it did not work properly on Pentium 4 Linux boxes. The previous release was v.5.0.1alpha, an unstable version that had been released the previous year. This example provides an idea of the problems one may encounter when trying to port entertainment applications to metacomputing-friendly Linux platforms.
Although diverse transcoding applications are available, we outline the three that we found most interesting:
FlaskMpeg: one of the first transcoding applications to appear. Currently, it is one of the most popular in the Windows world. It does not support parallelization.
Mencoder: one of the top Linux applications for DVD transcoding. Its efficiency (output-to-input size ratio) in general, is slightly worse than FlaskMpeg's. As in the previous case, it does not support parallelization.
Dvd::rip: a high-level Linux transcoder based on another program, Transcode. Its results are comparable to those of Mencoder. Dvd::rip does support parallelization, but it is difficult to configure. Parallelization requires manual configuration of all computers involved in the transcoding process. This configuration is static, and it does not react to environmental changes (a major difference for a Condor-oriented system like ours). Dvd::rip does not admit audio streams. The audio stream must be processed sequentially due to technical problems Dvd::rip points out but does not solve; see Dvd::rip's Web page. This is a minor problem, though, because transcoding time is dominated by video transcoding, regardless of whether the audio transcoding strategy is employed in parallel or sequentially.
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can find a particular file or files based on all kinds of criteria. It's pretty easy to string these tools together to build even more powerful tools, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality allows UNIX system administrators to seem to always have the right tool for the job.
Cron traditionally has been considered another such a tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, and briefly describes how to know when it might be time to consider upgrading your job scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers.Register Now!
- Tech Tip: Really Simple HTTP Server with Python
- SUSE LLC's SUSE Manager
- Murat Yener and Onur Dundar's Expert Android Studio (Wrox)
- My +1 Sword of Productivity
- Managing Linux Using Puppet
- Non-Linux FOSS: Caffeine!
- Returning Values from Bash Functions
- Doing for User Space What We Did for Kernel Space
- Rogue Wave Software's Zend Server
- Parsing an RSS News Feed with a Bash Script
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide