Streaming Media | Linux Journal

Audio/Video

by Frank LaMonica

on January 1, 2001

Modern technology is not merely growing fast; its growth is explosive. There are so many areas of specialized technology that it has become impossible for one person to fully comprehend the intricacies of the subsystems that combine to produce the latest buzzword. Just as we now use acronyms inside acronyms to succinctly describe the latest innovation, we encapsulate hundreds of human years of work into a single abstract expression, and then piece the abstractions together to create larger, more complex systems, which themselves soon become just another piece of a larger puzzle.

A logical question to ask is “does anyone need to understand systems at that level of detail?” Can't we just piece together all the black boxes knowing only their input/output (I/O) specifications and achieve a working solution? The answer is clearly no. We don't need to understand the most minute detail of every technology we use, but if we are going to make the best use of our technology, we must look deeper than just the visible specifications of each subsystem.

Multimedia content delivery is a good example of a task that requires this type of detailed understanding. Multimedia is simply a combination of two or more types of communications media, such as audio, video, graphics, text or any other element that can stimulate the human perceptual senses. This article looks at streaming media—the process of using computer-based technology for time-dependent delivery of multimedia data as opposed to time-independent data delivery—to show how a fairly deep understanding of many complex technologies must be achieved in order to make these subsystems function together to achieve the desired result. The steps outlined in the streaming process can be presented linearly and, as such, might appear to be independent black boxes. However, the ideal implementation of each step requires knowledge and a consideration of all of the steps in the process.

Overview of a Streaming Media System

The streaming media process is simply a way to communicate data. This communication can originate from many sources, and it can be targeted to many destinations. The target of these communications is called the receiver. Each receiver can also be the source of data for other “downstream” receivers. The four permutations of source/destination streaming media flow are: One to One, One to Many, Many to One and Many to Many. Each of these four combinations requires a set of common streaming media services, colored by the specific requirements of the source and destination mix. Protocols have been developed by the Internet Engineering Task Force (IETF) to enable all aspects of streaming media, and the industry is only beginning to fully implement these designs.

On the hardware side, a system should be designed with growth and scalability in mind. A relatively small up-front investment into higher performing components could provide large savings later on. It is also very easy to waste money upgrading components that do not affect the areas in the systems where performance bottlenecks actually exist. We will follow the logical path from data creation to transmission to the receiver and discuss the details as we go.

Data Creation

There is an entire industry devoted to the task of creating multimedia data. Historically, expensive, highly proprietary workstations created most high-end, “Hollywood grade” multimedia. The PC revolution produced lower-cost alternatives for many of the tasks that previously required expensive solutions. These solutions appeared first on Macintosh computers and later migrated to Windows-based systems, but they were still proprietary systems that locked the user into a dependency on the vendor.

There has recently been a move to Linux. Besides the “free beer” aspect of Linux, end users see a real advantage with Linux because they can free themselves from a dependency on any single vendor. Time-critical content creation is the hallmark of Hollywood production, so companies cannot depend on even the most propitious vendor. In order to protect their own businesses, they must be in a position to do the whole job themselves.

The Visual Effects Society is an industry group with about 24 member companies, each of which bases its business on performing the various tasks required to create multimedia data. That society has announced a desire to move entirely to Linux within the next few years, but there is much work that needs to be done on the Linux OS in order to make that possible. Fortunately, the migration of streaming technology to the Linux OS is well underway.

If you want to provide streaming media services, the entire process depends on the data that you intend to stream. Will you stream live video? DVD or CD-ROM content? Will the data include computer-generated graphics images or some composition of several media types? How will the data be mixed and manipulated in order to produce the final data that you intend to stream to your clients? These questions must first be answered at a strategic level—what is your planned business? Then the answers must be researched based on the equipment, skill sets, time and financial resources available. Those parameters must be considered in terms not only of the data creation itself, but also in terms of the remaining services that you must provide for the entire solution. If, for example, your business is to stream recorded content to paying clients, you had better understand the requirements of your target audience. Will they pay per view, or will they expect “free” content? What is the mix of client technology that you have to support with your streams? These, and many other questions, which we touch on as we explore each step in the process, affect your decision on the software and hardware you need in your system.

Data Encoding

What is data encoding, and why is it necessary? The most important step in encoding for computer delivery is putting the data into digital form. The real world is based on continuous, or analog, data. Most computers in use today are based on digital technology. Everything in the system must be represented by some combination of 1s and 0s because all any computer can do is turn individual circuits on (1) or off (0).
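As a rough illustration of what putting data into digital form involves, the following Python sketch samples a sine wave at fixed intervals and maps each sample to an 8-bit code. The tone frequency, sample rate and bit depth are assumptions chosen for illustration; they are not taken from any particular encoder.

```python
import math

def quantize_sine(freq_hz=440.0, sample_rate=8000, n_samples=8, bits=8):
    """Sample a sine wave and map each sample to an unsigned integer code."""
    levels = 2 ** bits                       # number of distinct digital values
    codes = []
    for n in range(n_samples):
        t = n / sample_rate                  # time of the n-th sample, in seconds
        analog = math.sin(2 * math.pi * freq_hz * t)   # continuous value in [-1, 1]
        # Scale [-1, 1] onto [0, levels - 1] and round to the nearest code.
        codes.append(round((analog + 1) / 2 * (levels - 1)))
    return codes

samples = quantize_sine()
# Show each sample as the string of 1s and 0s a computer would store.
print([format(code, "08b") for code in samples])
```

Raising the sample rate or the bit depth tracks the analog signal more closely, at the cost of more data to store and stream.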

Two other factors exist that require encoding to be more than just a simple conversion of analog to digital data. Because typical data is owned and current majority opinion is that it must be protected from illegal copying and distribution, the digitized data stream is most often encrypted during or after the encoding process. Technological limitations on the speed of data flow over various communications links also require that the data be reduced in size, or compressed, so that existing hardware can smoothly execute the streaming process in real time.
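To get a feel for how much compression a link demands, one can compare the raw bit rate of uncompressed video with the rate the link can carry. The sketch below does this in Python; the frame size, color depth and frame rate are hypothetical values picked for illustration, and the 300,000 bps link rate is simply a convenient round figure.

```python
def raw_video_bps(width, height, bits_per_pixel, frames_per_second):
    """Bit rate of uncompressed video, in bits per second."""
    return width * height * bits_per_pixel * frames_per_second

def required_compression_ratio(raw_bps, link_bps):
    """How many times smaller the stream must become to fit the link."""
    return raw_bps / link_bps

# Assumed source: 320x240 pixels, 24 bits per pixel, 30 frames per second.
raw = raw_video_bps(320, 240, 24, 30)
ratio = required_compression_ratio(raw, 300_000)
print(f"raw: {raw} bps, required ratio: {ratio:.2f}:1")
```

Under these assumptions the codec would have to shrink the video by a factor of roughly 184 before any audio or protocol overhead is counted.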

The method you use to encode your content depends on encoding software and hardware, CPU processing capabilities and speed, network technology, and the data storage and retrieval requirements. If you intend to stream the data to an audience under your direct control, your questions will focus more on the technology required. If your business is to stream arbitrary data to arbitrary clients, in addition to the technology, you have to consider standards, de facto standards and specific customer requirements.

The term codec refers to the coder/decoder pair that is used to encode and decode your content. Most codecs are proprietary, and because of the huge volume of data that has been encoded in these hidden formats, it is likely to be some time before the industry completely moves to open standards, if it ever does. The most popular proprietary codecs used today come from RealNetworks, Microsoft and Apple.

However, there is a strong movement toward the Moving Picture Experts Group (MPEG) formats. MPEG formats have the advantage of being recognized as international standards, and they offer the highest compression ratios with the smallest loss of data quality. There are two versions in common use, MPEG-1 and MPEG-2, but MPEG-4, a new format designed specifically for interactive multimedia applications, is likely to become the standard codec until MPEG-7 is released (possibly in 2001). The most popular flavor, MPEG-2, contains patented IP that can be used by executing a single MPEG-2 patent portfolio license (in the US, http://www.mpegla.com/).

Although the licensing is not free, the encoding and decoding algorithms are open, and open-source implementations can be created. MPEG LA announced in September of this year a plan “aimed at providing fair, reasonable, nondiscriminatory worldwide access under one license to patents that are essential for implementing the international MPEG-4 Systems standard”. Use of open standards, such as MPEG, solves at least one main concern of companies encoding their multimedia technology: they will not be forced to return to any specific vendor for service in the future. There are so many patents that cover MPEG technology that it will take a significant engineering breakthrough to produce a competitive freely licensable alternative.

Do you need to understand the technology behind each codec thoroughly before making a decision on which one(s) to use? I would say that a complete understanding is not necessary, but certainly you should understand to the degree necessary to estimate resulting file sizes and to take into account any encoding and decoding overhead. You should also understand the needs of your clients before you decide on the codecs you will use. Proprietary codecs often have no client support for any platform other than the initial encoding platform. It is also very important to consider any technology licensing issues, and be especially conscious of how you will deal with financial liabilities that you may accumulate because of the use of specific technologies.

Data Storage, Retrieval and Transmission

Once you have the required hardware and software to create encoded content, consider how you will store your content for transmission at a rate that satisfies the demands of your clients. If your data comes from a real-time, direct-media feed, such as from a video camera, current encoding hardware usually allows only one live feed per encoding card, and your storage requirements are simply the system RAM needed to hold encoded data before it passes to the Network Interface Card (NIC). For One to One, or One to Many data flow, this type of content delivery is straightforward. Because each live feed you supply to your clients requires an individual encoding card, and usually a dedicated computer as the server for that stream, use simple addition to figure out the increased costs.

If your data is preprocessed and streamed either at some later scheduled streaming time, or as video on demand (VOD) requested by the client, then the hardware and software you use for these functions could become a critical bottleneck to the data flow through your system. Here is an example to demonstrate the type of information you will need in order to estimate your system resources.

Suppose your business expects to supply a maximum of 10,000 clients, each with a continuous stream for one hour, but with a peak time of two hours per day where 80% of the clients could be expected to be on-line. Suppose that the average client demands 300Kbps (kilobits per second) streams to display flicker-free, uninterrupted multimedia content. Suppose also that on average there are 100 different content files being accessed at any time from a pool of 5,000 titles.

Looking at the peak demand sets the upper-limit requirements of our data storage and retrieval subsystem. We decide not to offer VOD to our clients, but allow them only to tune in to some scheduled broadcast. That strategic decision allows us to weigh the number of different pieces of simultaneous content much more heavily than the maximum number of clients. Assuming that your network router can handle multicasting, i.e., sending out the same information to as many IP addresses as may register to receive the data (multicasting is just beginning to become available through some local ISPs), we have enough information to estimate the storage and retrieval subsystem requirements. Our total storage requirement is the product of the number of titles, the average run length of each file and the bit rate at which the data will be streamed. In this case, it is 5,000 files x 3,600 seconds run length per file x 300,000 bps / 8 bits per byte of data = 675GB of storage.
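The storage estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function and parameter names are made up for illustration, and it assumes every title has the same run length and bit rate.

```python
def storage_bytes(titles, run_seconds, stream_bps):
    """Total bytes needed to hold every title at the given streaming bit rate."""
    total_bits = titles * run_seconds * stream_bps
    return total_bits // 8                   # 8 bits per byte

# Figures from the example: 5,000 titles, one hour each, 300,000 bps.
total = storage_bytes(titles=5000, run_seconds=3600, stream_bps=300_000)
print(f"{total / 1e9:.0f} GB")
```

Changing any one input scales the result linearly, so doubling the bit rate for higher quality doubles the storage bill.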

How fast must our data travel from the disk drive to the NIC in order to keep up with the expected client demand? To calculate this answer, we compute the product of the number of different files being read simultaneously from the disks and the average data rate: 100 x 300,000 / 8 = 3.75MB per second. Had we decided to offer VOD, the number of simultaneous streams could jump from 100 to 8,000, the peak number of users who would be asking to start viewing the stream at totally random times. That is a factor of 80, and it would give us a requirement of 8,000 x 300,000 / 8 = 300MB per second of data to be read from our disk drives! We would most likely want to reduce that bandwidth requirement by some intelligent management method. For example, we could choose to allow a new video to start only at the start of each minute. That would be fairly transparent to the end user but would help by allowing us to take advantage of multicast capabilities in our system. We now have to consider the size and number of disk drives, the maximum average data-transfer rate from the drives, and the maximum data-handling capacity of the PCI bus where the data must pass twice before it gets streamed (once from the disk drive to the host, then across the bus again to the NIC).
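A short Python sketch makes the retrieval arithmetic easy to rerun with different inputs. It treats every distinct stream as a separate read from disk at 300,000 bps. The slot-based bound is a simplification introduced here for illustration: it assumes that viewers who start the same title in the same one-minute slot share a single multicast stream, and that the same 100 titles are active under VOD.

```python
import math

STREAM_BPS = 300_000                         # bits per second per stream

def read_rate_bytes(streams, stream_bps=STREAM_BPS):
    """Aggregate disk read rate, in bytes per second, for distinct streams."""
    return streams * stream_bps / 8

def batched_streams(viewers, active_titles, run_seconds, slot_seconds):
    """Upper bound on distinct streams when starts are aligned to fixed slots.

    There is at most one stream per (title, start slot still playing), and
    never more streams than viewers.
    """
    slots_in_flight = math.ceil(run_seconds / slot_seconds)
    return min(viewers, active_titles * slots_in_flight)

scheduled = read_rate_bytes(100)             # scheduled broadcast of 100 titles
vod_worst = read_rate_bytes(8000)            # one private stream per peak viewer
vod_batched = read_rate_bytes(batched_streams(8000, 100, 3600, 60))

for label, rate in [("scheduled", scheduled), ("VOD worst case", vod_worst),
                    ("VOD, 1-minute slots", vod_batched)]:
    print(f"{label}: {rate / 1e6:.2f} MB/s")
```

With these inputs the one-minute slots cap the load at 6,000 streams rather than 8,000. The saving grows as viewers cluster on fewer titles or as the slot length increases.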

We cannot find one disk drive that stores that much data, so we have to come up with a scheme to divide the storage across at least several drives. We also know that our clients will be really upset if their show dies midstream, so we have to create some type of backup plan to account for possible disk failures. Solutions to these problems are often handled with RAID systems and with sophisticated load-balancing software.

If a RAID 1 solution is used, where two identical drives contain mirror images of the same data, how do we access the data in the most efficient manner? Each disk controller typically contains proprietary control software that attempts to optimize the data flow from multiple disk drives. This is a very complex problem, and there is no universal “best” solution. If only two files are being accessed, we can read one file from disk one while we seek the correct track to read the next file from disk two. Now we add a request for file three. Is it more efficient to get it from disk one or from disk two? Maybe it would be best to interweave sectors from each of the two disks, but remember that the slowest operation on a disk drive is the track seek time.
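One simple answer to the "which disk?" question is to send each new request to the mirror with the least outstanding work. The Python sketch below shows that heuristic only; it is not how any particular controller behaves, it ignores seek time entirely, and the file names and sizes are hypothetical.

```python
def pick_disk(pending_bytes):
    """Return the index of the mirror with the least outstanding read work."""
    return min(range(len(pending_bytes)), key=lambda i: pending_bytes[i])

def schedule(requests, n_disks=2):
    """Assign each (name, size) request to a mirror, least-loaded first."""
    pending = [0] * n_disks
    plan = []
    for name, size in requests:
        disk = pick_disk(pending)
        pending[disk] += size                # that mirror is now busier
        plan.append((name, disk))
    return plan

plan = schedule([("file1", 100), ("file2", 100), ("file3", 50)])
print(plan)
```

A real scheduler would also weigh where each disk's head currently sits, since a long seek can cost more than a slightly longer queue.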

We can help tune our system by some intelligent management of file placement. We might load popular files on all of the disks in our system, but only load the less actively viewed content on two different disk drives. That approach assists in load balancing and reduces the total storage required for necessary redundancy.
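The placement policy just described can be sketched in Python as follows. The function names, the popularity ordering and the round-robin rule for the less popular titles are assumptions made for illustration, and the sketch assumes the number of copies never exceeds the number of disks.

```python
def placement(titles_by_popularity, n_disks, hot_count, cold_copies=2):
    """Map each title to the list of disk indexes that hold a copy."""
    plan = {}
    for rank, title in enumerate(titles_by_popularity):
        if rank < hot_count:
            plan[title] = list(range(n_disks))           # popular: every disk
        else:
            first = rank % n_disks                       # rotate the starting disk
            plan[title] = [(first + k) % n_disks for k in range(cold_copies)]
    return plan

def total_copies(plan):
    """Count stored copies across all disks."""
    return sum(len(disks) for disks in plan.values())

plan = placement(["a", "b", "c", "d"], n_disks=4, hot_count=1)
print(plan, total_copies(plan))
```

In this small case the policy stores 10 copies, where replicating every title on every disk would store 16, and each title still survives the loss of any single drive.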

The more we know about how our disks and disk controllers operate, the better we can tune our system for optimum performance. Here again, we do not need to know exactly how our storage and retrieval subsystems work, but the more we know, the better off we are.

The final link between your system and the Internet is provided by your network cards. Try to engineer your system so that multimedia content has the shortest possible path to the Internet. Be conscious of the total bandwidth limitations in your network and in each NIC, and remember that adding multiple NICs to a single computer may not significantly improve your throughput due to bus speed limitations.
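A quick budget check shows why the bus matters. The Python sketch below assumes a conventional 32-bit, 33MHz PCI bus, whose theoretical peak is about 133MB per second, and assumes that each byte crosses the bus twice (disk to host, then host to NIC). The efficiency factor is a placeholder to be replaced by a measured value.

```python
# Theoretical peak of a 32-bit, 33MHz PCI bus: 4 bytes per clock cycle.
PCI_32BIT_33MHZ = 4 * 33_333_333             # bytes per second, about 133MB/s

def max_stream_bytes_per_sec(bus_bytes_per_sec, crossings=2, efficiency=1.0):
    """Payload rate left over when each byte crosses the bus several times."""
    return bus_bytes_per_sec * efficiency / crossings

def streams_supported(bus_bytes_per_sec, stream_bps, crossings=2, efficiency=1.0):
    """Whole number of streams the bus can carry at the given bit rate."""
    budget = max_stream_bytes_per_sec(bus_bytes_per_sec, crossings, efficiency)
    return int(budget // (stream_bps / 8))

ideal = streams_supported(PCI_32BIT_33MHZ, 300_000)
derated = streams_supported(PCI_32BIT_33MHZ, 300_000, efficiency=0.5)
print(ideal, derated)
```

Even at the theoretical peak, a single bus of this kind tops out below 1,800 streams of 300,000 bps, which is why adding NICs to one machine eventually stops helping.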

Theoretical maximum throughputs listed in specifications are rarely achieved in the real world. When trying to estimate the amount of hardware you need in your system, good engineers who know how to find and benchmark bottlenecks are worth their weight in gold. Before you purchase a huge system, build a smaller prototype and spend a little time finding out where the bottlenecks lie. The better you profile your system, the more effective you will be in tuning it to handle your cost requirements.
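A prototype benchmark does not have to be elaborate. The following Python sketch times a sequential read of one file and reports bytes per second. It is a starting point rather than a rigorous benchmark: the block size is an arbitrary choice, and the demonstration file is small enough that the operating system's cache will flatter the result.

```python
import os
import tempfile
import time

def read_throughput(path, block_size=1 << 20):
    """Read a file sequentially; return (bytes_read, bytes_per_second)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:                    # empty read means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, rate

# Demonstration on a throwaway 4 MiB file. A real test would use files far
# larger than RAM so that caching does not hide the disk's true speed.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(4 * 1024 * 1024))
    demo_path = tmp.name
try:
    nbytes, rate = read_throughput(demo_path)
    print(f"read {nbytes} bytes at {rate / 1e6:.1f} MB/s")
finally:
    os.remove(demo_path)
```

Running several copies at once against different files gives a first impression of how throughput degrades as the drive starts seeking between streams.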

Data Streaming

Although the hardware you choose for streaming media is similar to that which is required for any LAN or WAN data transmission, the software determines how efficiently that hardware is used, and, ultimately, what hardware you should purchase. Apple provides an open-source version of their streaming server called the DARWIN Streaming Server. It requires the execution of the Apple Public Source License (see Resources). The DARWIN Streaming Server can stream “QuickTime Hinted” files, which are proprietary to Apple, but because the server is open source, it is likely that it can be modified to support other file formats.

QuickTime is not currently supported on the client side under Linux. Lucent Technologies recently announced the free availability of its OptiMedia MediaServe streaming media server application for Linux (see Resources). They claim to use the industry-standard Real-Time Streaming Protocol (RTSP) and to support a variety of file formats that should make the server useful across many client platforms.

All the other streaming servers I have reviewed are closed source. Some, like Entera and RealNetworks, run on Linux (see Resources). Entera's TeraCAST and TeraEDGE streaming servers use open standard RTP/RTCP streaming protocols in conjunction with RTSP. RealNetworks has its own proprietary RDT protocol that it uses with RTSP to communicate with RealPlayer clients. In contrast, Microsoft's proprietary technology has not (yet?) been ported to Linux.
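RTSP itself is a text protocol that resembles HTTP, which makes it easy to inspect. The Python sketch below only formats request messages; it opens no connection. The server URL, port numbers and session identifier are hypothetical, and a real client would take the session identifier from the server's SETUP response.

```python
def rtsp_request(method, url, cseq, headers=None):
    """Format one RTSP/1.0 request as a string with CRLF line endings."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"   # a blank line ends the request

URL = "rtsp://media.example.com/title1"      # hypothetical server and title

describe = rtsp_request("DESCRIBE", URL, 1, {"Accept": "application/sdp"})
setup = rtsp_request("SETUP", URL, 2,
                     {"Transport": "RTP/AVP;unicast;client_port=5004-5005"})
play = rtsp_request("PLAY", URL, 3,
                    {"Session": "12345678", "Range": "npt=0-"})

print(describe + setup + play)
```

The three requests mirror a typical session: ask what the content is, agree on how the RTP data will be delivered, then start playback.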

Quality of Service (QoS)

Let's review some basic streaming technologies that are usually mentioned glibly as acronyms. What are they and how do they interact with each other? The ultimate goal is to introduce you to QoS, the Internet term that refers to the performance, reliability and adherence to any real-time requirements and any other aspect of Internet service that could impact communication between the source and destination. Many of the commercial streaming-media products differentiate themselves in the market by their QoS capabilities. Be careful. You will be confronted with many decisions on hardware and software purchases based on the level of QoS you want to provide to your expected audience, but unless you control the complete end-to-end path from source to destination, you will always be subject to QoS provided by the poorest performing link in the path. All you can do is make sure that your piece of the path is the best it can be and that your server software makes optimum use of existing QoS protocols.

IP (Internet Protocol) is the most basic protocol used over the Internet. According to the DOD STANDARD INTERNET PROTOCOL, 1980, IP provides “...the functions necessary to deliver a package of bits (an Internet datagram) from a source to a destination over an interconnected system of networks”. All the other streaming protocols and mechanisms sit on top of IP.
UDP (User Datagram Protocol) is the usual choice of transport over IP when it is not critical that every packet arrive at the destination. It makes no delivery guarantees, which keeps its overhead low and makes it especially useful for RTP.
RTP (Real-time Transport Protocol) sits on top of UDP because it cares much more about “letting the show go on” than sitting around waiting for perfect data. RTP was originally designed to support multicast-multimedia conferencing, but it can be used for other applications.
TCP (Transmission Control Protocol), on the other hand, has been engineered to provide robust, error-free data transmission. It operates at the same level as UDP, the transport layer beneath RTP, and is used in most time-insensitive Internet communications or connections that require high reliability of data.
RTSP (Real-Time Streaming Protocol) is an application-level protocol that controls the delivery of streams, typically carried by RTP, and manages multiple streams that must be combined to create a multimedia session. These streams can originate from many sources, even geographically detached locations.
RTCP (RTP Control Protocol) packets are normally carried over UDP alongside the RTP packets they describe. They work in conjunction with RTP to monitor QoS and present feedback to the participants in an RTP session so they can take any actions possible to improve the QoS.
RSVP (Resource ReSerVation Protocol) is closely tied to QoS. RSVP has been designed to work as a separate process that allows an application to request various QoS levels. Nodes along the route to the requested data analyze the RSVP requests, decide whether the requester has the rights to that QoS level and whether the requested QoS level is available, and either report back that they will provide the requested QoS or report an error.
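To make the layering concrete, the Python sketch below builds a single RTP packet by hand: a 12-byte fixed header followed by the media payload. The payload type 96 (a value from the dynamic range), the sequence number, the timestamp and the SSRC identifier are arbitrary values chosen for illustration, and the sketch omits padding, header extensions and CSRC entries.

```python
import struct

def rtp_packet(payload, seq, timestamp, ssrc, payload_type=96, marker=False):
    """Prefix a payload with a minimal 12-byte RTP fixed header."""
    version = 2
    byte0 = version << 6                     # no padding, no extension, CSRC count 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        "!BBHII",                            # network byte order, 12 bytes in total
        byte0,
        byte1,
        seq & 0xFFFF,                        # 16-bit sequence number
        timestamp & 0xFFFFFFFF,              # 32-bit media timestamp
        ssrc & 0xFFFFFFFF,                   # 32-bit source identifier
    )
    return header + payload

packet = rtp_packet(b"media-bytes", seq=1, timestamp=90_000, ssrc=0x1234ABCD)
print(len(packet), packet[:12].hex())
```

The resulting bytes would be handed to a UDP socket, which in turn hands them to IP. The sequence number and timestamp let a receiver detect loss and schedule playback without ever asking for a retransmission.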

You do not need to completely understand any of these protocols, but you should know that RSVP is an extremely rich protocol that gives server vendors a lot of room for QoS improvement. Before you choose your streaming-media server software, it would be useful to ask how their product handles QoS issues, especially via the RSVP protocol.

Decoding and Viewing

These functions are performed on the client computer, but I will mention them briefly here in respect to how they could affect server-side systems. Because of the nature of standard Internet protocols, it should not be necessary to consider the client platform when deciding on the server system. Unfortunately, the power of an illegal monopoly can ignore open standards and force its own proprietary technology on the market. As the open-source movement grows, this type of control will be more difficult to sustain. If you plan to stream closed-proprietary data formats, then your choices on the server side are severely limited. You might want to weigh the future value of promoting a process that restricts competition before you make business decisions that force you down that road.

Conclusion

The need for streaming multimedia content over a LAN or WAN has created a huge market for every conceivable hardware or software niche that can be tailored to the specific needs of the multimedia business. A good resource for information on this industry comes from Streaming Media, Inc., who bill themselves as the “home of the streaming media industry”. A few weeks of exploration on industry offerings could save you countless hours and dollars down the road.

If you plan to use Linux, look carefully at the vendors to determine their stance on open source, open standards and licensing, and their qualifications to offer the depth of support necessary to guide you through the chaotic hawking that you are about to experience from these vendors. Be wary of “multimedia appliances” that prepackage everything you need. Unless the package contains a full solution for all aspects of this problem, you could spend a lot of money only to have your data, to use an old army expression, “hurry up and wait” for the next link to your client.

Resources

Frank LaMonica holds a BSEE from the University of Texas in Austin and has been working in the computer graphics industry for over 18 years. His previous experience includes work with most computer operating environments, including UNIX, Linux and Windows/NT. Frank was CEO of Precision Insight (PI), the company that developed the 3-D Direct Rendering Interface (DRI), which is now part of XFree86 4.0 and has been instrumental in bringing open-source accelerated 3-D graphics to Linux. Frank now serves as the strategic director of MultiMedia for VA Linux since PI was acquired by that company in April of 2000. Besides promoting open-source multimedia, Frank continues to pursue his moonlighting career as a concert classical guitarist.
