Linux as a Telephony Platform
In “Let Linux Speak” (LJ, January 1997), I demonstrated some fun applications for the SPO256 text-to-speech board. Buried in that article was a brief discussion of the potential for using text-to-speech as a telephony resource and for using Linux as a telephony services platform.
Talking about “telephony services”, or computer telephony in general, can mean many things. The history of computer telephony, while an interesting subject, is not our primary focus. Instead, I will discuss the use of Linux as a platform for voice response and path switching, for PBX integration and switch call control, and for the extension of traditional voice applications onto the Internet.
Voice response includes many applications, such as traditional voice mail and interactive voice response (IVR) systems used as automatic “dial and survey” machines. These applications are typically built around multi-channel voice telephony boards that capture and play back digitized speech and that generate and listen for DTMF digits and call progress tones. The more advanced of these boards offer on-board DSP resources, emulate FAX/modem services and perform speech recognition. The largest single vendor of this kind of board is the Dialogic Corporation.
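To make “generate and listen for DTMF digits” concrete, here is a minimal software sketch in Python. It is not how any particular voice board works internally; those do this job in dedicated hardware or DSP firmware. The frequency table is the standard DTMF keypad layout, the 8000 Hz sample rate is the usual narrowband telephony rate, and the function names (dtmf_tone, goertzel_power, detect_key) are made up for this illustration. Detection uses the Goertzel recurrence, a common choice for measuring energy at a handful of known frequencies.

```python
import math

SAMPLE_RATE = 8000                  # Hz; assumed narrowband telephony rate
ROWS = (697, 770, 852, 941)         # low-group DTMF frequencies in Hz
COLS = (1209, 1336, 1477, 1633)     # high-group DTMF frequencies in Hz
KEYS = ("123A", "456B", "789C", "*0#D")


def dtmf_tone(key, duration=0.05):
    """Return float samples for one key: the sum of its row and column sines."""
    if len(key) != 1:
        raise ValueError("key must be a single character")
    for r, row in enumerate(KEYS):
        c = row.find(key)
        if c >= 0:
            f_lo, f_hi = ROWS[r], COLS[c]
            break
    else:
        raise ValueError("not a DTMF key: %r" % key)
    n = int(SAMPLE_RATE * duration)
    step = 2.0 * math.pi / SAMPLE_RATE
    return [0.5 * math.sin(step * f_lo * i) + 0.5 * math.sin(step * f_hi * i)
            for i in range(n)]


def goertzel_power(samples, freq):
    """Estimate signal power near freq with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / SAMPLE_RATE)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_key(samples):
    """Pick the strongest row and column frequency and map them to a key."""
    row = max(range(4), key=lambda r: goertzel_power(samples, ROWS[r]))
    col = max(range(4), key=lambda c: goertzel_power(samples, COLS[c]))
    return KEYS[row][col]


if __name__ == "__main__":
    print(detect_key(dtmf_tone("5")))
```

A production detector would also have to reject speech and noise, which means checking absolute signal level, the balance between the two tones, and minimum tone duration. This sketch always returns some key, even for silence.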
PBX integration involves direct computer control of a PBX switching system. Many vendors have specialized boards and/or serial interfaces which run proprietary protocols for gaining access to different switch features. Generally, PBX integration is implemented as a telephony server or API, such as Microsoft TAPI or TSAPI (currently supported only under Novell NetWare). Usually, these APIs implement either first-party call control for desktop applications (such as putting a telephone image and dialer on a desktop, or controlling a digital telephone directly as a “terminal device”) or third-party call control for server applications (such as automatic call distributors, or ACDs).
The whole area of Internet telephony is intriguing. Most often, the first thing that comes to mind when one says “Internet telephony” is the family of nifty programs that allow computer users to place low-grade international telephone calls for free over the Internet. This same technology, when applied on a private corporate LAN with sufficient bandwidth, could provide a cheap means of inter-office switching (much like tie line services and expensive private T-1 networks) and a better solution for ACD agent positions.
Low-cost telephony solutions have historically been implemented either under MS-DOS (with, perhaps, a custom real-time kernel) or under OS/2. The need for highly specialized real-time operating systems to drive multi-channel voice applications has disappeared as a result of increased CPU power and, equally important, the increased sophistication and power of add-on telephony boards. Many of these boards now manage I/O and call state largely on their own, with only an occasional need for direct intervention. In the past, guaranteed maximum interrupt latency was the mantra for evaluating real-time performance in complex voice processing systems; I now find support for predictable real-time scheduling policies more important.
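As an illustration of what a predictable real-time scheduling policy looks like from a program's point of view, the following sketch asks the Linux kernel to run the calling process under the POSIX SCHED_FIFO policy. It is written against the scheduler calls in a modern Python os module, which wrap the underlying POSIX interfaces; SCHED_FIFO is my choice of example policy, and the helper names (fifo_priority_range, clamp_priority, request_realtime) are hypothetical. Switching to a real-time policy normally requires root or equivalent privileges, so the sketch reports failure rather than assuming it will succeed.

```python
import os


def fifo_priority_range():
    """Return the (min, max) static priorities the kernel allows for SCHED_FIFO."""
    return (os.sched_get_priority_min(os.SCHED_FIFO),
            os.sched_get_priority_max(os.SCHED_FIFO))


def clamp_priority(priority):
    """Force a requested priority into the range valid for SCHED_FIFO."""
    lo, hi = fifo_priority_range()
    return max(lo, min(hi, priority))


def request_realtime(priority):
    """Try to move the calling process to SCHED_FIFO; return True on success."""
    param = os.sched_param(clamp_priority(priority))
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, param)   # pid 0 means "this process"
    except PermissionError:
        return False        # not privileged; the process keeps its current policy
    return True


if __name__ == "__main__":
    print("SCHED_FIFO priority range:", fifo_priority_range())
    print("now real-time:", request_realtime(10))
```

A process running under SCHED_FIFO is not time-sliced against ordinary processes, so a voice channel handler written this way should block on I/O rather than spin; a runaway loop at real-time priority can starve the rest of the machine.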
While Windows NT is commonly touted as a “telephony operating system” these days, several serious considerations weigh against it. The first is simply expense; a Windows NT machine means a machine with a video display. Telephony applications, however, are more often deployed on dedicated stand-alone machines which sit in phone closets and, ideally, require only remote management.
Some of the same optimizations that make NT work better than Windows 95 as a desktop machine get in the way of using it for telephony applications, or for use as a telephony server and workstation concurrently. For example, one finds strange scheduling quirks that occur as NT-optimized video drivers, which are now given the highest priority, update large areas of the screen.
Finally, even in today's world of cheap RAM, NT requires a minimum of 32MB, while Linux runs smoothly in 8MB or less. Even in the low-end commodity voice market, where MS-DOS-based voice mail systems predominate and a $50.00 change in margins can make or break a product, these costs are very important. Now, imagine a $2000 voice mail system, or, even better, a $1000 voice mail “machine”, with desktop integration, multi-site networking and voice/email exchange, and what that system would do to the bottom of the commercial voice mail market.
Why not OS/2? Well, first, there is always the question of “will it be around?” Second, the last non-desktop-optimized release of OS/2 was 1.3, and it is still the most commonly used and supported release of OS/2 in voice processing products today. OS/2 driver support still exists in the voice response OEM marketplace, but it is not a rising star.
Why not DOS? Simply put, one cannot easily run network services from a DOS machine. In tomorrow's world, voice mail will have to present voice messages on the desktop, whether through proprietary means or through a web server and standard e-mail protocols. Other advanced user applications and networking services will need to be leveraged onto these once dedicated stand-alone voice processing machines.
And what about Unix? For many years, some variants of Unix have been used successfully in voice processing, typically in vertical market applications. The complete failure of the major Unix vendors to understand the CTI market and create appropriate software licensing terms or stripped-down embedded releases has made these systems prohibitively expensive as a general-purpose CTI platform. For example, a Unix machine for voice processing may not need NFS, many user utilities or X Windows. However, it does need sockets and a web server for administration and desktop telephony. No major Unix vendor seems to know how to properly license such a stripped-down, embedded configuration.
So what are we left with? The need for an inexpensive operating system capable of using inexpensive hardware, of running a mix of user and real-time scheduled processes, of remote management without the need for a local console, of integrated networking and of giving months of reliable service unattended. Only Linux and FreeBSD fit these criteria.