Experimenting with New Methods in Voice over IP
Integrating voice and data is a strong desire of business and residential users alike. This type of integration is inefficient, however, on the legacy telephone network, which carries voice in fixed channels using time division multiplexing (TDM) and requires expensive associated equipment. A data network like the Internet, with its statistical TDM, uses precious bandwidth much more efficiently. Moreover, the universal presence of IP in both wide area and local area networks makes it a convenient platform on which to launch VoIP traffic. In this article we share our experience of testing VoIP in our Linux lab. We discuss the entire VoIP environment, including the soft switch and the telephony hardware needed to improve voice quality.
There is much more to IP telephony than simply being able to talk over the Internet. To be successful, an IP telephony system must provide all the facilities that are provided by modern public switched telephone networks (PSTN). The quality of service (QoS) for an IP telephony system also should be comparable to that of the PSTN. For the deployment of IP telephony, the infrastructure--that is, an IP network--is already available in many cases. What remains is to implement the protocols necessary for signaling, media transfer and QoS, as well as various features, such as voice mail, billing and so on. Unlike the PSTN, IP telephony has computers at its endpoints, which are extremely powerful. This allows much of the power and intelligence of the system to be concentrated at the endpoints.
We have performed some fundamental experiments on VoIP in the Computer Communications Lab at the University of Engineering & Technology (UET) in Lahore, Pakistan. Our emphasis is on the applications and performance considerations for IP telephony. We are using Quicknet's voice telephony cards for our telephony clients. We have based our VoIP network on VOCAL, which is a soft switch from Vovida Networks. VOCAL enables advanced telephony features, applications and services to be effectively deployed on converged datacom-telecom networks. In addition, we are using ohphone, which is an H.323 client, and a PSTN gateway, both of which are from the OpenH323 project. All these applications run on Linux. Below we discuss the benefits that Linux provides when it is used as a VoIP platform.
VoIP protocols are mainly divided into signaling and media protocols. Signaling means call setup, teardown and control. As the number of features and services grows, signaling becomes correspondingly challenging. Unfortunately, a single signaling technology has not yet been defined, but Session Initiation Protocol (SIP), H.323 and Media Gateway Control Protocol (MGCP) are the front-runners. All these protocols are undergoing rapid development in their capabilities and features. Currently, users of different signaling techniques generally cannot make calls to each other. However, a translator between different signaling protocols can be used in this situation. We discuss a SIP-H.323 translator below.
Regardless of the signaling protocol used to set up calls, there appears to be no controversy over the use of the Real-time Transport Protocol (RTP) for media transport. The RTP Control Protocol (RTCP) is an associated protocol, which provides end-to-end monitoring of data delivery and quality of service. Both are independent of the underlying transport and network layers. Most commonly, RTP and RTCP are used on top of UDP.
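To give a rough idea of what travels in each media packet, the following Python sketch packs and unpacks the 12-byte fixed RTP header. It is an illustration only and is not taken from VOCAL or the Quicknet drivers. The sequence number, timestamp and SSRC values are made up, payload type 0 (G.711 mu-law) is chosen arbitrarily, and the payload is filled with placeholder bytes.

```python
import struct

RTP_VERSION = 2

def build_rtp_packet(payload, seq, timestamp, ssrc, payload_type=0, marker=False):
    """Prepend a 12-byte fixed RTP header (no CSRC list, no extension)."""
    byte0 = RTP_VERSION << 6                      # version=2, padding=0, extension=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload

def parse_rtp_header(packet):
    """Decode the fixed header fields from the first 12 bytes of a packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

# 20 ms of G.711 at 8000 samples per second is 160 bytes (placeholder content here).
payload = bytes(160)
packet = build_rtp_packet(payload, seq=1, timestamp=160, ssrc=0x1234ABCD)
header = parse_rtp_header(packet)
```

In a real client the resulting bytes would be handed to a UDP socket, which is how RTP is most commonly carried.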
SIP is an IETF standard and is described in RFC 2543. Although H.323 is in widespread use in the VoIP market, it is not necessarily the best available protocol. Its competitor, SIP, has a number of advantages that make it a superior choice for VoIP applications. The first advantage is that SIP focuses exclusively on telephony issues: unlike the other protocols, it was originally designed to deal with telephony. H.323 and MGCP both support telephony functions in a manner similar to that of phone companies on their circuit-switched networks, which is not the most efficient method on a packet-switched network. SIP's distributed architecture fits the distributed model of IP communications and handles load surges and service interruptions efficiently.
The VOCAL system is a SIP-based distributed network of servers that run on Linux. Let's see how VOCAL implements various SIP servers and associated protocols for policy, billing and management. The complete documentation on these servers can be found at www.vovida.org.
The Marshal Server (MS) is an implementation of the SIP proxy server that acts as the initial point of contact for all SIP signals that enter the VOCAL system. The MS provides authentication, forwarding and billing functions.
The Redirect Server (RS) is a combined implementation of the SIP redirect, registration and location servers. The RS stores contact and feature data for all registered subscribers, in addition to a dialing plan to enable routing for off-network calls.
The Feature Servers are another implementation of the SIP proxy server. These servers are scripted in call processing language (CPL) and provide basic system features such as call forwarding and call blocking.
The Provisioning Server (PS) stores data records about each system user and server module and distributes this information throughout the system via a subscribe-notify model. The PS provides a web-enabled graphical user interface (GUI) to permit technicians and system administrators to manage the system.
The Network Manager gives the administrator the ability to monitor the system through Simple Network Management Protocol (SNMP) messages. It also allows the administrator to start any server within the network.
The Policy Server is designed to use Common Open Policy Service (COPS) to provide QoS bandwidth reservation for calls or call segments transmitted over the Internet. The Policy Server is also capable of using Open Settlement Protocol (OSP) to interact with clearinghouses for reserving bandwidth and authorizing the use of a network for internetwork calls.
The Heartbeat Server monitors the flow of pulsing signals emitted by the other servers, and it provides information about the flow of heartbeats to the SNMP GUI. This information helps the system administrator know whether the server modules are up or down.
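The idea behind heartbeat monitoring can be sketched in a few lines of Python. This is a toy model, not VOCAL's Heartbeat Server: the class name, the three-second timeout and the server labels are all assumptions made for illustration. Each server is considered up as long as its most recent heartbeat is newer than the timeout.

```python
import time

class HeartbeatMonitor:
    """Toy model: a server is 'up' if it has pulsed within the timeout."""

    def __init__(self, timeout=3.0, clock=time.monotonic):
        self.timeout = timeout      # seconds of silence tolerated (assumed value)
        self.clock = clock          # injectable clock, which makes testing easy
        self.last_seen = {}

    def beat(self, server):
        """Record a heartbeat received from the named server."""
        self.last_seen[server] = self.clock()

    def status(self):
        """Return a mapping of server name to 'up' or 'down'."""
        now = self.clock()
        return {name: ("up" if now - seen <= self.timeout else "down")
                for name, seen in self.last_seen.items()}

# Simulated timeline with a fake clock: "rs" stops pulsing after t=0.
fake_time = [0.0]
monitor = HeartbeatMonitor(timeout=3.0, clock=lambda: fake_time[0])
monitor.beat("ms")
monitor.beat("rs")
fake_time[0] = 2.0
monitor.beat("ms")
fake_time[0] = 4.0
current_status = monitor.status()
```

A status display would simply poll `status()` at regular intervals and redraw.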
Linux is the first modern operating system with a defined kernel-layer API for telephony support. Excellent open telephony software is already using this API. Linux has a number of other advantages that make it a suitable platform for mission-critical applications. Linux's high availability, combined with the low or nonexistent licensing cost of open source, is the key business reason to consider the Linux operating system seriously in a VoIP context. The culture of openness and rapid development that surrounds Linux and open-source solutions is an additional reason for choosing Linux for such an application.
Quicknet's Internet PhoneJack and Internet LineJack are hardware cards specifically designed for IP telephony. They implement many modern speech-coding techniques, called codecs, on their on-board DSPs. Both of these cards have an FXS port to which we can connect a normal telephone set. LineJack also acts as a single-port IP-PSTN gateway. It has an FXO port to which a PSTN line can be connected. Regarding performance and convenience, these cards provide the following benefits:
Full duplex: this simply means you can talk and hear at the same time. Sound cards are often limited here, because full duplex requires both hardware and software support.
Ability to use normal phones: most people are accustomed to ordinary telephones and are uncomfortable using a computer to make phone calls. Being able to use their normal telephone sets can be comforting.
Hardware-based compression: compression is highly desirable in VoIP networks, because it saves bandwidth and reduces end-to-end delay. It is done by the voice coders, which act as the engines for the creation and processing of VoIP packets. If we implemented the coders as linear, software-only processes, they would break down on processors that lack (a) a floating-point unit or (b) sufficient speed to run the coders on the CPU. Quicknet cards use on-board DSPs that perform G.711, G.723.1, G.728, G.729a and TrueSpeech audio compression in hardware. This provides much better performance for voice. Also, the CPU is off-loaded, so you can run other applications alongside the IP phone at the same time.
Echo cancellation: echoes on phone lines are caused (1) predominantly by the 2-wire to 4-wire hybrid at the remote end or (2) by the acoustic echo at the remote end. In both cases the echo is your own voice leaking back towards you at the remote end. The echo becomes noticeable when there is more than about a 30-50 msec delay in the round-trip transmission path--which is why we need an echo cancellation device. Quicknet cards provide an echo cancellation circuit on their on-board DSPs. An echo cancellation circuit basically takes the signal on the PLAY path and subtracts an estimate of its echo from the signal in the RECORD path.
Sound card vs. Quicknet card: sound cards are not telephony devices and cannot generate the dial tone, rings, flash hook and caller id services that are provided by the Quicknet cards. Also, to be compatible with other VoIP applications and equipment, we must support the commonly accepted compression codecs, and we have to license them. Quicknet has pre-licensed these codecs and provides them built into the hardware. So we can now use open-source VoIP software and still use the advanced codecs. Sound cards, on the other hand, do not support hardware-based audio compression codecs.
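The echo cancellation item above describes removing the PLAY-path signal from the RECORD path. The article does not say which algorithm the Quicknet DSPs use, so the Python sketch below substitutes a standard textbook technique, a normalized LMS (NLMS) adaptive filter, purely as an illustration. The filter learns how the PLAY signal leaks into the RECORD signal and subtracts its estimate of that leak. The echo path coefficients, tap count and step size are hypothetical, and the simulation contains echo only, with no near-end speech.

```python
import random

def nlms_echo_canceller(play, record, taps=8, mu=0.5, eps=1e-8):
    """Return the RECORD signal with an adaptive estimate of the PLAY echo removed."""
    weights = [0.0] * taps
    history = [0.0] * taps                  # most recent PLAY samples, newest first
    cleaned = []
    for far, mic in zip(play, record):
        history = [far] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic - echo_estimate         # what remains after the subtraction
        norm = sum(x * x for x in history) + eps
        step = mu * error / norm
        weights = [w + step * x for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned

def simulate_echo(play, path):
    """Hypothetical echo path: a weighted sum of current and past PLAY samples."""
    return [sum(gain * play[n - k] for k, gain in enumerate(path) if n - k >= 0)
            for n in range(len(play))]

def energy(samples):
    return sum(s * s for s in samples)

random.seed(1)
play = [random.uniform(-1.0, 1.0) for _ in range(4000)]
record = simulate_echo(play, [0.0, 0.6, 0.3, 0.1])
cleaned = nlms_echo_canceller(play, record)

# Compare the last 1000 samples, after the filter has had time to adapt.
echo_energy = energy(record[-1000:])
residual_energy = energy(cleaned[-1000:])
```

Once the filter has adapted, the residual energy is a small fraction of the original echo energy. A production canceller must also cope with near-end speech and changing echo paths, which this sketch ignores.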
We have deployed two separate VOCAL systems to represent two different administrative domains. We have a Pentium III 750MHz system with 256MB of RAM that runs Red Hat 7.1. VOCAL System A is deployed on two machines, while System B is an all-in-one configuration, meaning all of the VOCAL servers are installed on one machine. Both systems have a user agent MS, a gateway MS and two RSes, along with the rest of the VOCAL servers. Running multiple servers scales the system to process more users and also adds reliability. In other words, if one RS is down, the MS automatically contacts the other one. We have also installed an H.323 client, ohphone, from the OpenH323 project. A script called vocalstart is used to start or stop a VOCAL system. Starting the system with this script creates the processes for all VOCAL servers defined in the file vocal.conf in the ~/vocal/etc directory.
This script can be stored in /etc/rc.d/init.d to start the VOCAL system at boot time. After the system is started, we can add users to it with a Java-based GUI once we've logged in as a system administrator. The administrator has the authority to provide various services to the users, such as caller id, call screening and call forwarding. Each of these features is provided through the corresponding feature server. Users also have the facility to locally enable or disable any feature provided to them by the administrator. As the users subscribe to the system, their contact information is stored in the RS, which then uses this information to redirect calls for the subscribed users.
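The combination of subscriber registration and redundant Redirect Servers can be pictured with a small Python model. It is a simplification built on assumptions and does not reflect VOCAL's actual code: the class, the `locate` helper, the user number and the contact address are all hypothetical, and the two toy servers are kept in step simply by registering with both.

```python
class RedirectServer:
    """Toy stand-in for an RS: stores contacts and answers lookups."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.contacts = {}

    def register(self, user, contact):
        """Store the contact address under which a subscriber can be reached."""
        self.contacts[user] = contact

    def lookup(self, user):
        """Return the stored contact, or None if the user is unknown."""
        if not self.up:
            raise ConnectionError(f"{self.name} is not responding")
        return self.contacts.get(user)

def locate(user, redirect_servers):
    """Ask each RS in turn, moving on to the next when one does not respond."""
    for rs in redirect_servers:
        try:
            return rs.lookup(user)
        except ConnectionError:
            continue
    raise RuntimeError("no redirect server available")

rs_a, rs_b = RedirectServer("rs-a"), RedirectServer("rs-b")
for rs in (rs_a, rs_b):
    rs.register("1000", "sip:1000@192.168.0.10:5060")   # made-up subscriber

before_failure = locate("1000", [rs_a, rs_b])
rs_a.up = False                                         # simulate one RS going down
after_failure = locate("1000", [rs_a, rs_b])
```

Both lookups return the same contact, which is the behavior the second RS is there to guarantee.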
For calls heading outside the system, we have to make dial plans. These are of two types--IP plan and digital plan. The digital dialing plans are set up to handle phone numbers. These can either be dual tone multi-frequency (DTMF) tones originating from an analog phone set and translated into a SIP message by a residential gateway, or they can be numbers entered from a SIP-based device, for example, 1-408-555-1212. The IP dialing plans are set up to handle user addresses formatted as aliases or e-mail addresses, for example, email@example.com. A Java-enabled GUI based on SNMP displays the status (active or inactive) of all the servers and is updated at regular intervals.
SIP call flows in VOCAL: VOCAL user agents (UAs) can make calls directly to each other without involving the VOCAL servers. This can be useful for testing the RTP media flow and the functioning of the Quicknet cards. As there is no RS to find the location of the called user, we have to hard-code its contact in the configuration file of the caller UA. UAs can then make calls to each other from analog phones connected to their Internet PhoneJack cards by using speed-dial numbers corresponding to SIP URLs.
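The distinction between the two kinds of dialing plan can be illustrated with a short Python sketch that classifies a dialed string. The regular expressions are simplifying assumptions for this example and are not VOCAL's actual dial-plan syntax, which is configured through its provisioning tools.

```python
import re

# Assumed patterns for illustration only.
PHONE_RE = re.compile(r"^\+?[0-9][0-9\-]*$")      # digits, optionally hyphenated
ADDRESS_RE = re.compile(r"^[^@\s]+@[^@\s]+$")     # alias or e-mail style address

def choose_dial_plan(dialed):
    """Return 'digital' for phone numbers, 'ip' for user addresses, else None."""
    if PHONE_RE.match(dialed):
        return "digital"
    if ADDRESS_RE.match(dialed):
        return "ip"
    return None

plan_for_number = choose_dial_plan("1-408-555-1212")
plan_for_address = choose_dial_plan("email@example.com")
```

The two sample inputs are the ones used in the text above; a number is routed by the digital plan and an address by the IP plan.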
Figure 5: Basic SIP Call Flow in VOCAL
The calls flowing through the VOCAL system involve several of its servers. Figure 5 shows the most basic SIP call flow between two UAs when they call through VOCAL. The caller UA sends an INVITE request to the MS, which retrieves the contact information from the RS after authentication and then forwards the INVITE message to the MS of the called UA. Finally the called UA accepts the INVITE with SIP status code 200, which means "OK". The actual media path forms after an acknowledgement (ACK) from the caller. Similarly, a sequence of SIP messages is used for the teardown of a call. Also, when a user has enabled features like call forwarding, call screening and voice mail, the call flow involves the corresponding SIP feature servers.
Calls through translators: as SIP, H.323 and MGCP are all widespread, protocol translators can be a useful tool for making calls between clients that use different signaling techniques. We have tested siph323csgw, which is a SIP-centric H.323-to-SIP protocol converter present in VOCAL. It acts as a call-routed gatekeeper on the H.323 side and as a SIP user agent on the SIP side. For a simple test of H.323-to-SIP translation, we set the flow to be:
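SIP messages are plain text, so the start of the call flow described above can be sketched in Python. The example below is a minimal illustration and not the VOCAL UA implementation: the addresses, host name and Call-ID are invented, only a handful of headers are included, and a real INVITE would normally carry a session description in its body.

```python
def build_invite(caller, callee, call_id, via_host, cseq=1):
    """Build a bare-bones INVITE request with an empty body."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {via_host}",
        f"From: <sip:{caller}>",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INVITE",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"

def parse_status_line(response):
    """Split the first line of a SIP response into version, code and reason."""
    first_line = response.split("\r\n", 1)[0]
    version, code, reason = first_line.split(" ", 2)
    return version, int(code), reason

def next_action(status_code):
    """Decide what the caller does next, based on the class of the response."""
    if 100 <= status_code < 200:
        return "wait"                      # provisional, e.g. ringing
    if 200 <= status_code < 300:
        return "ack-and-start-media"       # call accepted
    if 300 <= status_code < 400:
        return "ack-and-retry-contact"     # redirected to another contact
    return "ack-and-fail"

invite = build_invite("alice@example.com", "bob@example.com",
                      call_id="12345@host.example.com",
                      via_host="host.example.com:5060")
version, code, reason = parse_status_line("SIP/2.0 200 OK\r\nCSeq: 1 INVITE\r\n\r\n")
action = next_action(code)
```

The request line and the `200 OK` status line are the two pieces of text that bracket call setup; the ACK that follows is what lets the media path form.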
ohphone -> siph323csgw -> vocal UA
For this configuration, we have to set the client property in the siph323csgw.conf file and make the sipremote property contain the IP address/port of the VOCAL UA.
For a bare-bones functional system using VOCAL, we set the flow to be:
ohphone -> siph323csgw -> UA marshal -> vocal UA
For this configuration the sipremote property in the siph323csgw.conf file points to the UA marshal.
IP-PSTN calls with LineJack: we can use Internet LineJack to have an IP-PSTN gateway in our PC. It can be used to bypass long-distance tolls on PSTN calls. As the VOCAL UA does not yet support the PSTN interface of LineJack, we use gateway software called PSTNGW from the OpenH323 project. It works quite well with LineJack. On machine p01 we run PSTNGW with the command
<path>/pstngw -q /dev/phone1 --aec 4 --no-gatekeeper p02
where /dev/phoneN is the Quicknet device, LineJack in this case. The --aec option sets the echo cancellation level for the Quicknet card. p01 can now accept calls from an H.323 client and redirect any incoming call to p02.
On p02 we run ohphone as
<path>/ohphone -q /dev/phone0 -n -p p01 -l
Now we can dial, from an analog phone, a number that we want gateway p01 to call, and we can receive any incoming call from the PSTN. The ohphone terminal also provides several runtime commands that can be used during the call. For example, pressing the A key increases echo cancellation to the next level for the Quicknet card.
Quality of service management: QoS is, in theory, an effort to manage transmission and error rates and to minimize latency, packet loss and jitter during internetwork calls. VOCAL does admission control based on resource availability. If resources cannot be allocated, VOCAL resorts to a "best effort only" delivery. The Policy Server administers admission control for QoS requests and provides the internetwork MS (policy client) with the information necessary to enforce the admitted QoS requests. The Policy Server outsources the authentication, authorization and accounting (AAA) requests to a third party, called a clearinghouse. The clearinghouse then acts as a trusted broker among a large number of network providers. The Policy Server supports two protocols, COPS and OSP. It acts as a COPS server when it communicates with the network routers and as an OSP client when it exchanges authorization requests and usage reports with the clearinghouse server. Resource Reservation Protocol (RSVP) is a companion protocol of COPS. It allows paths on the Internet to be reserved so voice conversations can be transmitted with minimal delays. Figure 6 shows the simplified operation of QoS-enabled calls.
This article presents our experience with the emerging developments in VoIP in an open-source environment, with an effort to merge data and voice into a single network. Much more work is needed in the area of engineering new VoIP networks to handle the impact of network impairment on speech quality. The real problem arises in a public network, like the Internet, where there is no centralized authority and monitoring. Also, it is important to use tools that deliver an objective measure of speech quality. Then, if possible, packet delivery problems should be mapped to their exact impact on speech quality.
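The admit-or-fall-back decision at the heart of admission control can be sketched in Python. This is not an implementation of COPS, OSP or RSVP, and it is not VOCAL's Policy Server. It only shows the arithmetic and the decision, under stated assumptions: 20 ms of voice per packet, 40 bytes of IPv4, UDP and RTP headers per packet, nominal codec bit rates of 64 kbit/s for G.711 and 8 kbit/s for G.729a, and a hypothetical 128 kbit/s link.

```python
IP_UDP_RTP_OVERHEAD_BYTES = 20 + 8 + 12          # IPv4 + UDP + RTP headers

CODEC_BIT_RATES = {"G.711": 64000, "G.729a": 8000}   # nominal rates in bit/s

def call_bandwidth_bps(codec, packet_ms=20):
    """One-way IP bandwidth of a call, including per-packet header overhead."""
    payload_bytes = CODEC_BIT_RATES[codec] * packet_ms / 1000 / 8
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + IP_UDP_RTP_OVERHEAD_BYTES) * 8 * packets_per_second

class AdmissionController:
    """Reserve bandwidth while capacity lasts, otherwise fall back to best effort."""

    def __init__(self, capacity_bps):
        self.capacity_bps = capacity_bps
        self.reserved_bps = 0

    def request(self, codec):
        needed = call_bandwidth_bps(codec)
        if self.reserved_bps + needed <= self.capacity_bps:
            self.reserved_bps += needed
            return "reserved"
        return "best effort"

controller = AdmissionController(capacity_bps=128000)    # hypothetical link
decisions = [controller.request("G.711"),
             controller.request("G.711"),
             controller.request("G.729a")]
```

Under these assumptions a G.711 call needs 80 kbit/s and a G.729a call 24 kbit/s, so the second G.711 call does not fit on the link while the compressed call still does.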
The deployment of open-source VoIP applications like VOCAL is quite a daunting task. Most of the applications are launched without thorough testing. The bugs are then discovered and removed with the help of community feedback. The best places to submit your queries and report errors are the mailing lists associated with these applications.
Below we mention some of the problems we had as a reference for beginners.
VOCAL utilities like provisioning and system status are GUIs made of Java applets. When we tried to run these utilities in Netscape 4.72, the browser hung. We deduced there was some problem in the Java Virtual Machine (JVM) used by Netscape. It turned out that some JVMs (both from Sun and IBM) don't work with the new floating stack feature of the i686 version of glibc. The failures are due to programming assumptions in the JVMs that are now invalid. So we had to force glibc to use the deprecated stack model by setting the following environment variable: LD_ASSUME_KERNEL=2.2.5.
Another problem we encountered with the system status GUI was that it did not show any contents for system status. When we checked for VOCAL processes with ps -aux, we did not find a process called netMgnt, which is what gets the servers' data from the Heartbeat Server. In fact, due to some bugs, vocalstart was not able to create this process. When we created the process from the command line, the GUI successfully showed the servers' status.
We initially had a hard time loading the Quicknet card drivers on Red Hat 6.2. Even when we were able to compile the code, the module did not load successfully. We were trying the latest driver, 1.0.0 at that time, which worked well with 2.4 kernels but had problems with the 2.2 kernel. Switching to an older driver, version 0.3.4, loaded the module successfully.
This article reports work carried out towards the B.Sc. degree in Electrical Engineering at the University of Engineering and Technology, Lahore, Pakistan, under the supervision of Professor Shahid H. Bokhari. We are thankful to Prof. Bokhari for his guidance and permission to experiment in the lab. We are also thankful to the VOCAL Development Team at www.vovida.org for their help in debugging various problems during our work. Diagrams for this article have been provided by Vovida.org.
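A check like the one described above is easy to automate. The Python sketch below scans ps-style output for expected process names and reports any that are missing. The sample output is invented for illustration, and apart from netMgnt the process name in it is a hypothetical placeholder, not a real VOCAL server name.

```python
import os

def missing_processes(ps_output, expected):
    """Return the expected process names that do not appear in ps-style output."""
    running = set()
    for line in ps_output.splitlines()[1:]:          # skip the header row
        fields = line.split(None, 10)                # the 11th field is the command
        if len(fields) == 11:
            command = fields[10].split()[0]
            running.add(os.path.basename(command))
    return [name for name in expected if name not in running]

# Invented sample in the column layout of "ps aux"; not real output.
sample_output = (
    "USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND\n"
    "vocal      812  0.0  0.5  4200 1300 ?        S    10:01   0:00 /opt/example/bin/exampleServer -f example.conf\n"
    "root         1  0.0  0.1  1120  480 ?        S    09:58   0:04 init\n"
)
missing = missing_processes(sample_output, ["exampleServer", "netMgnt"])
```

On a live system the text would come from running ps through `subprocess.run` rather than from a hard-coded string.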
"IP Telephony: The Integration of Robust VOIP Services" by Bill Douskalis, 2000.
"Voice Over IP" by Uyless Black, 1999.
"The Linux Telephony Kernel API" by Greg Herlein. Linux Journal, Feb 2001.
"Vovida Network's VoIP on Linux" by Jerry Ryan. The Applied Technologies Group, 2000. www.techguide.com
Nauman Zafar Butt, Atif Nadeem and Mansoor Javed Awan are final year undergraduate students in the Department of Electrical Engineering, University of Engineering and Technology, Lahore, Pakistan.