Echo and Soft VoIP PBX Systems
Echo cancellation is an extremely CPU-intensive process. A complete echo canceller for 92 simultaneous calls, or four PRI T1 lines, consumes on the order of one GIPS (a billion instructions per second). The calculations involve mainly 8-bit operations and are otherwise poorly suited to the PC architecture and CPU cache. Thus, software echo cancellation is one of the major factors limiting the performance of soft PBX systems.
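To see how a figure on the order of one GIPS can arise, here is a back-of-envelope sketch. The filter length, the operations per tap and the instructions per operation are illustrative assumptions, not measurements of any particular canceller; only the channel count comes from the text and the 8 kHz rate from standard telephony.

```python
# Back-of-envelope estimate of software echo-canceller load.
# Every value marked "assumed" is illustrative, not measured.

SAMPLE_RATE_HZ = 8000   # standard telephony sampling rate
CHANNELS = 92           # four PRI T1 lines, 23 bearer channels each
TAPS = 128              # assumed FIR length (16 ms echo tail at 8 kHz)
OPS_PER_TAP = 2         # assumed: one multiply-accumulate to filter, one to adapt
INSTR_PER_OP = 3        # assumed: load, multiply-add and store overhead


def instructions_per_second(channels, taps, ops_per_tap, instr_per_op,
                            sample_rate=SAMPLE_RATE_HZ):
    """Return the estimated instruction rate for all channels combined."""
    return channels * sample_rate * taps * ops_per_tap * instr_per_op


load = instructions_per_second(CHANNELS, TAPS, OPS_PER_TAP, INSTR_PER_OP)
print(f"{load / 1e9:.2f} GIPS")  # prints "0.57 GIPS"
```

Doubling the assumed tail length to 256 taps doubles the result, so under these assumptions the load stays within the same order of magnitude as the one-GIPS figure.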
In an effort to improve overall system performance, software echo cancellers are usually highly optimized to reduce the load on the PC. One compromise made in the interest of saving CPU cycles is that the “learning” algorithms that update the FIR estimate are not run every time a voice sample is processed, but much less frequently. As a result, the system trains slowly. You often hear considerable echo well into the conversation, until the echo canceller has trained and the echo decreases.
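The effect of running the learning step less often can be illustrated with a small sketch. It uses a normalized LMS update, which is a common choice for adapting an FIR estimate but is an assumption here, not necessarily what Asterisk uses. The function name, the parameter `update_every` and the echo path values are all hypothetical.

```python
import random


def cancel_echo(tx, rx, taps=8, mu=0.5, update_every=1, eps=1e-9):
    """Subtract the estimated echo of tx from rx, adapting with NLMS."""
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent transmitted samples, newest first
    out = []
    for n, (x, d) in enumerate(zip(tx, rx)):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        err = d - estimate   # what remains after cancellation
        out.append(err)
        if n % update_every == 0:   # the CPU-saving compromise: skip most updates
            power = sum(h * h for h in history) + eps
            step = mu * err / power
            weights = [w + step * h for w, h in zip(weights, history)]
    return out


def energy(samples):
    """Sum of squared samples."""
    return sum(s * s for s in samples)


rng = random.Random(0)
tx = [rng.uniform(-1.0, 1.0) for _ in range(400)]
echo_path = [0.0, 0.5, 0.0, -0.2]   # hypothetical echo response of the line
rx = [sum(echo_path[k] * tx[n - k] for k in range(len(echo_path)) if n - k >= 0)
      for n in range(len(tx))]

# Residual echo energy over the same window, with and without decimated updates.
every_sample = energy(cancel_echo(tx, rx, update_every=1)[100:200])
every_eighth = energy(cancel_echo(tx, rx, update_every=8)[100:200])
print(every_sample, every_eighth)
```

With updates on every sample the residual in the chosen window is far smaller than with updates on every eighth sample, which is the slow training described above, traded for roughly one-eighth of the adaptation work.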
Another trade-off concerns the nonlinear processor, which often is eliminated completely in soft echo cancellers. This is why there is usually some residual echo on systems such as Asterisk, even after training.
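For readers unfamiliar with the term, one simple form of nonlinear processor is a center clipper, which mutes whatever low-level signal is left after the linear canceller. The sketch below is a minimal illustration with a hypothetical function name and threshold; a practical nonlinear processor would also check that only the far end is talking before it suppresses anything.

```python
def center_clip(residual, threshold):
    """Zero every residual sample whose magnitude is below threshold."""
    return [0.0 if abs(s) < threshold else s for s in residual]


cleaned = center_clip([0.01, -0.02, 0.4, -0.005, -0.6], threshold=0.05)
print(cleaned)  # prints [0.0, 0.0, 0.4, 0.0, -0.6]
```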
The goal under Asterisk was to provide software echo cancellation for a full quad E1 card (120 channels) with current PC technology and still be able to do other useful voice and data processing. This is indeed possible, but as discussed, the echo canceller trains slowly and after training there is still usually some remaining echo.
You can use the old-fashioned attenuation method to reduce residual echo. Setting the transmit and receive gains in Asterisk (txgain and rxgain) to negative values reduces the sound volume, but it also produces acceptable final echo performance. One limitation is that the txgain and rxgain settings in Asterisk are global, meaning the gain settings are compounded for any system with bridging. For bridged TDM systems, it is hard to get the balance between voice volume and residual echo right. But for simpler systems, setting txgain = -10 or thereabouts usually produces acceptable call volume with little perceived echo after about 10 seconds.
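The arithmetic behind the attenuation method can be sketched as follows, assuming the gain values are expressed in decibels. Echo of the transmitted voice passes through the transmit gain on the way out and the receive gain on the way back, so it is attenuated by the sum of the two, while speech in each direction passes through only one of them. The helper names are hypothetical.

```python
def db_to_amplitude(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def echo_change_db(txgain_db, rxgain_db):
    """Echo crosses both gain stages, so the decibel values add."""
    return txgain_db + rxgain_db


print(round(db_to_amplitude(-10), 3))  # prints 0.316
print(echo_change_db(-10, 0))          # prints -10
```

A setting of -10 therefore scales the signal amplitude to roughly a third, which is why the method costs call volume as well as echo.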
The remaining problem under Asterisk is the slow convergence of the FIR estimation. An ingenious mechanism for dramatically improving the convergence time of the echo canceller is Asterisk's echo training option. Transmitted voice is disabled for a short time during ringing and a spike of sound is transmitted to measure the FIR directly instead of learning it iteratively over many samples. The echo training option eliminates most of the echo at the beginning of the call in many cases. But its use is restricted to simple systems where ringing can be detected. It does not function on PRI T1 or E1 lines.
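The idea of measuring the FIR directly can be sketched in a few lines: if a single unit spike is sent and nothing else is on the line, the samples that come back are the echo path's impulse response, which can be loaded into the canceller as its starting estimate. This is an illustration of the principle only, not Asterisk's code; `hypothetical_line` and its echo path values are stand-ins for the real line.

```python
def impulse_response(line, taps):
    """Send a unit impulse through line and return the taps samples that come back."""
    probe = [1.0] + [0.0] * (taps - 1)
    return line(probe)


def hypothetical_line(tx):
    """Stand-in for the real line: returns the echo of the transmitted samples."""
    path = [0.0, 0.5, 0.0, -0.2]   # made-up echo path
    return [sum(path[k] * tx[n - k] for k in range(len(path)) if n - k >= 0)
            for n in range(len(tx))]


fir = impulse_response(hypothetical_line, taps=6)
print(fir)
```

The measured list matches the made-up echo path, padded with zeros, after a single probe, whereas an iterative learner needs many samples of speech to reach the same estimate.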
Today, all long-distance calls over 600 km routinely are echo-cancelled at each end. Cell-phone calls to the PSTN always are echo-cancelled. Calls originating from digital end points, such as ISDN or VoIP, should have no echo. Thus, only analog calls over distances of less than 600 km actually need any echo cancellation. Even local calls often are echo-cancelled by the PSTN, simply because the capacity is there.
The result is that on most VoIP-PSTN gateways, including Asterisk, a great deal of echo cancellation goes on that is unnecessary and, in fact, detrimental to voice quality. For example, a VoIP-based call center may handle mostly 1-800 calls, the majority being long-distance ones that require no echo cancellation.
Although it is complicated and computationally intensive to cancel echo, it turns out that it is quite easy to measure whether echo is present on a call (Figure 4). A simple algorithm built into a field-programmable gate array (FPGA) can determine within a second or two of speech whether echo cancellation is required for the call. If the call has no echo, echo cancellation can be disabled. Thus, for a system using hardware echo cancellation in DSPs, it is possible to allocate DSP resources dynamically to the calls that need them. But the really dramatic improvements are seen in systems with software echo cancellation.
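The text does not say which detection algorithm is used, so the following is only one plausible sketch: correlate the transmitted signal against the received signal over a range of delays and report echo if any delay shows a strong match. The function name, the lag range, the threshold and the test signals are all assumptions made for illustration.

```python
import math
import random


def echo_present(tx, rx, max_lag=64, threshold=0.15):
    """Return True if rx correlates strongly with a delayed copy of tx."""
    length = min(len(tx), len(rx))
    tx_energy = sum(x * x for x in tx[:length])
    rx_energy = sum(r * r for r in rx[:length])
    if tx_energy == 0.0 or rx_energy == 0.0:
        return False
    norm = math.sqrt(tx_energy * rx_energy)
    for lag in range(max_lag):
        corr = sum(tx[n - lag] * rx[n] for n in range(lag, length))
        if abs(corr) / norm > threshold:
            return True
    return False


rng = random.Random(1)
tx = [rng.uniform(-1.0, 1.0) for _ in range(4000)]      # half a second at 8 kHz
near = [rng.uniform(-0.5, 0.5) for _ in range(4000)]    # unrelated near-end signal
delay = 20                                              # made-up echo delay in samples
with_echo = [near[n] + (0.3 * tx[n - delay] if n >= delay else 0.0)
             for n in range(4000)]

print(echo_present(tx, with_echo))
print(echo_present(tx, near))
```

Detection of this kind needs only one multiply-accumulate per lag per sample and no adaptation, which is consistent with the claim that measuring echo is far cheaper than cancelling it.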
In software echo cancellers, the considerable CPU load that can be freed by echo detection is always immediately available to other processes, which in turn can increase the quality and capacity of the system significantly. More important, echo detection changes the optimization point of the echo canceller design. If only a fraction of calls will require any echo cancellation, the canceller itself can afford to be designed to include the additional features, such as nonlinear processing and fast convergence, that will make the audio truly toll-quality.