ViaVoice and XVoice: Providing Voice Recognition
Conversing with a computer has long been a staple of science fiction. Such conversations are still largely in the realm of fiction, but voice recognition technology has improved significantly over the last decade. A number of voice recognition and control products are available on various platforms. Many people don't realize, however, that it is possible to control the Linux desktop by voice, and it has been possible for some time.
Voice control can provide computer access for those with overuse syndromes or other arm injuries--users who in the past had to switch platforms to find voice support. Aside from the geek factor, ordinary users can benefit from reduced arm stress and improved ease-of-use and speed for some tasks. The future of the software discussed in this article is somewhat in question, and the software does not provide a completely hands-free environment, but it does work. All that is required is a modest investment of time and money.
Voice control on Linux is possible by using two software packages. IBM ViaVoice for Linux supplies the basic voice recognition engine. XVoice, available under the GPL, uses the ViaVoice libraries to provide control of the desktop and applications.
IBM offers ViaVoice for Linux (for US English) in the United States and Canada. It is available for around $40, plus shipping, and includes a headset. It also can be downloaded from the IBM web site for a small discount. A slightly newer version of ViaVoice also is available as part of the Mandrake 8.0 PowerPack and ProSuite editions. The Mandrake ViaVoice apparently supports British and American English, French and German. Mandrake versions later than 8., however, no longer include ViaVoice. This article focuses solely on installing and using the version available from IBM.
ViaVoice for Linux requires a 233MHz Pentium MMX or better, with at least 128MB of RAM and a 16-bit sound card. It was designed to install on Red Hat 6.2, but I am using it successfully on Red Hat 7.3. Others also have had success installing it on non-Red Hat systems. Be prepared to experience some installation problems, though.
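Before buying, it may be worth checking a machine against the stated minimums. The following is a minimal sketch, assuming Python 3 is available and that /proc/cpuinfo and /proc/meminfo use the usual x86 Linux layout ("cpu MHz", "flags" and "MemTotal" lines). The function names and the slack values are illustrative assumptions, not part of ViaVoice, and the sketch does not check the sound card.

```python
import re

MIN_MHZ = 233.0     # minimum CPU speed stated in the article
MIN_RAM_MB = 128    # minimum RAM stated in the article


def cpu_mhz(cpuinfo_text):
    """Return the first 'cpu MHz' value in /proc/cpuinfo text, or None."""
    m = re.search(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, re.MULTILINE)
    return float(m.group(1)) if m else None


def has_mmx(cpuinfo_text):
    """Return True if the first 'flags' line lists the mmx feature."""
    m = re.search(r"^flags\s*:\s*(.*)$", cpuinfo_text, re.MULTILINE)
    return bool(m) and "mmx" in m.group(1).split()


def ram_mb(meminfo_text):
    """Return MemTotal from /proc/meminfo text in megabytes, or None."""
    m = re.search(r"^MemTotal:\s*(\d+)\s*kB", meminfo_text, re.MULTILINE)
    return int(m.group(1)) / 1024 if m else None


def meets_requirements(cpuinfo_text, meminfo_text,
                       mhz_slack=2.0, ram_slack_mb=8):
    """Compare the parsed values with the minimums.

    The slack values are assumptions: the kernel often reports a clock
    slightly below the nominal speed, and MemTotal excludes memory the
    kernel reserves, so a 128MB machine reports a bit less than 128MB.
    """
    mhz = cpu_mhz(cpuinfo_text)
    ram = ram_mb(meminfo_text)
    return {
        "cpu_speed": mhz is not None and mhz >= MIN_MHZ - mhz_slack,
        "mmx": has_mmx(cpuinfo_text),
        "ram": ram is not None and ram >= MIN_RAM_MB - ram_slack_mb,
    }


def main():
    try:
        with open("/proc/cpuinfo") as f:
            cpu = f.read()
        with open("/proc/meminfo") as f:
            mem = f.read()
    except OSError as err:
        print("Could not read /proc:", err)
        return
    for name, ok in meets_requirements(cpu, mem).items():
        print("%-10s %s" % (name, "ok" if ok else "below minimum or unknown"))


if __name__ == "__main__":
    main()
```

A result of "below minimum or unknown" only means the script could not confirm the requirement; it does not prove that ViaVoice will fail to run.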
The first step is to install a Java Runtime Environment. ViaVoice 188.8.131.52 was tested with JRE-1.2.2 revision RC4 from blackdown.org. Using this exact revision will avoid incompatibilities with a different JRE.
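To confirm which JRE is first on the path before running the installer, something like the following sketch may help. It assumes Python 3 and assumes the JRE prints a banner of the form `java version "1.2.2"` when run with `-version`, which is the common format but has not been verified against the Blackdown build. The helper names are hypothetical, and the sketch cannot tell the RC4 revision apart from other 1.2.2 builds.

```python
import re
import subprocess

EXPECTED_VERSION = "1.2.2"  # the JRE version named in the article


def parse_java_version(banner):
    """Extract the quoted version string from a 'java -version' banner."""
    m = re.search(r'version "([^"]+)"', banner)
    return m.group(1) if m else None


def installed_java_version(java="java"):
    """Run 'java -version' and return the version string, or None."""
    try:
        proc = subprocess.run([java, "-version"],
                              stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE,
                              universal_newlines=True)
    except OSError:
        return None  # no java executable found on the path
    # The banner normally goes to stderr; check both streams to be safe.
    return parse_java_version(proc.stderr + proc.stdout)


if __name__ == "__main__":
    found = installed_java_version()
    if found is None:
        print("No JRE found on the path")
    elif found == EXPECTED_VERSION:
        print("JRE version matches", EXPECTED_VERSION)
    else:
        print("JRE version is %s, expected %s" % (found, EXPECTED_VERSION))
```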
After the JRE is installed, mount the CD and, as root, run vvsetup from the CD's root directory. Once ViaVoice is installed, run vvstartuserguru under your own account to set yourself up as a ViaVoice user, configure the audio levels and begin training ViaVoice for your voice. I could not get myself installed as a user until I deleted the viavoice directory in my home directory (created during installation). I then had to rerun the user guru. This fixed the problem, but it's rather disappointing that the installation script is so frail. Judging by the accounts of other people trying to install ViaVoice, I had an easy installation.
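The workaround of clearing out the viavoice directory in the home directory can be made a little safer by renaming the directory instead of deleting it. The sketch below does that, assuming Python 3 and assuming the directory is named `viavoice` directly under the home directory, as described above. The function name is hypothetical, and renaming rather than deleting is my substitution, not what I actually did during installation.

```python
import os
import time


def set_aside_viavoice_dir(home=None, dirname="viavoice"):
    """Rename <home>/<dirname> to a timestamped backup.

    Returns the backup path, or None if the directory does not exist.
    The directory name is an assumption based on the article's account.
    """
    home = home or os.path.expanduser("~")
    target = os.path.join(home, dirname)
    if not os.path.isdir(target):
        return None
    backup = "%s.bak-%d" % (target, int(time.time()))
    os.rename(target, backup)
    return backup


if __name__ == "__main__":
    moved = set_aside_viavoice_dir()
    if moved:
        print("Moved old directory to", moved)
        print("Now rerun vvstartuserguru as yourself")
    else:
        print("No viavoice directory found; nothing to do")
```

If the user guru then completes normally, the backup directory can be removed by hand.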
A base installation of ViaVoice, like other voice recognition software, does not provide great accuracy at first. Each user must train ViaVoice to better recognize his or her own idiosyncratic voice.
One training method is to read back text that ViaVoice displays in the user guru. This process is fairly easy, but the text may not reflect the kinds of words and phrases you use most often, which makes the training less effective.
A better alternative is to use the ViaVoice Dictation Java application when working on actual documents. As you dictate, some words or phrases are recognized incorrectly. When this occurs, you use the correction facilities within Dictation to correct the errors. ViaVoice then tunes its voice models to better fit your voice. This method is more labor-intensive, but usually these corrections can be done with voice commands. A word of warning: save your work often, as Dictation is prone to crash.
An industry consultant told me that with 10 to 60 hours of training, current voice recognition technology should reach 98% accuracy. I have lost track of how much time I've spent on training, but my accuracy is only about 92-95% on arbitrary text. This may be because ViaVoice for Linux is much older than the Mac and Windows versions, or it could be for any number of other reasons. Fortunately, spoken commands are much more accurately recognized because there are fewer valid possibilities to match.
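To put those percentages in perspective, it helps to turn them into the number of words that need correcting. The short sketch below does the arithmetic for a 1,000-word document, treating accuracy as the fraction of words recognized correctly. That is a simplifying assumption, since real recognition errors can span several words, and the function name is illustrative.

```python
def words_to_correct(accuracy_pct, words):
    """Expected number of misrecognized words at a given accuracy."""
    return words * (100 - accuracy_pct) / 100


if __name__ == "__main__":
    document_words = 1000
    for pct in (92, 95, 98):
        errors = words_to_correct(pct, document_words)
        print("%d%% accuracy: about %d corrections per %d words"
              % (pct, errors, document_words))
    # 92% accuracy: about 80 corrections per 1000 words
    # 95% accuracy: about 50 corrections per 1000 words
    # 98% accuracy: about 20 corrections per 1000 words
```

Seen this way, the gap between 92% and 98% is a factor of four in correction work, which is why continued training pays off.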
Even with only a couple of hours of training, you should notice improved accuracy. I also found that I needed to pronounce words more carefully. A bad microphone or background noise can cause accuracy problems as well.