ViaVoice and XVoice: Providing Voice Recognition

An overview of voice recognition software for Linux, its benefits and shortcomings and its cloudy future.

Conversing with a computer has long been
a staple of science fiction. Such conversations are still largely
in the realm of fiction, but voice recognition technology has
improved significantly over the last decade. A number of voice
recognition and control products are available on various
platforms. Many people don't realize, however, that it is possible
to control the Linux desktop by voice, and it has been possible for
some time.

Voice control can provide computer access for those with
overuse syndromes or other arm injuries--users who in the past had
to switch platforms to find voice support. Aside from the geek
factor, ordinary users can benefit from reduced arm stress and
improved ease-of-use and speed for some tasks. Although the future
of the software discussed in this article is somewhat in question,
and the software does not provide a completely hands-free
environment, it does work. All that is required is a modest
investment of time and money.

Voice control on Linux is possible by using two software
packages. IBM ViaVoice for Linux supplies the basic voice
recognition engine. XVoice, available under the GPL, uses the
ViaVoice libraries to provide control of the desktop and
applications.

IBM offers ViaVoice for Linux (for US English) in the United
States and Canada. It is available for around $40, plus shipping,
and includes a headset. It also can be downloaded from the IBM web
site for a small discount. A slightly newer version of ViaVoice
also is available as part of the Mandrake 8.0 PowerPack and
ProSuite editions. The Mandrake ViaVoice apparently offers language
support for both British and American English, French and German.
Mandrake versions later than 8., however, no longer include
ViaVoice. This article focuses solely on installing and using the
version available from IBM.

Installing ViaVoice

ViaVoice for Linux requires a 233MHz Pentium MMX or better,
with at least 128MB of RAM and a 16-bit sound card. It was designed
to install on Red Hat 6.2, but I am using it successfully on Red
Hat 7.3. Others also have had success installing it on non-Red Hat
systems. Be prepared to experience some installation problems,
though.

The first step is to install a Java Runtime Environment.
ViaVoice 1.0.1.1 was tested with JRE-1.2.2 revision RC4 from
blackdown.org. Using this exact revision will avoid
incompatibilities with a different JRE.

After the JRE is installed, mount the CD and run
vvsetup in the CD root directory as root. Once
installed, run vvstartuserguru as yourself to
set up as a ViaVoice user, configure the right audio levels and
begin training ViaVoice for your voice. I could not get myself
installed as a user until I deleted the /viavoice directory in my
home directory (created during installation). I then had to rerun
the user guru. This move fixed the problem, but it's rather
disappointing that the installation script is so frail. Judging by
the accounts of other people trying to install ViaVoice, I had an
easy installation.

Training ViaVoice

A base installation of ViaVoice, like other voice recognition
software, does not provide great accuracy at first. Each user must
train ViaVoice to better recognize his or her own idiosyncratic
voice.

One training method is to read back text that ViaVoice
displays in the user guru. This process is fairly easy to do, but
it may not reflect the type of words and phrases that you tend to
use a lot, making it less effective.

A better alternative is to use the ViaVoice Dictation Java
application when working on actual documents. As you dictate, some
words or phrases are recognized incorrectly. When this occurs, you
use the correction facilities within Dictation to correct the
errors. ViaVoice then tunes its voice models to better fit your
voice. This method is more labor-intensive, but usually these
corrections can be done with voice commands. A word of warning:
save your work often, as Dictation is prone to crash.

An industry consultant told me that with 10 to 60 hours of
training, current voice recognition technology should reach 98%
accuracy. I have lost track of how much time I've spent on
training, but my accuracy is only about 92-95% on arbitrary text.
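To put those percentages in perspective, the short Python sketch below converts a word-accuracy figure into the number of misrecognized words one might expect in a document. It is only back-of-the-envelope arithmetic: the function name and the 1,000-word document length are illustrative choices, not anything ViaVoice reports, and it assumes every word is equally likely to be wrong.

```python
def expected_errors(accuracy_percent, word_count):
    """Return the expected number of misrecognized words.

    Assumes (as a simplification) that every word is misrecognized
    independently with the same probability.
    """
    error_rate = (100 - accuracy_percent) / 100
    return error_rate * word_count


# Accuracy figures mentioned in the text; 1,000 words is a hypothetical length.
for accuracy in (92, 95, 98):
    errors = expected_errors(accuracy, 1000)
    print(f"{accuracy}% accuracy: about {errors:.0f} corrections per 1,000 words")
```

Under these assumptions, 95% accuracy works out to one correction every 20 words on average, against one every 50 words at 98%.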
The shortfall may be because ViaVoice for Linux is much older than the Mac
and Windows versions, or it could be for any number of other
reasons. Fortunately, spoken commands are much more accurately
recognized because there are fewer valid possibilities to
match.

Even with only a couple of hours of training, you should
notice improved accuracy. One thing I found is I needed to be more
careful with my pronunciation. Bad microphones or background noise
also can cause accuracy problems.

Installing XVoice

Once you have ViaVoice installed and at least partially
trained, you are ready to install XVoice to allow voice control of
your desktop and applications. On its own, ViaVoice for Linux does
not give you these capabilities.

XVoice can be downloaded from xvoice.sourceforge.net.
Be sure to download and install the RPM, as the source requires a
discontinued ViaVoice for Linux SDK (more on this later).

Once installed, simply type xvoice -m in a
terminal window (make sure that Dictation is not running, as they
cannot run at the same time). As a simple test, say "next window",
which should change focus to another window on your desktop.

XVoice Overview

XVoice allows a user to associate a set of actions with a
predefined spoken command. A set of commands is called a grammar.
Grammars can be associated with specific applications, windows or
modes within an application. They also can be general and
accessible from any context. Actions invoked can include generated
keystrokes, mouse events, calls to external programs or any
combination of these.

XVoice uses the ViaVoice libraries to recognize commands or
regular text. Commands are defined in an xvoice.xml configuration
file. XVoice uses a standard configuration file,
/usr/share/xvoice/xvoice.xml, until you create your own in
~/.xvoice/xvoice.xml.

The XVoice window displays which command grammars are active
and includes a pane showing the most recently dictated words. If
XVoice thinks that something you said was close to a command but
isn't sure, then the text shows up gray in this pane to alert you,
and the command actions will not be executed.

XVoice can be in four different states for any given
application window. In command mode, XVoice listens only for
commands. In dictate mode, XVoice doesn't listen for
application-specific commands (although it does listen for more
general commands) and simply types whatever it thinks you have
spoken. In idle mode, XVoice listens only for general commands.
Finally, command and dictate modes can be on simultaneously, in
which case XVoice listens for both dictation and commands. You
distinguish a command from plain text by pausing slightly before
and after speaking it.

When you first focus an application, XVoice automatically
starts in command mode. To turn on dictate mode as well, you simply
say "dictate mode". To stop dictating, say "stop dictation".

For optimal utility, make the XVoice window sticky in your
window manager so you are always able to see how it has interpreted
your speech. To have XVoice start up automatically and listen for
input, put xvoice -m in your window manager
startup programs.

Controlling Your Applications

Let's look at the sample application grammar definition in
Listing 1 to understand how to define a grammar for an application.
First we define the application name for human readability, and
then we define an expression to match the window title for this
application (line 1). This is how XVoice determines which grammar
to activate. In line 1 we're looking at a special built-in
application name, so this isn't a real window title. The commands
in this special grammar are accessible from any context.

Listing 1. Sample Application Grammar Definition

An application tag also can have a dictation attribute. If
true, this places XVoice into dictation mode when first activating
this grammar. On line 2, we include some definitions that have been
defined earlier in a <define name='numbers'> section. Define
sections let you define your own tags for use throughout your
configuration file.

Line 3 is an example of what might be included in a define
section, although here the direction tag can be used only in the
scope of this grammar. This line is associating spoken directions
with their respective arrow keys. When evaluated in a command, the
spoken direction is substituted with its corresponding key. XVoice
allows any character names from /usr/X11R6/include/X11/keysymdef.h
to be escaped in the & style. Note the closing period at the
end.

The mapping of spoken commands to actions begins at line 4.
Saying "last window" produces a simulated Alt-Tab keystroke. This
is because \t is the escape sequence for the Tab key, and the Alt
key is simulated because the alt attribute is true. Control and
Shift are other possible attributes.

The char attribute actually can include a string, as seen in
line 6. Commands like this really can save you time filling out
forms.

Line 7 uses a more complex command expression. When
evaluated, {1} on the right side of the arrow ("->") is replaced
with the content of the first braces in the spoken command on the
left, {2} with the second and so on. So saying "move to view port
3" results in the keystroke Alt-F3 (Alt + &F3;), which in my
window manager configuration switches me to the third desktop view
port.

Before listening for commands, custom-defined tags are
substituted with their definitions. Line 8 works exactly as if the
definition of <direction> on line 3
appeared in place of the tag itself. The same is true of the
raw-number tag, which has been defined as a positive whole number
in the numbers definition mentioned above.

Line 8 also introduces the repeat tag. It repeats the
enclosed events a defined number of times. Here it is repeating an
arrow key press (defined on line 3). The number of times specified
is the number spoken after the direction. In other words, saying
"go up 10" results in 10 arrow up key presses.

The mouse event tag can be seen on lines 9-15. This event tag
allows you to reposition the mouse pointer and simulate mouse
clicks. The x and y attributes take pixel values. The mouse origin
attribute can be root (absolute), window (from the top left corner
of the application window), relative (to the current pointer
position) or even widget (an experimental option for
difficult-to-automate applications). The XVoice application allows
mouse events to be easily recorded for pasting into your
configuration file.

Lines 11 and 12 allow horizontal voice movement of the mouse
pointer. Line 13 does the same for vertical movement but in a
single line. Notice how the sign of the pixel movement amount is
being determined: {2} will be either a + or - depending on the
direction spoken.

XVoice also can execute other programs, as on lines 16-22.
What could be easier than simply saying "x term" to get a new
terminal window? I added Mozilla to the ViaVoice dictionary using
Dictation to allow it to be recognized.

Look at the expr attribute on line 18. This is a window title
matching expression. If I say "pine" and a window titled Pine is
already open, focus is switched to the existing window rather than
starting a new instance. The only problem is that your window
manager (Sawfish, for example) may not switch you to the correct
view port or workspace to actually use the newly focused
window.

The calls to xmms on lines 19-22 illustrate a benefit of
server-based applications. These lines allow me to control music
playback from any context--I don't need to find the xmms window. In
fact, the screen even can be locked, which could be a security
issue for you.

Line 23 finishes the application grammar definition. Be sure
not to forget the period to close the <<root>> section.
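The substitution rules from this walkthrough can be made concrete with a small Python sketch. It is not XVoice's implementation and does not use XVoice's configuration syntax; the function names, the slot list and the key names ("F{1}", "Up") are hypothetical stand-ins. It only illustrates the two ideas described above: numbered placeholders such as {1} being replaced by what was spoken in the corresponding braces, and a repeat count multiplying a key press.

```python
import re


def fill_placeholders(action_template, spoken_slots):
    """Replace {1}, {2}, ... with the matching spoken slot (1-based)."""
    def lookup(match):
        index = int(match.group(1))
        return spoken_slots[index - 1]
    return re.sub(r"\{(\d+)\}", lookup, action_template)


def repeat_key(key_name, count):
    """Expand a repeated key press into a list of individual presses."""
    return [key_name] * count


# "move to view port 3": the first braces captured the spoken "3".
print(fill_placeholders("F{1}", ["3"]))

# "go up 10": the direction maps to an arrow key, the number to a count.
direction_keys = {"up": "Up", "down": "Down", "left": "Left", "right": "Right"}
presses = repeat_key(direction_keys["up"], int("10"))
print(len(presses), presses[0])
```

Keeping the spoken slots in a plain list mirrors the way the article numbers the braces from left to right, so the same helper would also cover the signed mouse movement, where {2} supplies the + or - sign.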
Simple mistakes like adding an extra character or leaving one out
can lead to error messages of varying usefulness or to lengthy
delays at start up. Unfortunately, XVoice does not provide good
error messages. Because the heart of the configuration lives in
CDATA sections, XML validators probably cannot help you catch
errors. Be careful when changing your configuration file, and make
frequent backups.

By editing your personal configuration file, you now should
be able to automate almost any task that previously required the
use of a keyboard or a mouse. Grammars for many common applications
are already included in the default configuration file, and they
provide good study examples. If you do a lot of repetitive tasks,
this can really save your muscles and your time.

Issues and the Future

Some applications, mainly games such as TuxRacer, bypass X
for key presses, leaving XVoice unable to control them. Mouse-heavy
applications, such as The GIMP or Netscape, can be automated, but
it's extremely tedious to try to control the mouse by voice.
Fortunately, Mozilla 1.2a has "type ahead find", which, in
conjunction with XVoice, lets you speak text-within-a-text-link to
navigate web pages by voice.

Voice recognition in general works great for commands and
fine for casual text. However, even small error rates can be quite
annoying for some uses. Be advised that it can be exasperatingly
difficult to program by voice. Another issue to be aware of is the
possibility of straining your voice, much as it is possible to
overuse your arms.

While XVoice and ViaVoice put a lot of power at your control,
it is not quite possible to control the Linux desktop entirely by
voice. This is disappointing to anyone needing hands-free
accessibility. Sadly, the weak link is IBM. At least with the
version shipped by IBM, Dictation requires keystrokes for
unavoidable dialog windows, for example, and Dictation is the only
way to train ViaVoice. Of course, if you don't need any additional
training, can automate all your applications and aren't concerned
about security, you're in good shape.

IBM has released new versions of ViaVoice for Mac and Windows
but not for Linux. Despite all the money they're spending on other
areas of Linux, they don't even actively market ViaVoice, and their
future support is unclear. In March they pulled without comment the
ViaVoice Linux SDK, which XVoice needs to compile. With this cloud
over the future, the XVoice developers are currently trying to find
a viable open-source alternative instead of adding new features. A
group of developers and users is out there wanting to make ViaVoice
for Linux a success, but without even minor support from IBM the
opportunity will be missed.

Rob Spearman is a Seattle
software architect recovering from an overuse syndrome. This
article was written using voice recognition on Linux.

email: rob@smeg.com

______________________

Comments

IBM ViaVoice

Anonymous

I bought ViaVoice from IBM for about $40 in the spring of 2002. It was the worst install I have ever seen! If it weren't for internet posts I would never have gotten it working. It doesn't come with the needed Blackdown Java. You need an older version because that's all it's been tested with.

No manual, just a CD-ROM, headset and a folded sheet of paper called instructions. I had it running on SuSE 7.2, but after installing JBuilder it broke and I never used it much anyway. At least you get a headset. The audio levels are very, very low. I bought the headset for OS/2 many years ago and voice type dictation (pausing in between words) was more impressive on my old P90 compared to ViaVoice on my Athlon 1900MHz system! The joy of Java interpreted code! When CPUs get faster maybe they'll come out with a GWBASIC version! Anyway, this product is love/hate. The continuous speech works nicely, but the error rates are high: one error in about 10-20 words. 95% sounds good but that's one wrong word in every 20. At least the words are all spelled correctly.

The GUI is sluggish and pretty basic. I expected more from IBM. I have seen better 0.x open-source projects!

--Ed March

http://www.poetworld.org/~emarch

Re: IBM ViaVoice Vs other products

Anonymous

Voice recognition has greatly improved. I have ViaVoice
for OS X and WinXP and think it is great but could be
much better.

How does the CMU or other offerings compare with
these two well known standards?

I appreciate knowing that the ViaVoice for Linux isn't
worth loading. Maybe that is why IBM fired all the
linux developers and took the nix offering off their
website.

Link to IBM purchase page

tjmather

Here's the link to purchase the IBM Viavoice for Linux.

Also, there is a new xvoice-sphinx project.

Re: ViaVoice and XVoice: Providing Voice Recognition

Anonymous

IBM took the Linux page off its web server, but if you want to buy it you can still order it over the phone at IBM Direct:

800-426-2255.

Also, if you are concerned about the future of Linux voice recognition, get involved.

Join a mailing list and keep up to date.

Harass IBM. Current law REQUIRES that businesses make reasonable accommodation for disabled employees, and VR software will soon be a mandatory offering of each business. It will not be possible for them to make inroads in the office desktop market without it.

sphinxTrain :(

Anonymous

I am working on a project that uses Sphinx II for speech recognition, but I cannot understand its training procedure (although I have read its documentation over and over again). If anyone knows how to train Sphinx II using sphinxTrain, please email me at vinhtran@ork.net

Please explain how the work goes step by step and how to set things up for it (the more detailed the better).

I know this will take a lot of your time but...PLEASE HELP ME. I WOULD HIGHLY APPRECIATE IT.

Thanks in advance.

ViaVoice for Linux is still available!

Anonymous

According to IBM, ViaVoice for Linux is available over the phone at 800-426-2255. For some reason they have removed it from their web store. If you're tempted to try out serious voice recognition on Linux, I suggest you purchase a copy and let IBM know that there is a market for the Linux version.

No thanks, I'm sticking with

Anonymous

No thanks, I'm sticking with the idea of open source on Linux, and padding IBM's pockets isn't appealing to me. Maybe Sphinx will work out, or I'll just keep typing.

Re: ViaVoice and XVoice: Providing Voice Recognition

Anonymous

What about CMU Sphinx?

Nobody ever seems to realize that there IS an open-source continuous flow speech recognition engine out there.

Actually there are two:

CMU Sphinx-II, which deals with limited domain speech, and

CMU Sphinx-III, which deals with unlimited domain speech for dictation purposes.

google for them.

Re: ViaVoice and XVoice: Providing Voice Recognition

Anonymous

In March they pulled without comment the ViaVoice Linux SDK, which XVoice needs to compile.

So, like, then what's the point of this article? "There's this cool technology but it's no longer available and old versions are no longer supported" is hardly a tutorial or "that it is possible to control the Linux desktop by voice, and it has been possible for some time" -- this is just Linux propaganda: The correct statement is that it was almost available under Linux, but the vendor lost interest before the product was viable. For what it's worth, numerous postings to the IBM ViaVoice list have pleaded for a contact address where we can purchase this mythical $40 Linux kit; if the author knows something the IBM engineers don't about where to get this, please post it -- that would be information worth publishing!

Re: ViaVoice and XVoice: Providing Voice Recognition

Anonymous

If you download the XVoice RPM, you don't need the SDK. (You will need ViaVoice of course.) I have been using this set up for 8 months. With more users or more interest, IBM may have continued to support ViaVoice on Linux. It appears that the article came out just after they pulled ViaVoice for Linux. Bad timing.

Re: ViaVoice and XVoice: Providing Voice Recognition

Anonymous

Not just propaganda. Check out cvoicecontrol:

It converts spoken words to commands to be executed. It's a little dated and not maintained anymore, but it's available and I've found it works nicely.

ViaVoice/XVoice isn't the only piece of software capable of controlling the computer by voice.

Re: ViaVoice and XVoice: Providing Voice Recognition

Anonymous

Hi!

How easy is it to use CVoiceControl with a standard Soundblaster and a Pentium III 450MHz CPU? Could it recognize the word "One" and execute the command "ls"?

IBM Does Support Linux

Anonymous

IBM Linux Portal

But IBM is not a charity with unlimited funds able to afford to keep every product going forever, even if no one buys it. Clearly people are buying Win and Mac versions of ViaVoice. I guess not enough people were willing to pay actual money for ViaVoice for Linux.

Re: IBM Does Support Linux

Anonymous

Come on now... open-source demagogy is not about "charity", it's about development, and that's why IBM should provide the SDK version at least. They ARE a megacorp but they risk getting worse...

Well done IBM! May bankruptcy hit your stands!

Re: ViaVoice and XVoice: Providing Voice Recognition

xtifr

At a recent Linux tradeshow, IBM was demoing some products using ViaVoice, and they had a few copies of ViaVoice they were giving away in drawings. I asked if they had any copies for sale, and they said no. So I asked if they had any literature or sales brochures, and they said no. I asked if there was any way to get more information, and nobody in the booth knew. I walked away, bemused.

Re: ViaVoice and XVoice: Providing Voice Recognition

Anonymous

IBM doesn't support Linux anymore. They fired all the Linux staff, and they don't sell or distribute any Linux software.

Support Linux, don't buy IBM stuff

Re: ViaVoice and XVoice: Providing Voice Recognition

Anonymous

It's true. And Microsoft hired all the ex-IBM Linux people to produce Office for Linux, which will be out next month. Also, Elvis lives in my dog Buddy's dog house! Stupid Elvis! I kick him out, but he keeps moving in!

ViaVoice for Linux no more

Anonymous

I started to play with ViaVoice a few times. I sure don't mind paying for software, but I hate the thought of depending on proprietary stuff for anything important. It could go *poof* at any time. Looks like that time has come for ViaVoice for Linux.

I'd like to get some *simple* voice recognition for a limited (aviation) vocabulary.

Re: ViaVoice for Linux no more

Anonymous

Try out CMU Sphinx...

http://fife.speech.cs.cmu.edu/sphinx/

You can create your own simple vocabulary pretty easy. With some of the ones I have made, it is so good at recognizing that it is scary...

Call up 1-877-268-7526 and give it a try.

Later,

Ryan

No ViaVoice!

Anonymous

Good article, but from what I can see IBM no longer offers any version of ViaVoice for Linux. Unless they've carefully hidden it away on their site, that is.

ViaVoice

Anonymous

So, where is it possible to find the free ViaVoice RPM version for Linux? I tried to find it on the net, but all the links are to IBM's page.

google

Anonymous

Try http://www.google.com/search?q=viavoice_dict_rtk_3.tar :)
