Java Speech Development Kit: A Tutorial
The Speech for Java is a Java programming package, 100% object-oriented, that uses the speech recognition and synthesis technology known as ViaVoice, which was commercialized by IBM . This API makes it possible to build application interfaces that take advantage of speech.
The recognition and synthesis operations are not processed by the Java software. The kit is just a way to built speech-oriented interfaces. Applications "throw" speech processing to the software behind them, the IBM ViaVoice (commercial or free version).
In this way, just like any other interface programming models, the Speech for Java Development Kit (SDK) is also event-oriented. Now, however, these events are not fired by the mouse or the keyboard, they are created due to human speech via a microphone.
As mentioned the software is based upon two major features, speech synthesis and recognition. The recognition can be done through the use of grammars, which are entities providing information about how the recognition will be done. There are two kinds of grammars:
dictation: it's specially designed for continuous speech, when the software tries to determine what is being spoken using a large word database that associates a sound with each word. To increase accuracy and performance, contextual data is considered in order to establish the pronounced words. It is possible to use different dictation grammars for different domains, such as medicine, engineering or computer science.
rule: these grammars use rules, user's definitions of what might be spoken and how that will be interpreted by the application. The set of rules can take any size, but usually it is limited by the context in which the application is inserted. In this way, commands might be established in execution time to enhance responsiveness.
Voice synthesis might provide the application with simple string sentences as arguments to the method speak. An improved naturalness can be achieved via the use of a special markup language--Java Speech Markup Language (JSML). Through its use, properties like voice, frequency, rhythm and volume can be dynamically altered.
Figure 1. Simplified Architecture of the IBM SDK Java Technology
In figure 1, we can see that the sound hardware is controlled by the operating system. Right above it is the Engine.exe binary application, which is automatically initialized when the voice synthesis/recognition applications are started. The Engine is the heart of the IBM ViaVoice ( in the commercial or free versions). It is responsible for accessing all the ViaVoice's features.
Also, in figure 1, we can see the basic components of voice applications: the entities Recognizer and Synthesizer, the Central class and, especially, the Engine Interface, which is extended by the Synthesizer and Recognizer Interfaces. This interface has all the basic methods for controlling and accessing the ViaVoice processing engine.
The Engine Interface is important due to the fact that Java is a multiplatform language. For that reason, the same development kit is used on the UNIX, Linux and Windows systems, but each system has its own binary implementation of the IBM ViaVoice processing engine. The Engine interface hides the details of the platform-dependent software, offering proper access for the Recognizers and Synthesizers.
Now it is necessary to describe the Central class: it is in charge of implementing the Engine interface. The Central class is, in fact, responsible for abstracting platform details by providing the correct implementation of the Engine Interface. The recognizers and synthesizers extend the Engine Interface.
Figure 2. Full Architecture of the IBM SDK Java Technology
Below is a link to a code example that illustrates the most simple way to create an application with a recognizer and a synthesizer, with no working functionality.
The synthesizers, as their name indicates, are the entities responsible for speech synthesis. They are created through the use of the Central class, which implements the Engine interface and acts as a connection to the synthesis provided by the IBM ViaVoice technology.
Creating a voice synthesizer can happen in one of two ways, and both are use the Central class static method, createSynthesizer:
1. Accessing the default synthesizer of a determined locale is the most simple and common method. It usually establishes access to the synthesizer implementation distributed with the ViaVoice software. It might be done as shown with the code below.
Locale.setDefault("en","US");
Synthesizer sintethesizer =
Central.createSynthesizer(null);
2. Accessing a synthesizer that satisfies the conditions defined through the arguments passed on by the createSynthesizer method is the second way. This method is used in cases where more than one synthesizer is available. The parameters are:
name of the engine
name of the mode in which it will be used
a locale supported by the engine
a Boolean value, a control flag of the engine
an array of objects Voice that will be used
These parameters are defined creating an object SynthesizerModeDesc that will be passed to Central.createSynthesizer, as seen here:
public SynthesizerModeDesc(String engineName, String modeName, Locale locale, Boolean running, Voice[] voices)
Remember that any of the attributes can be null, and the Central class will be responsible for identifying the best synthesizer to fit the conditions.
Synthesizing voice: Once we have created the synthesizer, we can access its functions with the speak method. A simple String argument is enough for the basic features of synthesis, but there are other possibilities to increase naturalness of computer speech. The most powerful one of them is JSML (Java Speech Markup Language, covered in the next section), which provides various techniques to make the speech more similar to human voice.
Table 1 shows all the forms of the speak method:
Table 1. Methods Used for Voice Synthesis
Method | Function |
void speakPlainText(String text, SpeakableListener listener | Speak a plain text string. The text is not interpreted as containing the Java Speech Markup Language, so JSML elements are ignored. |
void speak(Speakable JSMLtext, SpeakableListener listener) | Speak an object that implements the Speakable interface and provides text marked with the Java Speech Markup Language. |
void speak(URL JSMLurl, SpeakableListener listener) | Speak text from a URL formatted with the Java Speech Markup Language. The text is obtained from the URL, checked for legal JSML formatting and placed at the end of the speaking queue. |
void speak(String JSMLText, SpeakableListener listener) | Speak a string containing text formatted with the Java Speech Markup Language. The JSML text is checked for formatting errors, and a JSMLException is thrown if any are found. |
The speakable objects are members of classes that implement the speakable interface. This interface has only one method, getJSMLText. This method specifies a JSML String to be returned when the object is submitted to the speak method. An example can be seen in the following sample code.
The SpeakableListener: To the methods of table 1, an extra element may be attached, a SpeakableListener. It will receive specific events for each pronounced word. Different events are generated during the synthesis process, and these events can be used to take control of the speech process, enabling a more interactive application. They indicate when a new word starts to be pronounced, if its synthesis was canceled and if it is over or was paused, among other events that allow the monitoring of the synthesis process.
The events are instances of the SpeakableEvent and are thrown by the synthesizer to be caught and treated by the listener. These entities carry information about the spoken word. More detail. The listeners are optional, and may be bypassed with a null argument to the synthesis methods.
The speakableListeners might be used in two ways:
associated with the listeners through the method speak, refer to table 1. This will define a listener for each item added to the items queue of the synthesizer. One listener might be shared by any number of queued items.
associated with the Synthesizer object through the addSpeakableListener method. This way the listener will receive the events of all the queued items in a determined synthesizer.
The listeners associated via the speak method will receive the events before the ones associated via the addSpeakable Listener.
The items queue: a synthesizer implements a queue of items provided to it through the speak and speakPlainText methods. The queue is "first-in, first-out" (FIFO)--the objects are spoken in the exactly order in which they were received. The object at the top of the queue is the object that is currently being spoken or about to be spoken. The QUEUE_EMPTY and QUEUE_NOT_EMPTY states of a Synthesizer indicate the current state of the speech output queue. The state handling methods inherited from the Engine interface (getEngineState, waitEngineState and testEngineState) can be used to test the queue state. The items on the queue can be checked with the enumerateQueue method, which returns a snapshot of the queue. The cancel methods allows an application to:
stop the output of an item currently on the top of the speaking queue.
remove an arbitrary item from the queue.
remove all items from the output queue.
The Voice: as an additional function of the synthesizers, we have the ability to choose the Voice that will be used in the synthesis. The parameters that must be provided are:
gender: GENDER_MALE, GENDER_FEMALE, GENDER_NEUTRAL and GENDER_DONT_CARE.
age: AGE_CHILD, AGE_DONT_CARE, AGE_MIDDLE_ADULT, AGE_NEUTRAL, AGE_OLDER_ADULT, AGE_TEENAGER and AGE_YOUNGER_ADULT.
To determine the association of the voice and the synthesizer, it is necessary to recover a SynthesizerProperties object through the getSynthesizerProperties method and determine the voice using setVoice. This can be more clearly understood in the following example code.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Designing Electronics with Linux | May 22, 2013 |
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
- Linux Systems Administrator
- New Products
- Senior Perl Developer
- Technical Support Rep
- UX Designer
- Web & UI Developer (JavaScript & j Query)
- Designing Electronics with Linux
- Dynamic DNS—an Object Lesson in Problem Solving
- Using Salt Stack and Vagrant for Drupal Development
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Favorite (and easily brute-forced) pw's
1 hour 26 min ago - Have you tried Boxen? It's a
7 hours 18 min ago - seo services in india
11 hours 49 min ago - For KDE install kio-mtp
11 hours 50 min ago - Evernote is much more...
13 hours 50 min ago - Reply to comment | Linux Journal
22 hours 35 min ago - Dynamic DNS
23 hours 9 min ago - Reply to comment | Linux Journal
1 day 8 min ago - Reply to comment | Linux Journal
1 day 58 min ago - Not free anymore
1 day 5 hours ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Featured Jobs
| Linux Systems Administrator | Houston and Austin, Texas | Host Gator |
| Senior Perl Developer | Austin, Texas | Host Gator |
| Technical Support Rep | Houston and Austin, Texas | Host Gator |
| UX Designer | Austin, Texas | Host Gator |
| Web & UI Developer (JavaScript & j Query) | Austin, Texas | Host Gator |
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?



Comments
Reference Links are Dead
hi there
this is a very nice article. i really liked it and i planned to implement this in my project but unfortunately all the references links are dead.
kindly help me out . u can reach me at gobicse@gmail.com....
any help will be of great help..
thanks
Reference Links are Dead
hi there
this is a very nice article. i really liked it and i planned to implement this in my project but unfortunately all the references links are dead.
kindly help me out . u can reach me at gobicse@gmail.com....
any help will be of great help..
thanks
This is really useful. I
This is really useful. I successfully completed my project voice controlled wheel chair with the help of this concept... It works well....
Re: Java Speech Development Kit: A Tutorial
Is there lib to use text to speak in portuguese language ?
Re: Java Speech Development Kit: A Tutorial
i have read your tutorial about the Java Speech Development Kit, it is truely very interesting, i would realy love to develop a programe of my own on this context. please help me im a graduate from the University of Botswana in Computer Science in Botswana, i realy ineterested in speech program but i dont know where to start and what i need. more especialy that i dont have any of the classies that i can use for statup training plz give me an advice.
my Email is matikitim2@yahoo.cu.com
thank you in advace
Moathodi Excellent Matikiti
Re: Java Speech Development Kit: A Tutorial
I am in the same situation as Moathodi, were I would like to develope a program on this subject. I would really appreciate it if you could aid me in kicking it of as I am not sure were to start.
My email is adil_rehman@hotmail.com
Many Thanks
Adil
Speech recognition engines
I have done a lot of research on the speech development architecture of java..but i am more interested in some package that might make the development of a speech to text application faster. Thanx...hope i get a reply...my emaill is mario_ramotar@yahoo.com