Java Speech Development Kit: A Tutorial

The authors show how to get started developing voice-activated interfaces using the Speech for Java Development Kit.
Listeners

The IBM ViaVoice Java technology is based on the same model used in graphical user interface (GUI) programming: it depends on events generated by the user, which are intercepted by listeners. These listeners are interfaces implemented by software developers, so any kind of processing can be triggered by specific speech events.

All speech events are derived from the class SpeechEvent, each carrying specific information about the occurrence that fired it.

EngineListener: interface defining methods to be called when state-change events for a speech engine occur. To receive engine events, an application attaches a listener by calling the addEngineListener method of an Engine; a listener is removed by a call to the removeEngineListener method.

=> Associated event: EngineEvent or one of its derived classes.

The main events treated are engineAllocated, engineAllocatingResources, engineDeallocated, engineDeallocatingResources, engineError, enginePaused and engineResumed.

They can be understood by observing Figure 4.

Figure 4. Engine States

The engine's working cycle is shown in Figure 4. The engine is created in the DEALLOCATED state; to use it, the allocate method must first be called. The engine then passes to the temporary ALLOCATING_RESOURCES state, staying in this condition for a machine-dependent time, and finally reaches the ALLOCATED state with its substates: RESUMED (the default), indicating engine activity, and PAUSED, indicating suspended activity. Transition between these substates is possible using the pause and resume methods. When the engine is no longer needed, it is recommended that the hardware resources be explicitly freed by calling the deallocate method. The engine then enters the DEALLOCATING_RESOURCES state and finally returns to its initial DEALLOCATED state. Since recognizers and synthesizers inherit from the Engine interface, they also follow this cycle.

Each unidirectional transition between these states fires a corresponding event, six in total, plus the possible engineError event; these are the seven events listed above.
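The following sketch illustrates this cycle, assuming the javax.speech classes of the Java Speech API 1.0 (the API the SDK implements); EngineAdapter is the API's empty implementation of EngineListener, so only the transitions of interest need to be overridden:

    import java.util.Locale;
    import javax.speech.Central;
    import javax.speech.Engine;
    import javax.speech.EngineAdapter;
    import javax.speech.EngineEvent;
    import javax.speech.synthesis.Synthesizer;
    import javax.speech.synthesis.SynthesizerModeDesc;

    public class EngineCycleDemo {
        public static void main(String[] args) throws Exception {
            // A synthesizer is an Engine; it is created in the DEALLOCATED state.
            Synthesizer engine =
                Central.createSynthesizer(new SynthesizerModeDesc(Locale.ENGLISH));

            engine.addEngineListener(new EngineAdapter() {
                public void engineAllocated(EngineEvent e)   { System.out.println("ALLOCATED"); }
                public void enginePaused(EngineEvent e)      { System.out.println("PAUSED"); }
                public void engineResumed(EngineEvent e)     { System.out.println("RESUMED"); }
                public void engineDeallocated(EngineEvent e) { System.out.println("DEALLOCATED"); }
            });

            engine.allocate();                        // DEALLOCATED -> ALLOCATING_RESOURCES -> ALLOCATED
            engine.waitEngineState(Engine.ALLOCATED); // block until allocation completes
            engine.pause();                           // RESUMED -> PAUSED
            engine.resume();                          // PAUSED -> RESUMED
            engine.deallocate();                      // free the hardware resources explicitly
            engine.waitEngineState(Engine.DEALLOCATED);
        }
    }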

SynthesizerListener: listener dedicated to handling the events generated during the synthesizer's working cycle, more precisely during the synthesis function of the engine (see Figure 5).

=> Associated event: SynthesizerEvent or one of its derived classes.

Through this listener it is possible to keep track of the synthesizer's item queue, which changes state when:

  • a new item is added to the queue via the speak method.

  • an item is removed from the queue via the cancel method.

  • the audio output of an item is finished.

Figure 5. Synthesizer States

The difference between the synthesizer cycle and the engine cycle is the set of substates of the ALLOCATED state: ALLOCATED RESUMED QUEUE_EMPTY, ALLOCATED RESUMED QUEUE_NOT_EMPTY, ALLOCATED PAUSED QUEUE_EMPTY and ALLOCATED PAUSED QUEUE_NOT_EMPTY. Note that the queue condition is independent of the synthesizer's working condition: items may be added to or removed from the queue in both the RESUMED and PAUSED states, so these substates are not part of a cycle but simply indicators of the queue's state.
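Because the engine state is reported as a single long bitmask, the pause condition and the queue condition can be tested independently. A minimal sketch, assuming an already created synthesizer:

    import javax.speech.Engine;
    import javax.speech.synthesis.Synthesizer;

    public class QueueStateProbe {
        // The ALLOCATED substates are bits of one long value, so the working
        // condition (PAUSED/RESUMED) and the queue condition (QUEUE_EMPTY/
        // QUEUE_NOT_EMPTY) can be inspected independently of each other.
        static void report(Synthesizer synth) {
            long state = synth.getEngineState();
            System.out.println("paused:      " + ((state & Engine.PAUSED) != 0));
            System.out.println("queue empty: " + ((state & Synthesizer.QUEUE_EMPTY) != 0));
        }
    }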

The possible events are:

  • queueEmptied: the speaking queue of the Synthesizer has emptied and the Synthesizer has changed to the QUEUE_EMPTY state. The queue may become empty because speech output of all items in the queue is completed or because the items have been canceled.

  • queueUpdated: the speech output queue has changed. This event may indicate a change in the state of the Synthesizer from QUEUE_EMPTY to QUEUE_NOT_EMPTY. The event may also occur in the QUEUE_NOT_EMPTY state without changing state. The enumerateQueue method of the Synthesizer will return a changed list. The speech output queue changes when:

    • a new item is placed on the queue with a call to one of the speak methods, or

    • when an item is removed from the queue with one of the cancel methods (without emptying the queue), or

    • when output of the top item of the queue is completed (again, without leaving an empty queue).

SynthesizerListener extends the EngineListener interface, and therefore the same engine events described earlier can also be handled by a SynthesizerListener.
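A short sketch of such a listener, assuming the API's SynthesizerAdapter convenience class (an empty implementation of SynthesizerListener); because SynthesizerListener extends EngineListener, it is attached with the same addEngineListener method:

    import javax.speech.synthesis.Synthesizer;
    import javax.speech.synthesis.SynthesizerAdapter;
    import javax.speech.synthesis.SynthesizerEvent;

    public class QueueWatcher {
        static void watch(final Synthesizer synth) {
            synth.addEngineListener(new SynthesizerAdapter() {
                public void queueEmptied(SynthesizerEvent e) {
                    // all items spoken or canceled; QUEUE_EMPTY reached
                    System.out.println("queue emptied");
                }
                public void queueUpdated(SynthesizerEvent e) {
                    // an item was added, removed or completed
                    System.out.println("queue updated");
                }
            });
        }
    }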

SpeakableListener: listener specially designed to handle events generated during speech synthesis. It can be set up in either of two ways:

  • Provide a SpeakableListener object when calling one of the speak or speakPlainText methods of a Synthesizer.

  • Attach a SpeakableListener to a Synthesizer with its addSpeakableListener method.

=> Associated event: SpeakableEvent or one of its derived classes.
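A sketch of the first approach, using the API's SpeakableAdapter convenience class; the printed messages and the sample text are, of course, arbitrary:

    import javax.speech.synthesis.SpeakableAdapter;
    import javax.speech.synthesis.SpeakableEvent;
    import javax.speech.synthesis.Synthesizer;

    public class SpeakableDemo {
        static void speakWithFeedback(Synthesizer synth) {
            SpeakableAdapter tracker = new SpeakableAdapter() {
                public void wordStarted(SpeakableEvent e) {
                    // text, wordStart and wordEnd identify the word now being spoken
                    System.out.println("speaking: " + e.getText());
                }
                public void speakableEnded(SpeakableEvent e) {
                    System.out.println("item finished");
                }
            };
            // the listener passed here receives events for this item only;
            // synth.addSpeakableListener(tracker) would cover every queued item
            synth.speakPlainText("hello world", tracker);
        }
    }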

The events that might be treated by the listener are:

  • markerReached: issued when audio output reaches a marker contained in the JSML text of a speech output queue item. The event text is the string of the MARK attribute. The markerType indicates whether the mark is at the opening or close of a JSML element or is an attribute of an empty element (no close).

  • speakableCanceled: issued when an item on the synthesizer's speech output queue is canceled and removed from the queue. A speech output queue item may be canceled at any time following a call to speak. An item can be canceled even if it is not at the top of the speech output queue (other SpeakableEvents are issued only to the top-of-queue item). Once canceled, the listener for the canceled object receives no further SpeakableEvents.

  • speakableEnded: issued with the completion of audio output of an object on the speech output queue as the object is removed from the queue. A QUEUE_UPDATED or QUEUE_EMPTIED event also is issued when the speech output queue changes because the speech output of the item at the top of the queue is completed. The SpeakableEvent is issued prior to the SynthesizerEvent.

  • speakablePaused: issued when audio output of the item at the top of a synthesizer's speech output queue is paused. The SPEAKABLE_PAUSED SpeakableEvent is issued prior to the ENGINE_PAUSED event that is issued to the SynthesizerListener.

  • speakableResumed: issued when audio output of the item at the top of a synthesizer's speech output queue is resumed after a previous pause. The SPEAKABLE_RESUMED SpeakableEvent is issued prior to the ENGINE_RESUMED event that is issued to the SynthesizerListener.

  • speakableStarted: issued at the start of audio output of an item on the speech output queue. This event immediately follows the TOP_OF_QUEUE unless the Synthesizer is paused when the speakable text is promoted to the top of the output queue.

  • topOfQueue: issued when an item on the synthesizer's speech output queue reaches the top of the queue. If the Synthesizer is not paused, the TOP_OF_QUEUE event will be followed immediately by the SPEAKABLE_STARTED event. If the Synthesizer is paused, the SPEAKABLE_STARTED event will be delayed until the Synthesizer is resumed.

  • wordStarted: issued when a synthesis engine starts the audio output of a word in the speech output queue item. The text, wordStart and wordEnd parameters define the segment of the speakable string which is now being spoken.

It must be observed that SynthesizerListener extends the speech API's EngineListener, while SpeakableListener extends the JDK's EventListener. They have different natures: the former is related to the synthesizer's working events (it operates at the Engine level), and the latter is related to synthesis-processing events.

RecognizerListener: listener dedicated to handling events generated during the recognizer's working cycle, or more precisely, during the recognizing function of the engine. Through its use it is possible to keep track of the recognizer's processing events.

=> Associated event: RecognizerEvent or one of its derived classes.

Figure 6. Recognizer States

The difference between the recognizer cycle (Figure 6) and the engine cycle is the set of substates of the ALLOCATED state: LISTENING, PROCESSING and SUSPENDED (Figure 7), which indicate the state of audio input processing. LISTENING is the normal state, in which the recognizer waits for audio input; detected speech moves it to the PROCESSING state, and once the result is finalized it enters the SUSPENDED state.

The SUSPENDED state is also reached via the recognizer's suspend method. While suspended, audio input is buffered for later processing. To leave the SUSPENDED state, the commitChanges method should be called; the recognizer then returns to the LISTENING state.
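This suspend/commit pattern is how grammars are usually updated at runtime. A minimal sketch; the grammar name "commands" is only an example and assumes a grammar loaded earlier:

    import javax.speech.recognition.Recognizer;
    import javax.speech.recognition.RuleGrammar;

    public class GrammarSwap {
        static void enableCommands(Recognizer rec) throws Exception {
            rec.suspend();                    // incoming audio is buffered from here on
            RuleGrammar g = rec.getRuleGrammar("commands");
            g.setEnabled(true);               // takes effect only after the commit
            rec.commitChanges();              // SUSPENDED -> LISTENING; buffered audio is processed
        }
    }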

The recognition cycle is shown in Figure 7:

Figure 7. Recognition Cycle

The states FOCUS_ON and FOCUS_OFF are reached through the requestFocus and releaseFocus methods. They indicate whether the application has access to the engine when more than one application is running at the same time (only one can have access at any given moment).

Besides the Engine events, the following events can be generated:

  • changesCommitted: a CHANGES_COMMITTED event is issued as a Recognizer changes from the SUSPENDED state to the LISTENING state and resumes recognition. The GRAMMAR_CHANGES_COMMITTED event is issued to the GrammarListeners of all changed grammars immediately following the CHANGES_COMMITTED event.

  • focusGained: a FOCUS_GAINED event is issued as a Recognizer changes from the FOCUS_OFF state to the FOCUS_ON state. A FOCUS_GAINED event typically follows a call to requestFocus on a Recognizer. The GRAMMAR_ACTIVATED event is issued to the GrammarListeners of all activated grammars immediately following this RecognizerEvent.

  • focusLost: a FOCUS_LOST event is issued as a Recognizer changes from the FOCUS_ON state to the FOCUS_OFF state. A FOCUS_LOST event may follow a call to releaseFocus on a Recognizer or follow a request for focus by another application. The GRAMMAR_DEACTIVATED event is issued to the GrammarListeners of all deactivated grammars immediately following this RecognizerEvent.

  • recognizerProcessing: a RECOGNIZER_PROCESSING event is issued as a Recognizer changes from the LISTENING state to the PROCESSING state.

  • recognizerSuspended: a RECOGNIZER_SUSPENDED event is issued as a Recognizer changes from either the LISTENING state or the PROCESSING state to the SUSPENDED state. A result finalization event (either a RESULT_ACCEPTED or RESULT_REJECTED event) is issued immediately following the RECOGNIZER_SUSPENDED event.

RecognizerListener extends the EngineListener interface, and therefore the same engine events described earlier can also be handled by a RecognizerListener.
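A sketch that logs these transitions, using the API's RecognizerAdapter convenience class; as with the synthesizer, the listener is attached through addEngineListener:

    import javax.speech.recognition.Recognizer;
    import javax.speech.recognition.RecognizerAdapter;
    import javax.speech.recognition.RecognizerEvent;

    public class RecognizerStateLog {
        static void log(Recognizer rec) {
            rec.addEngineListener(new RecognizerAdapter() {
                public void changesCommitted(RecognizerEvent e)     { System.out.println("LISTENING"); }
                public void recognizerProcessing(RecognizerEvent e) { System.out.println("PROCESSING"); }
                public void recognizerSuspended(RecognizerEvent e)  { System.out.println("SUSPENDED"); }
                public void focusGained(RecognizerEvent e)          { System.out.println("FOCUS_ON"); }
                public void focusLost(RecognizerEvent e)            { System.out.println("FOCUS_OFF"); }
            });
            rec.requestFocus();  // ask for the FOCUS_ON state
        }
    }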

RecognizerAudioListener: listener used to handle events related to the incoming audio.

=> Associated event: RecognizerAudioEvent or one of its derived classes.

Its events are:

  • audioLevel: the AUDIO_LEVEL event indicates a change in the volume level of the incoming audio. This volume ranges from 0.0 to 1.0.

  • speechStarted: the recognizer has detected the possible start of speech in the incoming audio. Applications may use this event to display visual feedback to a user indicating that the recognizer is listening.

  • speechStopped: the recognizer has detected the end of speech or noise in the incoming audio that it previously indicated by a SPEECH_STARTED event. This event always follows a SPEECH_STARTED event.
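A sketch of a simple microphone-feedback listener; note that, in our reading of the API, audio listeners are attached through the engine's AudioManager rather than on the Recognizer itself, and RecognizerAudioAdapter is the empty convenience implementation:

    import javax.speech.recognition.Recognizer;
    import javax.speech.recognition.RecognizerAudioAdapter;
    import javax.speech.recognition.RecognizerAudioEvent;

    public class MicFeedback {
        static void attach(Recognizer rec) {
            rec.getAudioManager().addAudioListener(new RecognizerAudioAdapter() {
                public void audioLevel(RecognizerAudioEvent e) {
                    System.out.println("level: " + e.getAudioLevel()); // 0.0 to 1.0
                }
                public void speechStarted(RecognizerAudioEvent e) {
                    System.out.println("listening...");  // good spot for visual feedback
                }
                public void speechStopped(RecognizerAudioEvent e) {
                    System.out.println("speech ended");
                }
            });
        }
    }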

GrammarListener: handles the events generated by changes in grammar objects.

=> Associated event: GrammarEvent or one of its derived classes.

Its events:

  • grammarActivated: a GRAMMAR_ACTIVATED event is issued when a grammar changes state from deactivated to activated. The isActive method of the grammar will now return true. Grammar activation changes follow one of two RecognizerEvents:

    • a CHANGES_COMMITTED event in which a grammar's enabled flag is set true.

    • a FOCUS_GAINED event.

    The full details of the activation conditions under which a grammar is activated are described in the documentation for the grammar interface.

  • grammarChangesCommitted: a GRAMMAR_CHANGES_COMMITTED event is issued when a Recognizer completes committing changes to a grammar. The event is issued immediately following the CHANGES_COMMITTED event that is issued to RecognizerListeners. That event indicates that changes have been applied to all grammars of a Recognizer. The GRAMMAR_CHANGES_COMMITTED event is specific to each individual grammar. The event is issued when the definition of the grammar is changed, when its enabled property is changed or both.

  • grammarDeactivated: a GRAMMAR_DEACTIVATED event is issued when a grammar changes state from activated to deactivated. The isActive method of the grammar will now return false. Grammar deactivation changes follow one of two RecognizerEvents:

    • a CHANGES_COMMITTED event in which a grammar's enabled flag is set false.

    • a FOCUS_LOST event.

    The full details of the activation conditions under which a grammar is deactivated are described in the documentation for the grammar interface.
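A sketch using the API's GrammarAdapter convenience class; the listener is attached directly to the grammar object whose state changes are of interest:

    import javax.speech.recognition.Grammar;
    import javax.speech.recognition.GrammarAdapter;
    import javax.speech.recognition.GrammarEvent;

    public class GrammarWatch {
        static void watch(Grammar grammar) {
            grammar.addGrammarListener(new GrammarAdapter() {
                public void grammarActivated(GrammarEvent e)        { System.out.println("activated"); }
                public void grammarDeactivated(GrammarEvent e)      { System.out.println("deactivated"); }
                public void grammarChangesCommitted(GrammarEvent e) { System.out.println("changes committed"); }
            });
        }
    }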

ResultListener: this is the most important listener of all. It is responsible for handling the events generated by result objects, which are created by recognizers working together with grammar objects. Implementing this interface enables the developer to determine what processing must be done in response to a specific event. It is associated with ResultEvent objects, which carry information about the recognizer, the grammar and the event.

Result objects carry information about the grammar with which they are associated and about the recognizer that created them, along with strings called ResultTokens representing what was said (when recognition succeeds), the spoken sound itself in the form of an AudioClip, and data that can be used for training the recognizer.

The possible result object states are:

  • FINALIZED:

    • ACCEPTED: the audio item was understood and an association with one of the active grammars was determined.

    • REJECTED: the audio item was processed, but the recognizer considers there to be a high possibility that a mistake was made. That means the recognizer was able to associate a string, or token, with the heard sound, but there was not enough information to be confident in the recognition, whether due to poor sound quality, bad pronunciation or even hardware problems. These results must be treated carefully by the application.

  • UNFINALIZED: recognition of the audio item is still in progress; a final association with one of the active grammars has not yet been determined.

=> Associated event: ResultEvent or one of its derived classes.

  • audioReleased: an AUDIO_RELEASED event is issued when the audio information associated with a FinalResult object is released. The release may have been requested by an application call to releaseAudio in the FinalResult interface or may be initiated by the recognizer to reclaim memory. The FinalResult isAudioAvailable method returns false after this event. The AUDIO_RELEASED event is only issued for results in a finalized state (getResultState returns either ACCEPTED or REJECTED).

  • grammarFinalized: GRAMMAR_FINALIZED is issued when the grammar matched by a result is identified and finalized. Before this event the getGrammar method of a result returns null. Following the event it is guaranteed to return non-null, and the grammar is guaranteed not to change. The GRAMMAR_FINALIZED event only occurs for a result that is in the UNFINALIZED state. A GRAMMAR_FINALIZED event does not affect finalized or unfinalized tokens.

  • resultAccepted: a RESULT_ACCEPTED event is issued when a result is finalized successfully and indicates a state change from UNFINALIZED to ACCEPTED. In the finalization transition, zero or more tokens may be finalized, and the unfinalized tokens are set to null. The isTokenFinalized and isUnfinalizedTokensChanged flags are set appropriately.

  • resultCreated: RESULT_CREATED is issued when a new result is created. The event is received by each ResultListener attached to the Recognizer. When a result is created, it is in the UNFINALIZED state. When created the result may have zero or more finalized tokens and zero or more unfinalized tokens. The presence of finalized and unfinalized tokens is indicated by the isTokenFinalized and isUnfinalizedTokensChanged flags.

  • resultRejected: a RESULT_REJECTED event is issued when a result is unsuccessfully finalized and indicates a change from the UNFINALIZED state to the REJECTED state. In the state transition, zero or more tokens may be finalized and the unfinalized tokens are set to null. The isTokenFinalized and isUnfinalizedTokensChanged flags are set appropriately. However, because the result is rejected, the tokens are quite likely to be incorrect. Since the result is finalized (rejected), the methods of FinalResult can be used.

  • resultUpdated: a RESULT_UPDATED event has occurred because a token has been finalized and/or the unfinalized text of a result has changed. The event is issued to each ResultListener attached to the Recognizer, to each ResultListener attached to the result, and if the GRAMMAR_FINALIZED event has already been issued, to each ResultListener attached to the matched grammar.

  • trainingInfoReleased: a TRAINING_INFO_RELEASED event is issued when the training information for a finalized result is released. The release may have been requested by an application call to the releaseTrainingInfo method in the FinalResult interface or may be initiated by the recognizer to reclaim memory.
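Putting it together, a minimal command-recognizer sketch; the JSGF grammar and the words in it are only examples, and ResultAdapter is the API's empty ResultListener implementation:

    import java.io.StringReader;
    import javax.speech.recognition.Recognizer;
    import javax.speech.recognition.Result;
    import javax.speech.recognition.ResultAdapter;
    import javax.speech.recognition.ResultEvent;
    import javax.speech.recognition.ResultToken;
    import javax.speech.recognition.RuleGrammar;

    public class CommandListener {
        static void setUp(Recognizer rec) throws Exception {
            // a toy JSGF grammar accepting the words "open" and "close"
            RuleGrammar g = rec.loadJSGF(new StringReader(
                "#JSGF V1.0; grammar commands; public <cmd> = open | close;"));
            g.setEnabled(true);
            rec.commitChanges();

            rec.addResultListener(new ResultAdapter() {
                public void resultAccepted(ResultEvent e) {
                    // the Result that fired the event is the event source
                    Result r = (Result) e.getSource();
                    for (ResultToken t : r.getBestTokens())
                        System.out.print(t.getSpokenText() + " ");
                    System.out.println();
                }
                public void resultRejected(ResultEvent e) {
                    // tokens of a rejected result are unreliable; ask the user to repeat
                    System.out.println("low confidence, please repeat");
                }
            });
            rec.requestFocus();
            rec.resume();
        }
    }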

Detailed information about result objects is presented in the next section.
