Useful for Artificial Intelligence Part-3








Artificial Intelligence

 Recognizer State Systems

                                                 
3.1     Inherited States

As mentioned above, a Recognizer inherits the basic state systems defined in the javax.speech package, particularly through the Engine interface. The basic engine state systems are described in Section 4.4. In this section the two state systems added for recognizers are described. These two state systems represent the status of recognition processing of audio input against grammars, and the recognizer focus.

As a summary, the following state system functionality is inherited from the javax.speech package.

The basic engine state system represents the current allocation state of the engine: whether resources have been obtained for the engine. The four allocation states are ALLOCATED, DEALLOCATED, ALLOCATING_RESOURCES and DEALLOCATING_RESOURCES.
The PAUSED and RESUMED states are sub-states of the ALLOCATED state. The paused and resumed states of a recognizer indicate whether audio input is on or off. Pausing a recognizer is analogous to turning off the input microphone: input audio is lost. Section 4.4.7 describes the effect of pausing and resuming a recognizer in more detail.
The getEngineState method of the Engine interface returns a long value representing the current engine state. The value has a bit set for each of the current states of the recognizer. For example, an ALLOCATED recognizer in the RESUMED state will have both the ALLOCATED and RESUMED bits set.
The testEngineState and waitEngineState methods are convenience methods for monitoring engine state. The test method tests for presence in a specified state. The wait method blocks until a specific state is reached.
An EngineEvent is issued to EngineListeners each time an engine changes state. The event class includes the new and old engine states.
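The bit-mask representation of engine state can be illustrated with a small self-contained sketch. The class name, constants, and their bit values below are illustrative, not the real javax.speech definitions; only the masking logic mirrors the testEngineState contract described above.

```java
// Sketch of how a single long encodes several parallel engine states,
// and how a testEngineState-style check works with bit masks.
// (Hypothetical constants; not the real javax.speech classes.)
public class EngineStateSketch {
    // Each state occupies its own bit (values here are illustrative).
    static final long ALLOCATED = 1L << 0;
    static final long RESUMED   = 1L << 1;
    static final long LISTENING = 1L << 2;
    static final long FOCUS_ON  = 1L << 3;

    // True if every bit set in 'required' is also set in 'state'.
    static boolean testEngineState(long state, long required) {
        return (state & required) == required;
    }
}
```

With this encoding, an ALLOCATED recognizer in the RESUMED state is simply the value ALLOCATED | RESUMED, and testing for either state individually succeeds against that value.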
The recognizer adds two sub-state systems to the ALLOCATED state, in addition to the inherited PAUSED and RESUMED sub-state system. The two new sub-state systems represent the current activities of the recognizer's internal processing (the LISTENING, PROCESSING and SUSPENDED states) and the current recognizer focus (the FOCUS_ON and FOCUS_OFF states).

These new sub-state systems are parallel states to the PAUSED and RESUMED states and operate nearly independently as shown in Figure 6-1 (an extension of Figure 4-2).

3.2     Recognizer Focus

The FOCUS_ON and FOCUS_OFF states indicate whether this instance of the Recognizer currently has the speech focus. Recognizer focus is a major determining factor in grammar activation, which, in turn, determines what the recognizer is listening for at any time. The role of recognizer focus in activation and deactivation of grammars is described in Section 6.4.3.

A change in engine focus is indicated by a RecognizerEvent (which extends EngineEvent) being issued to RecognizerListeners. A FOCUS_LOST event indicates a change in state from FOCUS_ON to FOCUS_OFF. A FOCUS_GAINED event indicates a change in state from FOCUS_OFF to FOCUS_ON.

When a Recognizer has focus, the FOCUS_ON bit is set in the engine state. When a Recognizer does not have focus, the FOCUS_OFF bit is set. The following code examples monitor engine state:


Recognizer rec;

if (rec.testEngineState(Recognizer.FOCUS_ON)) {
    // we have focus so release it
    rec.releaseFocus();
}

// wait until we lose it
rec.waitEngineState(Recognizer.FOCUS_OFF);

Recognizer focus is relevant to computing environments in which more than one application is using an underlying recognition engine. For example, in a desktop environment a user might be running a single speech recognition product (the underlying engine), but have multiple applications using the speech recognizer as a resource. These applications may be a mixture of Java and non-Java applications. Focus is not usually relevant in a telephony environment or in other speech application contexts in which there is only a single application processing the audio input stream.

The recognizer's focus should track the application to which the user is currently talking. When a user indicates that they want to talk to an application (e.g., by selecting the application window, or explicitly saying "switch to application X"), the application requests speech focus by calling the requestFocus method of the Recognizer.

When speech focus is no longer required (e.g., the application has been iconized) the application should call the releaseFocus method to free up focus for other applications.

Both methods are asynchronous - the methods may return before the focus is gained or lost - since focus change may be deferred. For example, if a recognizer is in the middle of recognizing some speech, it will typically defer the focus change until the result is completed. The focus events and the engine state monitoring methods can be used to determine when focus is actually gained or lost.
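The deferred focus change can be pictured with a toy model (a hypothetical class, not the real javax.speech API): while a result is still being recognized, releaseFocus merely records the request, and the actual focus change (which would trigger a FOCUS_LOST RecognizerEvent) happens when the result completes.

```java
// Toy model of asynchronous focus release.
// (Illustrative sketch; not the real Recognizer implementation.)
public class FocusSketch {
    boolean hasFocus = true;
    boolean recognizing;          // a result is currently being recognized
    private boolean releasePending;
    int focusLostEvents;          // counts would-be FOCUS_LOST events

    // Like Recognizer.releaseFocus: may return before focus is lost.
    void releaseFocus() {
        if (recognizing) {
            releasePending = true; // deferred until result finalization
            return;
        }
        dropFocus();
    }

    // Called when recognition of the current result completes.
    void resultCompleted() {
        recognizing = false;
        if (releasePending) {
            releasePending = false;
            dropFocus();
        }
    }

    private void dropFocus() {
        hasFocus = false;
        focusLostEvents++;         // would issue a FOCUS_LOST event here
    }
}
```

The same deferral applies to requestFocus; an application should rely on the focus events or the state monitoring methods, not on the method returning, to know when the change has taken effect.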

The focus policy is determined by the underlying recognition engine - it is not prescribed by the javax.speech.recognition package. In most operating environments it is reasonable to assume a policy in which the last application to request focus gets the focus.

Well-behaved applications adhere to the following convention to maximize recognition performance, to minimize their impact upon other applications and to maintain a satisfactory user interface experience. An application should only request focus when it is confident that the user's speech focus (attention) is directed towards it, and it should release focus when it is not required.

3.3     Recognition States

The most important (and most complex) state system of a recognizer represents the current recognition activity of the recognizer. An ALLOCATED Recognizer is always in one of the following three states:

LISTENING state: The Recognizer is listening to incoming audio for speech that may match an active grammar but has not detected speech yet. A recognizer remains in this state while listening to silence and when audio input runs out because the engine is paused.
PROCESSING state: The Recognizer is processing incoming speech that may match an active grammar. While in this state, the recognizer is producing a result.
SUSPENDED state: The Recognizer is temporarily suspended while grammars are updated. While suspended, audio input is buffered for processing once the recognizer returns to the LISTENING and PROCESSING states.
This sub-state system is shown in Figure 6-1. The typical state cycle of a recognizer is triggered by user speech. The recognizer starts in the LISTENING state, moves to the PROCESSING state while a user speaks, moves to the SUSPENDED state once recognition of that speech is completed and while grammars are updated in response to user input, and finally returns to the LISTENING state.

In this first event cycle a Result is typically produced that represents what the recognizer heard. Each Result has a state system and the Result state system is closely coupled to this Recognizer state system. The Result state system is discussed in Section 6.7. Many applications (including the "Hello World!" example) do not care about the recognition state but do care about the simpler Result state system.

The other typical event cycle also starts in the LISTENING state. Upon receipt of a non-speech event (e.g., keyboard event, mouse click, timer event) the recognizer is suspended temporarily while grammars are updated in response to the event, and then the recognizer returns to listening.

Applications in which grammars are affected by more than speech events need to be aware of the recognition state system.

The following sections explain these event cycles in more detail and discuss why speech input events are different in some respects from other event types.

3.3.1     Speech Events vs. Other Events

A keyboard event, a mouse event, a timer event, and a socket event are all instantaneous in time - there is a defined instant at which they occur. The same is not true of speech for two reasons.

Firstly, speech is a temporal activity. Speaking a sentence takes time. For example, a short command such as "reload this web page" will take a second or two to speak, thus, it is not instantaneous. At the start of the speech the recognizer changes state, and as soon as possible after the end of the speech the recognizer produces a result containing the spoken words.

Secondly, recognizers cannot always recognize words immediately when they are spoken and cannot determine immediately when a user has stopped speaking. The reasons for these technical constraints upon recognition are outside the scope of this guide, but knowing about them is helpful in using a recognizer. (Incidentally, the same principles are generally true of human perception of speech.)

A simple example of why recognizers cannot always respond might be listening to a currency amount. If the user says "two dollars" or says "two dollars, fifty cents" with a short pause after the word "dollars", the recognizer can't know immediately whether the user has finished speaking after "dollars". What a recognizer must do is wait a short period - usually less than a second - to see if the user continues speaking. A second is a long time for a computer and complications can arise if the user clicks a mouse or does something else in that waiting period. (Section 6.8 explains the time-out parameters that affect this delay.)

A further complication is introduced by the input audio buffering described in Section 6.3.

Putting all this together, recognizers need to represent their internal state explicitly through the LISTENING, PROCESSING and SUSPENDED states.

3.3.2     Speech Input Event Cycle

The typical recognition state cycle for a Recognizer occurs as speech input occurs. Technically speaking, this cycle represents the recognition of a single Result. The result state system and result events are described in detail in Section 6.7. The cycle described here is a clockwise trip through the LISTENING, PROCESSING and SUSPENDED states of an ALLOCATED recognizer.

The Recognizer starts in the LISTENING state with a certain set of grammars enabled and active. When incoming audio is detected that may match an active grammar, the Recognizer transitions from the LISTENING state to the PROCESSING state with a RECOGNIZER_PROCESSING event.

The Recognizer then creates a new Result object and issues a RESULT_CREATED event (a ResultEvent) to provide the result to the application. At this point the result is usually empty: it does not contain any recognized words. As recognition proceeds words are added to the result along with other useful information.

The Recognizer remains in the PROCESSING state until it completes recognition of the result. While in the PROCESSING state the Result may be updated with new information.

The recognizer indicates completion of recognition by issuing a RECOGNIZER_SUSPENDED event to transition from the PROCESSING state to the SUSPENDED state. Once in that state, the recognizer issues a result finalization event to ResultListeners (RESULT_ACCEPTED or RESULT_REJECTED event) to indicate that all information about the result is finalized (words, grammars, audio etc.).

The Recognizer remains in the SUSPENDED state until processing of the result finalization event is completed. Applications will often make grammar changes during the result finalization because the result causes a change in application state or context.

In the SUSPENDED state the Recognizer buffers incoming audio. This buffering allows a user to continue speaking without speech data being lost. Once the Recognizer returns to the LISTENING state the buffered audio is processed to give the user the perception of real-time processing.
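This buffering behavior can be modeled with a simple queue (a sketch of the idea, not the real engine internals): audio arriving while the recognizer is suspended is queued, and the queue is drained when the recognizer returns to the LISTENING state, so no audio is lost.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of audio buffering in the SUSPENDED state.
// (Illustrative sketch; not the real engine implementation.)
public class AudioBufferSketch {
    private final Queue<String> buffer = new ArrayDeque<>();
    private boolean suspended;
    private int framesProcessed;

    void suspend() {
        suspended = true;
    }

    // Incoming audio is buffered while suspended, processed otherwise.
    void onAudioFrame(String frame) {
        if (suspended) {
            buffer.add(frame);
        } else {
            framesProcessed++;
        }
    }

    // Returning to LISTENING drains the buffer first, giving the user
    // the perception of real-time processing.
    void commitChanges() {
        suspended = false;
        while (!buffer.isEmpty()) {
            buffer.remove();
            framesProcessed++;
        }
    }

    int framesProcessed() {
        return framesProcessed;
    }
}
```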

Once the result finalization event has been issued to all listeners, the Recognizer automatically commits all grammar changes and issues a CHANGES_COMMITTED event to return to the LISTENING state. (It also issues GRAMMAR_CHANGES_COMMITTED events to GrammarListeners of changed grammars.) The commit applies all grammar changes made at any point up to the end of result finalization, such as changes made in the result finalization events.

The Recognizer is now back in the LISTENING state listening for speech that matches the new grammars.

In this event cycle the first two recognizer state transitions (marked by RECOGNIZER_PROCESSING and RECOGNIZER_SUSPENDED events) are triggered by user actions: starting and stopping speaking. The third state transition (CHANGES_COMMITTED event) is triggered programmatically some time after the RECOGNIZER_SUSPENDED event.
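The three transitions of this event cycle can be summarized as a toy state machine (a sketch, not the javax.speech implementation; the method names below are hypothetical stand-ins for the corresponding events).

```java
// Toy state machine for the speech input event cycle:
// LISTENING -> PROCESSING -> SUSPENDED -> LISTENING.
// (Illustrative sketch; not the real Recognizer implementation.)
public class CycleSketch {
    enum State { LISTENING, PROCESSING, SUSPENDED }

    State state = State.LISTENING;

    // RECOGNIZER_PROCESSING: speech detected that may match a grammar.
    void speechDetected() {
        if (state == State.LISTENING) state = State.PROCESSING;
    }

    // RECOGNIZER_SUSPENDED: recognition of the result is complete.
    void resultFinalized() {
        if (state == State.PROCESSING) state = State.SUSPENDED;
    }

    // CHANGES_COMMITTED: grammar changes applied, back to listening.
    void changesCommitted() {
        if (state == State.SUSPENDED) state = State.LISTENING;
    }
}
```

The first two transitions are driven by the user starting and stopping speaking; the last is driven programmatically once result finalization and any grammar changes are complete.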

The SUSPENDED state serves as a temporary state in which recognizer configuration can be updated without losing audio data.

3.3.3     Non-Speech Event Cycle

For applications that deal only with spoken input the state cycle described above handles most normal speech interactions. For applications that handle other asynchronous input, additional state transitions are possible. Other types of asynchronous input include graphical user interface events (e.g., AWTEvent), timer events, multi-threading events, socket events and so on.

The cycle described here is a temporary transition from the LISTENING state to the SUSPENDED state and back.

When a non-speech event occurs which changes the application state or application data it may be necessary to update the recognizer's grammars. The suspend and commitChanges methods of a Recognizer are used to handle non-speech asynchronous events. The typical cycle for updating grammars in response to a non-speech asynchronous event is as follows.

Assume that the Recognizer is in the LISTENING state (the user is not currently speaking). As soon as the event is received, the application calls suspend to indicate that it is about to change grammars. In response, the recognizer issues a RECOGNIZER_SUSPENDED event and transitions from the LISTENING state to the SUSPENDED state.

With the Recognizer in the SUSPENDED state, the application makes all necessary changes to the grammars. (The grammar changes affected by this event cycle and the pending commit are described in Section 4.2.)

Once all grammar changes are completed the application calls the commitChanges method. In response, the recognizer applies the new grammars and issues a CHANGES_COMMITTED event to transition from the SUSPENDED state back to the LISTENING state. (It also issues GRAMMAR_CHANGES_COMMITTED events to all changed grammars.)

Finally, the Recognizer resumes recognition of the buffered audio and then live audio with the new grammars.

The suspend and commit process is designed to provide a number of features to application developers which help give users the perception of a responsive recognition system.

Because audio is buffered from the time of the asynchronous event to the time at which the CHANGES_COMMITTED event occurs, the audio is processed as if the new grammars were applied exactly at the time of the asynchronous event. The user has the perception of real-time processing.

Although audio is buffered in the SUSPENDED state, applications should make grammar changes and call commitChanges as quickly as possible. This minimizes the amount of data in the audio buffer and hence the amount of time it takes for the recognizer to "catch up". It also minimizes the possibility of a buffer overrun.

Technically speaking, an application is not required to call suspend prior to calling commitChanges. If the suspend call is omitted, the Recognizer behaves as if suspend had been called immediately prior to calling commitChanges. However, an application that does not call suspend risks a commit occurring unexpectedly while it updates grammars, with the effect of leaving grammars in an inconsistent state.
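The suspend-and-commit pattern can be modeled with a small sketch (a hypothetical class, not the real API): grammar edits accumulate as pending changes and are applied atomically by commitChanges, whether or not suspend was called first.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of suspend/commitChanges: edits made to grammars are pending
// until commitChanges applies them all in one step.
// (Illustrative sketch; not the real Recognizer implementation.)
public class CommitSketch {
    private final Map<String, Boolean> active = new HashMap<>();   // committed view
    private final Map<String, Boolean> pending = new HashMap<>();  // uncommitted edits
    private boolean suspended;

    void suspend() {
        suspended = true;          // LISTENING -> SUSPENDED
    }

    // Like Grammar.setEnabled: the change is pending until committed.
    void setEnabled(String grammar, boolean enabled) {
        pending.put(grammar, enabled);
    }

    // Applies all pending edits atomically; behaves the same whether or
    // not suspend was called first.
    void commitChanges() {
        active.putAll(pending);
        pending.clear();
        suspended = false;         // SUSPENDED -> LISTENING
    }

    boolean isActive(String grammar) {
        return active.getOrDefault(grammar, false);
    }
}
```

Calling suspend first simply guarantees that no commit happens mid-edit; the commit itself is identical in both cases.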

3.4     Interactions of State Systems

The three sub-state systems of an allocated recognizer normally operate independently. There are, however, some indirect interactions.

When a recognizer is paused, audio input is stopped. However, recognizers have a buffer between audio input and the internal process that matches audio against grammars, so recognition can continue temporarily after a recognizer is paused. In other words, a PAUSED recognizer may be in the PROCESSING state.

Eventually the audio buffer will empty. If the recognizer is in the PROCESSING state at that time then the result it is working on is immediately finalized and the recognizer transitions to the SUSPENDED state. Since a well-behaved application treats SUSPENDED state as a temporary state, the recognizer will eventually leave the SUSPENDED state by committing grammar changes and will return to the LISTENING state.

The PAUSED/RESUMED state of an engine is shared by multiple applications, so it is possible for a recognizer to be paused and resumed because of the actions of another application. Thus, an application should always leave its grammars in a state that would be appropriate for a RESUMED recognizer.

The focus state of a recognizer is independent of the PAUSED and RESUMED states. For instance, it is possible for a paused Recognizer to have FOCUS_ON. When the recognizer is resumed, it will have the focus and its grammars will be activated for recognition.

The focus state of a recognizer is very loosely coupled with the recognition state. An application that has no GLOBAL grammars (described in Section 4.3) will not receive any recognition results unless it has recognition focus.
