Useful for Artificial Intelligence Part-4

Artificial Intelligence
4.     Recognition Grammars      


A grammar defines what a recognizer should listen for in incoming speech. Any grammar defines the set of tokens a user can say (a token is typically a single word) and the patterns in which those words are spoken.

The Java Speech API supports two types of grammars: rule grammars and dictation grammars. These grammars differ in how patterns of words are defined. They also differ in their programmatic use: a rule grammar is defined by an application, whereas a dictation grammar is defined by a recognizer and is built into the recognizer.

A rule grammar is provided by an application to a recognizer to define a set of rules that indicates what a user may say. Rules are defined by tokens, by references to other rules and by logical combinations of tokens and rule references. Rule grammars can be defined to capture a wide range of spoken input from users by the progressive combination of simple grammars and rules.

A dictation grammar is built into a recognizer. It defines a set of words (possibly tens of thousands of words) which may be spoken in a relatively unrestricted way. Dictation grammars are closest to the goal of unrestricted natural speech input to computers. Although dictation grammars are more flexible than rule grammars, recognition of rule grammars is typically faster and more accurate.

Support for a dictation grammar is optional for a recognizer. As Section 4.2 explains, an application that requires dictation functionality can request it when creating a recognizer.

A recognizer may have many rule grammars loaded at any time. However, the current Recognizer interface restricts a recognizer to a single dictation grammar. The technical reasons for this restriction are outside the scope of this guide.
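The steps above can be sketched in code. This is a minimal, hypothetical example assuming a JSAPI 1.0 implementation is installed; `Central.createRecognizer` may return null if no engine matching the requested properties (here, US English with dictation support) is available.

```java
import java.util.Locale;
import javax.speech.Central;
import javax.speech.recognition.DictationGrammar;
import javax.speech.recognition.Recognizer;
import javax.speech.recognition.RecognizerModeDesc;

public class CreateRecognizerSketch {
    public static void main(String[] args) throws Exception {
        // Request a US-English recognizer that supports a dictation grammar.
        // Dictation support is optional for engines, so the result may be null.
        RecognizerModeDesc desc = new RecognizerModeDesc(Locale.US, Boolean.TRUE);
        Recognizer rec = Central.createRecognizer(desc);
        if (rec == null) {
            System.out.println("No recognizer with dictation support is installed");
            return;
        }
        rec.allocate();

        // A recognizer has at most one dictation grammar;
        // null selects the default built-in dictation grammar.
        DictationGrammar dictation = rec.getDictationGrammar(null);
        dictation.setEnabled(true);
        rec.commitChanges();   // changes take effect only after a commit (Section 4.2)
        rec.requestFocus();
        rec.resume();
    }
}
```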

4.1     Grammar Interface

The Grammar interface is the root interface that is extended by all grammars. The grammar functionality that is shared by all grammars is presented through this interface.

The RuleGrammar interface is an extension of the Grammar interface to support rule grammars. The DictationGrammar interface is an extension of the Grammar interface to support dictation grammars.

The following are the capabilities presented by the Grammar interface:

Grammar naming: Every grammar loaded into a recognizer must have a unique name. The getName method returns that name. Grammar names allow references to be made between grammars. The grammar naming convention is described in the Java Speech Grammar Format Specification. Briefly, the grammar naming convention is very similar to the class naming convention for the Java programming language. For example, a grammar from Acme Corp. for dates might be called "com.acme.speech.dates".
Enabling and disabling: Grammars may be enabled or disabled using the setEnabled method. When a grammar is enabled and when specified activation conditions are met, the grammar is activated. Once a grammar is active a recognizer will listen to incoming audio for speech that matches that grammar. Enabling and activation are described in more detail below (Section 4.3).
Activation mode: This is the property of a grammar that determines which conditions need to be met for a grammar to be activated. The activation mode is managed through the getActivationMode and setActivationMode methods (described in Section 4.3). The three available activation modes are defined as constants of the Grammar interface: RECOGNIZER_FOCUS, RECOGNIZER_MODAL and GLOBAL.
Activation: the isActive method returns a boolean value that indicates whether a Grammar is currently active for recognition.
GrammarListener: the addGrammarListener and removeGrammarListener methods allow a GrammarListener to be attached to and removed from a Grammar. The GrammarEvents issued to the listener indicate when grammar changes have been committed and whenever the grammar activation state changes.
ResultListener: the addResultListener and removeResultListener methods allow a ResultListener to be attached to and removed from a Grammar. This listener receives notification of all events for any result that matches the grammar.
Recognizer: the getRecognizer method returns a reference to the Recognizer that owns the Grammar.
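The capabilities listed above can be exercised together in one place. The following is a sketch only: `inspect` and the grammar it receives are hypothetical, and the listener bodies simply print so the event flow is visible.

```java
import javax.speech.recognition.Grammar;
import javax.speech.recognition.GrammarAdapter;
import javax.speech.recognition.GrammarEvent;
import javax.speech.recognition.Recognizer;
import javax.speech.recognition.ResultAdapter;
import javax.speech.recognition.ResultEvent;

public class GrammarCapabilities {
    // "g" is assumed to be a grammar already loaded into a recognizer.
    static void inspect(Grammar g) {
        System.out.println("Name: " + g.getName());    // e.g. "com.acme.speech.dates"

        g.setEnabled(true);                            // takes effect after a commit
        g.setActivationMode(Grammar.RECOGNIZER_FOCUS); // the default, lowest-impact mode

        // Notification of activation changes and committed grammar changes.
        g.addGrammarListener(new GrammarAdapter() {
            public void grammarActivated(GrammarEvent e)   { System.out.println("active"); }
            public void grammarDeactivated(GrammarEvent e) { System.out.println("inactive"); }
        });

        // Events only for results that match this particular grammar.
        g.addResultListener(new ResultAdapter() {
            public void resultAccepted(ResultEvent e) { System.out.println("matched"); }
        });

        Recognizer owner = g.getRecognizer();          // back-reference to the owner
        System.out.println("Active now? " + g.isActive());
    }
}
```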

4.2     Committing Changes

The Java Speech API supports dynamic grammars; that is, it supports the ability for an application to modify grammars at runtime. In the case of rule grammars any aspect of any grammar can be changed at any time.

After making any change to a grammar through the Grammar, RuleGrammar or DictationGrammar interfaces an application must commit the changes. This applies to changes in definitions of rules in a RuleGrammar, to changing context for a DictationGrammar, to changing the enabled state, or to changing the activation mode. (It does not apply to adding or removing a GrammarListener or ResultListener.)

Changes are committed by calling the commitChanges method of the Recognizer. The commit is required for changes to affect the recognition process: that is, the processing of incoming audio.

The commit changes mechanism has two important properties:

Updates to grammar definitions and the enabled property take effect atomically (all changes take effect at once). There are no intermediate states in which some, but not all, changes have been applied.
The commitChanges method is a method of Recognizer so all changes to all grammars are committed at once. Again, there are no intermediate states in which some, but not all, changes have been applied.
There is one instance in which changes are committed without an explicit call to the commitChanges method. Whenever a recognition result is finalized (completed), an event is issued to ResultListeners (it is either a RESULT_ACCEPTED or RESULT_REJECTED event). Once processing of that event is completed changes are normally committed. This supports the common situation in which changes are often made to grammars in response to something a user says.

The event-driven commit is closely linked to the underlying state system of a Recognizer. The state system for recognizers is described in detail in Section 3.
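A commit cycle for a rule grammar might look like the following sketch. The rule name, JSGF text, and method shown are illustrative of the RuleGrammar interface; nothing before the call to commitChanges affects the processing of incoming audio.

```java
import javax.speech.recognition.Recognizer;
import javax.speech.recognition.RuleGrammar;

public class CommitSketch {
    // Redefines a rule and the enabled flag, then commits both atomically.
    static void updateGrammar(Recognizer rec, RuleGrammar dates) throws Exception {
        // ruleForJSGF parses the right-hand side of a JSGF rule definition.
        dates.setRule("month",
                dates.ruleForJSGF("January | February | March"),
                true /* public rule */);
        dates.setEnabled(true);

        // Both changes take effect at once, with no intermediate state:
        rec.commitChanges();
    }
}
```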

4.3     Grammar Activation

A grammar is active when the recognizer is matching incoming audio against that grammar to determine whether the user is saying anything that matches that grammar. When a grammar is inactive it is not being used in the recognition process.

Applications do not directly activate and deactivate grammars. Instead, they are provided with methods for (1) enabling and disabling a grammar, (2) setting the activation mode for each grammar, and (3) requesting and releasing the speech focus of a recognizer (as described in Section 3.2).

The enabled state of a grammar is set with the setEnabled method and tested with the isEnabled method. For programmers familiar with AWT or Swing, enabling a speech grammar is similar to enabling a graphical component.

Once enabled, certain conditions must be met for a grammar to be activated. The activation mode indicates when an application wants the grammar to be active. There are three activation modes: RECOGNIZER_FOCUS, RECOGNIZER_MODAL and GLOBAL. For each mode a certain set of activation conditions must be met for the grammar to be activated for recognition. The activation mode is managed with the setActivationMode and getActivationMode methods.

The enabled flag and the activation mode are both parameters of a grammar that need to be committed to take effect. As Section 4.2 described, changes need to be committed to affect the recognition process.

Recognizer focus is a major determining factor in grammar activation and is relevant in computing environments in which more than one application is using an underlying recognizer (e.g., desktop computing with multiple speech-enabled applications). Section 3.2 describes how applications can request and release focus and monitor focus through RecognizerEvents and the engine state methods.

Recognizer focus is used to turn on and off activation of grammars. The role of focus depends upon the activation mode. The three activation modes are described here in order from highest priority to lowest. An application should always use the lowest priority mode that is appropriate to its user interface functionality.

GLOBAL activation mode: if enabled, the Grammar is always active irrespective of whether the Recognizer of this application has focus.
RECOGNIZER_MODAL activation mode: if enabled, the Grammar is always active when the application's Recognizer has focus. Furthermore, enabling a modal grammar deactivates any grammars in the same Recognizer with the RECOGNIZER_FOCUS activation mode. (The term "modal" is analogous to "modal dialog boxes" in graphical programming.)
RECOGNIZER_FOCUS activation mode (default mode): if enabled, the Grammar is active when the Recognizer of this application has focus. The exception is that if any other grammar of this application is enabled with RECOGNIZER_MODAL activation mode, then this grammar is not activated.
The current activation state of a grammar can be tested with the isActive method. Whenever a grammar's activation changes either a GRAMMAR_ACTIVATED or GRAMMAR_DEACTIVATED event is issued to each attached GrammarListener. A grammar activation event typically follows a RecognizerEvent that indicates a change in focus (FOCUS_GAINED or FOCUS_LOST), or a CHANGES_COMMITTED RecognizerEvent that indicates that a change in the enabled setting of a grammar has been applied to the recognition process.
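The interplay of the modes above can be sketched as follows. The grammar names and helper methods are hypothetical; the point is that enabling a RECOGNIZER_MODAL grammar suppresses this application's RECOGNIZER_FOCUS grammars until it is disabled again.

```java
import javax.speech.recognition.Grammar;
import javax.speech.recognition.Recognizer;
import javax.speech.recognition.RuleGrammar;

public class ActivationModes {
    // While the modal "confirm" grammar is enabled, the focus-mode
    // "commands" grammar is inactive even though the recognizer has focus.
    static void enterConfirmationMode(Recognizer rec,
                                      RuleGrammar commands,  // RECOGNIZER_FOCUS (default)
                                      RuleGrammar confirm) throws Exception {
        confirm.setActivationMode(Grammar.RECOGNIZER_MODAL);
        confirm.setEnabled(true);
        rec.commitChanges();    // after commit: confirm active, commands inactive
    }

    static void leaveConfirmationMode(Recognizer rec,
                                      RuleGrammar confirm) throws Exception {
        confirm.setEnabled(false);
        rec.commitChanges();    // focus-mode grammars become active again
    }
}
```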

An application may have zero, one or many grammars enabled at any time. Thus, an application may have zero, one or many grammars active at any time. As the conventions below indicate, well-behaved applications always minimize the number of active grammars.

The activation and deactivation of grammars is independent of the PAUSED and RESUMED states of the Recognizer. For instance, a grammar can be active even when a recognizer is PAUSED. However, when a Recognizer is paused, audio input to the Recognizer is turned off, so speech won't be detected. This is useful because, when the recognizer is resumed, recognition against the active grammars resumes immediately and automatically.

Activating too many grammars and, in particular, activating multiple complex grammars has an adverse impact upon a recognizer's performance. In general terms, increasing the number of active grammars and increasing the complexity of those grammars can both lead to slower recognition response time, greater CPU load and reduced recognition accuracy (i.e., more mistakes).

Well-behaved applications adhere to the following conventions to maximize recognition performance and minimize their impact upon other applications:

Never apply the GLOBAL activation mode to a DictationGrammar (most recognizers will throw an exception if this is attempted).
Always use the default activation mode RECOGNIZER_FOCUS unless there is a good reason to use another mode.
Only use the RECOGNIZER_MODAL mode when it is certain that deactivating the RECOGNIZER_FOCUS grammars will not adversely affect the user interface.
Minimize the complexity and the number of RuleGrammars with GLOBAL activation mode. As a general rule, one very simple GLOBAL rule grammar should be sufficient for nearly all applications.
Only enable a grammar when it is appropriate for a user to say something matching that grammar. Otherwise disable the grammar to improve recognition response time and recognition accuracy for other grammars.
Only request focus when confident that the user's speech focus (attention) is directed to grammars of your application. Release focus when it is not required.
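Several of the conventions above come together when tying grammars to user-interface state. This sketch uses hypothetical method names (`dialogOpened`, `dialogClosed`) to show the pattern: enable a grammar only while it is relevant, and hold recognizer focus only while the user's attention is on this application.

```java
import javax.speech.recognition.Recognizer;
import javax.speech.recognition.RuleGrammar;

public class FocusConventions {
    static void dialogOpened(Recognizer rec, RuleGrammar dialogGrammar) throws Exception {
        dialogGrammar.setEnabled(true);
        rec.commitChanges();
        rec.requestFocus();      // speech focus while our dialog is up
    }

    static void dialogClosed(Recognizer rec, RuleGrammar dialogGrammar) throws Exception {
        dialogGrammar.setEnabled(false);
        rec.commitChanges();
        rec.releaseFocus();      // let other applications' grammars activate
    }
}
```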
