Useful for Artificial Intelligence, Part 7
7 Recognition Results
A recognition result is provided by a Recognizer to an application when the recognizer "hears" incoming speech that matches an active grammar. The result tells the application what words the user said and provides a range of other useful information, including alternative guesses and audio data.
In this section, both the basic and advanced capabilities of the result system in the Java Speech API are described. The sections relevant to basic rule grammar-based applications are those that cover result finalization (Section 7.1), the hierarchy of result interfaces (Section 7.2), the data provided through those interfaces (Section 7.3), and common techniques for handling finalized rule results (Section 7.9).
For dictation applications the relevant sections include those listed above plus the sections covering token finalization (Section 7.8), handling of finalized dictation results (Section 7.10) and result correction and training (Section 7.12).
For more advanced applications relevant sections might include the result life cycle (Section 7.4), attachment of ResultListeners (Section 7.5), the relationship of recognizer and result states (Section 7.6), grammar finalization (Section 7.7), result audio (Section 7.11), rejected results (Section 7.13), result timing (Section 7.14), and the loading and storing of vendor formatted results (Section 7.15).
7.1 Result Finalization
The "Hello World!" example illustrates the simplest way to handle results. In that example, a RuleGrammar was loaded, committed and enabled, and a ResultListener was attached to a Recognizer to receive events associated with every result that matched that grammar. In other words, the ResultListener was attached to receive information about words spoken by a user that are heard by the recognizer.
The following is a modified extract of the "Hello World!" example to illustrate the basics of handling results. In this case, a ResultListener is attached to a Grammar (instead of a Recognizer) and it prints out everything the recognizer hears that matches that grammar. (There are, in fact, three ways in which a ResultListener can be attached: see Section 7.5.)
import javax.speech.*;
import javax.speech.recognition.*;

public class MyResultListener extends ResultAdapter {
    // Receives RESULT_ACCEPTED event: print it
    public void resultAccepted(ResultEvent e) {
        Result r = (Result)(e.getSource());
        ResultToken tokens[] = r.getBestTokens();

        for (int i = 0; i < tokens.length; i++)
            System.out.print(tokens[i].getSpokenText() + " ");
        System.out.println();
    }

    // somewhere in app, add a ResultListener to a grammar
    {
        RuleGrammar gram = ...;
        gram.addResultListener(new MyResultListener());
    }
}
The code shows the MyResultListener class, which is an extension of the ResultAdapter class. The ResultAdapter class is a convenience implementation of the ResultListener interface (provided in the javax.speech.recognition package). When extending the ResultAdapter class we simply implement the methods for the events that we care about.
In this case, the RESULT_ACCEPTED event is handled. This event is issued to the resultAccepted method of the ResultListener and is issued when a result is finalized. Finalization of a result occurs after a recognizer has completed processing of a result. More specifically, finalization occurs when all information about a result has been produced by the recognizer and when the recognizer can guarantee that the information will not change. (Result finalization should not be confused with object finalization in the Java programming language, in which objects are cleaned up before garbage collection.)
There are actually two ways to finalize a result, which are signalled by the RESULT_ACCEPTED and RESULT_REJECTED events. A result is accepted when a recognizer is confident that it has correctly heard the words spoken by a user (i.e., the tokens in the Result exactly represent what a user said).
Rejection occurs when a Recognizer is not confident that it has correctly recognized a result: that is, the tokens and other information in the result do not necessarily match what a user said. Many applications will ignore the RESULT_REJECTED event and most will ignore the detail of a result when it is rejected. In some applications, a RESULT_REJECTED event is used simply to provide users with feedback that something was heard but no action was taken, for example, by displaying "???" or sounding an error beep. Rejected results and the differences between accepted and rejected results are described in more detail in Section 7.13.
An accepted result is not necessarily a correct result. As is pointed out in Section 2.2.3, recognizers make errors when recognizing speech for a range of reasons. The implication is that even for an accepted result, application developers should consider the potential impact of a misrecognition. Where a misrecognition could cause an action with serious consequences or could make changes that can't be undone (e.g., "delete all files"), the application should check with users before performing the action. As recognition systems continue to improve, the number of errors is steadily decreasing, but as with human speech recognition there will always be a chance of a misunderstanding.
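The advice above can be sketched as a small decision function. This is only an illustration: the CommandGate class, the Decision enum and the list of destructive commands are hypothetical names invented for this sketch, and a plain string stands in for the text of the best-guess tokens, so no javax.speech types are used.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: decides what to do with a finalized result.
// A plain string stands in for the spoken text of the best-guess tokens.
class CommandGate {
    enum Decision { EXECUTE, CONFIRM_FIRST, IGNORE }

    // Assumed, application-defined list of commands that cannot be undone.
    private static final Set<String> DESTRUCTIVE =
        Set.of("delete all files", "format disk");

    static Decision decide(boolean accepted, String spokenText) {
        if (!accepted) {
            // Rejected result: the tokens may not match what was said.
            return Decision.IGNORE;
        }
        String normalized = spokenText.trim().toLowerCase(Locale.ROOT);
        if (DESTRUCTIVE.contains(normalized)) {
            // Accepted does not mean correct: ask the user before acting.
            return Decision.CONFIRM_FIRST;
        }
        return Decision.EXECUTE;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, "open window"));
        System.out.println(decide(true, "Delete all files"));
        System.out.println(decide(false, "delete all files"));
    }
}
```

In a real listener the accepted flag would correspond to whether resultAccepted or resultRejected was called, and the confirmation step would be whatever dialog suits the application.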
7.2 Result Interface Hierarchy
A finalized result can include a considerable amount of information. This information is provided through four separate interfaces and through the implementation of these interfaces by a recognition system.
// Result: the root result interface
interface Result;
// FinalResult: info on all finalized results
interface FinalResult extends Result;
// FinalRuleResult: a finalized result matching a RuleGrammar
interface FinalRuleResult extends FinalResult;
// FinalDictationResult: a final result for a DictationGrammar
interface FinalDictationResult extends FinalResult;
// A result implementation provided by a Recognizer
public class EngineResult
implements FinalRuleResult, FinalDictationResult;
At first sight, the result interfaces may seem complex. The reasons for providing several interfaces are as follows:
The information available for a result is different in different states of the result. Before finalization, a limited amount of information is available through the Result interface. Once a result is finalized (accepted or rejected), more detailed information is available through the FinalResult interface and either the FinalRuleResult or FinalDictationResult interface.
The type of information available for a finalized result is different for a result that matches a RuleGrammar than for a result that matches a DictationGrammar. The differences are explicitly represented by having separate interfaces for FinalRuleResult and FinalDictationResult.
Once a result object is created as a specific Java class it cannot be changed to another class. Therefore, because a result object must eventually support the final interfaces, it must implement them when first created. Consequently, every result implements all three final interfaces when it is first created: FinalResult, FinalRuleResult and FinalDictationResult.
When a result is first created a recognizer does not always know whether it will eventually match a RuleGrammar or a DictationGrammar. Therefore, every result object implements both the FinalRuleResult and FinalDictationResult interfaces.
A call made to any method of any of the final interfaces before a result is finalized causes a ResultStateException.
A call made to any method of the FinalRuleResult interface for a result that matches a DictationGrammar causes a ResultStateException. Similarly, a call made to any method of the FinalDictationResult interface for a result that matches a RuleGrammar causes a ResultStateException.
All the result functionality is provided by interfaces in the javax.speech.recognition package rather than by classes. This is because the Java Speech API can support multiple recognizers from multiple vendors and interfaces allow the vendors greater flexibility in implementing results.
The multitude of interfaces is, in fact, designed to simplify application programming and to minimize the chance of introducing bugs into code by allowing compile-time checking of result calls. The two basic principles for calling the result interfaces are the following:
If it is safe to call the methods of a particular interface then it is safe to call the methods of any of the parent interfaces. For example, for a finalized result matching a RuleGrammar, the methods of the FinalRuleResult interface are safe, so the methods of the FinalResult and Result interfaces are also safe. Similarly, for a finalized result matching a DictationGrammar, the methods of FinalDictationResult, FinalResult and Result can all be called safely.
Use type casting of a result object to ensure compile-time checks of method calls. For example, in events to an unfinalized result, cast the result object to the Result interface. For a RESULT_ACCEPTED finalization event with a result that matches a DictationGrammar, cast the result to the FinalDictationResult interface.
In the next section the different information available through the different interfaces is described. In all the following sections that deal with result states and result events, details are provided on the appropriate casting of result objects.
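The rules above can be modelled with a few stand-in types. The sketch below does not use the real javax.speech.recognition interfaces: the type names are borrowed from this section, but the method signatures are simplified, the constant values are arbitrary, and the finalizeAs method and HierarchyDemo class are hypothetical. It shows one implementation class supporting both final interfaces, the exceptions for calls made in the wrong state or for the wrong grammar type, and a cast performed only after the state has been checked.

```java
// Stand-in types that mirror names used in this section. They are NOT the
// real javax.speech.recognition interfaces; signatures are simplified.
class ResultStateException extends IllegalStateException {
    ResultStateException(String msg) { super(msg); }
}

interface Result {
    int UNFINALIZED = 0, ACCEPTED = 1, REJECTED = 2; // arbitrary values
    int getResultState();
}
interface FinalResult extends Result { }
interface FinalRuleResult extends FinalResult { String getRuleName(int n); }
interface FinalDictationResult extends FinalResult { String[] getAlternativeTokens(int max); }

// One implementation class supports both final interfaces from creation.
class EngineResult implements FinalRuleResult, FinalDictationResult {
    private int state = UNFINALIZED;
    private final boolean matchesRuleGrammar;

    EngineResult(boolean matchesRuleGrammar) { this.matchesRuleGrammar = matchesRuleGrammar; }

    // Hypothetical hook used by this sketch to finalize the result.
    void finalizeAs(int newState) { state = newState; }

    public int getResultState() { return state; }

    public String getRuleName(int n) {
        requireFinal();
        if (!matchesRuleGrammar) throw new ResultStateException("not a RuleGrammar result");
        return "command";
    }

    public String[] getAlternativeTokens(int max) {
        requireFinal();
        if (matchesRuleGrammar) throw new ResultStateException("not a DictationGrammar result");
        return new String[] { "hello" };
    }

    private void requireFinal() {
        if (state == UNFINALIZED) throw new ResultStateException("result not finalized");
    }
}

class HierarchyDemo {
    // Check the state first, then cast to the narrower interface.
    static String ruleNameOrNull(Result r) {
        if (r.getResultState() != Result.ACCEPTED) return null;
        FinalRuleResult frr = (FinalRuleResult) r;
        return frr.getRuleName(0);
    }
}
```

Holding the object as a Result until it is finalized means the compiler will not let unfinalized-state code call final-only methods by accident.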
7.3 Result Information
As the previous section describes, different information is available for a result depending upon the state of the result and, for finalized results, depending upon the type of grammar it matches (RuleGrammar or DictationGrammar).
7.3.1 Result Interface
The information available through the Result interface is available for any result in any state - finalized or unfinalized - and matching any grammar.
Result state: The getResultState method returns the current state of the result. The three possible state values defined by static values of the Result interface are UNFINALIZED, ACCEPTED and REJECTED. (Result states are described in more detail in Section 7.4.)
Grammar: The getGrammar method returns a reference to the matched Grammar, if it is known. For an ACCEPTED result, this method will return a RuleGrammar or a DictationGrammar. For a REJECTED result, this method may return a grammar, or may return null if the recognizer could not identify the grammar for this result. In the UNFINALIZED state, this method returns null before a GRAMMAR_FINALIZED event, and non-null afterwards.
Number of finalized tokens: The numTokens method returns the total number of finalized tokens for a result. For an unfinalized result this may be zero or greater. For a finalized result this number is always greater than zero for an ACCEPTED result but may be zero or more for a REJECTED result. Once a result is finalized this number will not change.
Finalized tokens: The getBestToken and getBestTokens methods return either a specified finalized best-guess token of a result or all the finalized best-guess tokens. The ResultToken object and token finalization are described in the following sections.
Unfinalized tokens: In the UNFINALIZED state, the getUnfinalizedTokens method returns a list of unfinalized tokens. An unfinalized token is a recognizer's current guess of what a user has said, but the recognizer may choose to change these tokens at any time and in any way. For a finalized result, the getUnfinalizedTokens method always returns null.
In addition to the information detailed above, the Result interface provides the addResultListener and removeResultListener methods which allow a ResultListener to be attached to and removed from an individual result. ResultListener attachment is described in more detail in Section 7.5.
7.3.2 FinalResult Interface
The information available through the FinalResult interface is available for any finalized result, including results that match either a RuleGrammar or DictationGrammar.
Audio data: a Recognizer may optionally provide audio data for a finalized result. This data is provided as AudioClip for a token, a sequence of tokens, or for the entire result. Result audio and its management are described in more detail in Section 7.11.
Training data: many recognizers have the ability to be trained and corrected. By training a recognizer or correcting its mistakes, a recognizer can adapt its recognition processes so that performance (accuracy and speed) improves over time. Several methods of the FinalResult interface support this capability and are described in detail in Section 7.12.
7.3.3 FinalDictationResult Interface
The FinalDictationResult interface contains a single method.
Alternative tokens: The getAlternativeTokens method allows an application to request a set of alternative guesses for a single token or for a sequence of tokens in that result. In dictation systems, alternative guesses are typically used to facilitate correction of dictated text. Dictation recognizers are designed so that when they do make a misrecognition, the correct word sequence is usually amongst the best few alternative guesses. Dictation results are discussed further in Section 7.10.
7.3.4 FinalRuleResult Interface
Like the FinalDictationResult interface, the FinalRuleResult interface provides alternative guesses. The FinalRuleResult interface also provides some additional information that is useful in processing results that match a RuleGrammar.
Alternative tokens: The getAlternativeTokens method allows an application to request a set of alternative guesses for the entire result (not for tokens). The getNumberGuesses method returns the actual number of alternative guesses available.
Alternative grammars: The alternative guesses of a result matching a RuleGrammar do not all necessarily match the same grammar. The getRuleGrammar method returns a reference to the RuleGrammar matched by an alternative.
Rulenames: When a result matches a RuleGrammar, it matches a specific defined rule of that RuleGrammar. The getRuleName method returns the rulename for the matched rule. Section 7.9 describes the processing of RuleGrammar results.
Tags: A tag is a string attached to a component of a RuleGrammar definition. Tags are useful in simplifying the software for processing results matching a RuleGrammar (explained in Section 7.9). The getTags method returns the tags for the best guess for a FinalRuleResult.
7.4 Result Life Cycle
A Result is produced in response to a user's speech. Unlike keyboard input, mouse input and most other forms of user input, speech is not instantaneous. As a consequence, a speech recognition result is not produced instantaneously. Instead, a Result is produced through a sequence of events starting some time after a user starts speaking and usually finishing some time after the user stops speaking.
Figure 6-2 shows the state system of a Result and the associated ResultEvents. As in the recognizer state diagram (Figure 6-1), the blocks represent states, and the labelled arcs represent transitions that are signalled by ResultEvents.
Every result starts in the UNFINALIZED state when a RESULT_CREATED event is issued. While the result is unfinalized, the recognizer provides information including finalized and unfinalized tokens and the identity of the grammar matched by the result. As this information is added, the RESULT_UPDATED and GRAMMAR_FINALIZED events are issued.
Once all information associated with a result is finalized, the entire result is finalized. As Section 7.1 explained, a result is finalized with either a RESULT_ACCEPTED or RESULT_REJECTED event placing it in either the ACCEPTED or REJECTED state. At that point all information associated with the result becomes available including the best guess tokens and the information provided through the three final result interfaces (see Section 7.3).
Once finalized, the information available through all the result interfaces is fixed. The only exceptions are for the release of audio data and training data. If audio data is released, an AUDIO_RELEASED event is issued. If training information is released, a TRAINING_INFO_RELEASED event is issued.
Applications can track result states in a number of ways. Most often, applications handle results in a ResultListener implementation, which receives ResultEvents as recognition proceeds.
As Section 7.3 explains, a recognizer conveys a range of information to an application through the stages of producing a recognition result. However, as the example in Section 7.1 shows, many applications only care about the last step and event in that process - the RESULT_ACCEPTED event.
The state of a result is also available through the getResultState method of the Result interface. That method returns one of the three result states: UNFINALIZED, ACCEPTED or REJECTED.
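The life cycle described in this section can be sketched as a small state machine. The enums and the ResultLifeCycle class below are stand-ins written for this illustration, not the javax.speech.recognition types; only the state and event names are taken from the text. The sketch rejects any event that arrives in a state where this section says it cannot occur.

```java
// Stand-in model of the result life cycle; not the real API types.
enum ResultState { UNFINALIZED, ACCEPTED, REJECTED }

enum ResultEventType {
    RESULT_CREATED, RESULT_UPDATED, GRAMMAR_FINALIZED,
    RESULT_ACCEPTED, RESULT_REJECTED,
    AUDIO_RELEASED, TRAINING_INFO_RELEASED
}

class ResultLifeCycle {
    private ResultState state; // null until RESULT_CREATED arrives

    ResultState getState() { return state; }

    void onEvent(ResultEventType type) {
        switch (type) {
            case RESULT_CREATED:
                check(state == null, type);
                state = ResultState.UNFINALIZED;
                break;
            case RESULT_UPDATED:
            case GRAMMAR_FINALIZED:
                // Information is added; the result stays unfinalized.
                check(state == ResultState.UNFINALIZED, type);
                break;
            case RESULT_ACCEPTED:
                check(state == ResultState.UNFINALIZED, type);
                state = ResultState.ACCEPTED;
                break;
            case RESULT_REJECTED:
                check(state == ResultState.UNFINALIZED, type);
                state = ResultState.REJECTED;
                break;
            case AUDIO_RELEASED:
            case TRAINING_INFO_RELEASED:
                // Release events only follow finalization; state is unchanged.
                check(state == ResultState.ACCEPTED || state == ResultState.REJECTED, type);
                break;
        }
    }

    private static void check(boolean ok, ResultEventType type) {
        if (!ok) throw new IllegalStateException("unexpected event " + type);
    }
}
```

Feeding the sequence RESULT_CREATED, RESULT_UPDATED, GRAMMAR_FINALIZED, RESULT_ACCEPTED leaves the model in the ACCEPTED state, after which only the two release events are allowed.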
7.5 ResultListener Attachment
A ResultListener can be attached in one of three places to receive events associated with results: to a Grammar, to a Recognizer or to an individual Result. The different places of attachment give an application some flexibility in how it handles results.
To support ResultListeners the Grammar, Recognizer and Result interfaces all provide the addResultListener and removeResultListener methods.
Depending upon the place of attachment a listener receives events for different results and different subsets of result events.
Grammar: A ResultListener attached to a Grammar receives all ResultEvents for any result that has been finalized to match that grammar. Because the grammar is known once a GRAMMAR_FINALIZED event is produced, a ResultListener attached to a Grammar receives that event and subsequent events. Since grammars are usually defined for specific functionality it is common for most result handling to be done in the methods of listeners attached to each grammar.
Result: A ResultListener attached to a Result receives all ResultEvents starting at the time at which the listener is attached to the Result. Note that because a listener cannot be attached until a result has been created with the RESULT_CREATED event, it can never receive that event.
Recognizer: A ResultListener attached to a Recognizer receives all ResultEvents for all results produced by that Recognizer for all grammars. This form of listener attachment is useful for very simple applications (e.g., "Hello World!") and when centralized processing of results is required. Only ResultListeners attached to a Recognizer receive the RESULT_CREATED event.
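The three attachment rules can be compared with a small filter over one result's event sequence. This is a sketch under stated assumptions: the ListenerAttachment class and its delivered method are hypothetical, event names are plain strings, the grammar listener is assumed to be attached to the grammar that the result ends up matching, and attachIndex is the number of events already issued when a listener is attached to the Result itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of which events each attachment point receives.
class ListenerAttachment {
    enum AttachedTo { RECOGNIZER, GRAMMAR, RESULT }

    // events: the full event sequence for one result, in order.
    // attachIndex: only used for RESULT; events issued before attachment.
    static List<String> delivered(AttachedTo where, List<String> events, int attachIndex) {
        List<String> seen = new ArrayList<>();
        boolean grammarKnown = false;
        for (int i = 0; i < events.size(); i++) {
            String e = events.get(i);
            if (e.equals("GRAMMAR_FINALIZED")) grammarKnown = true;
            switch (where) {
                case RECOGNIZER:
                    seen.add(e); // every event, including RESULT_CREATED
                    break;
                case GRAMMAR:
                    if (grammarKnown) seen.add(e); // GRAMMAR_FINALIZED onwards
                    break;
                case RESULT:
                    // The result must exist before a listener can be attached,
                    // so the first event (RESULT_CREATED) is never delivered.
                    if (i >= Math.max(1, attachIndex)) seen.add(e);
                    break;
            }
        }
        return seen;
    }
}
```

For the sequence RESULT_CREATED, RESULT_UPDATED, GRAMMAR_FINALIZED, RESULT_UPDATED, RESULT_ACCEPTED, the recognizer listener sees all five events, the grammar listener sees the last three, and a result listener attached immediately after creation sees the last four.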
7.6 Recognizer and Result States
The state system of a recognizer is tied to the processing of a result. Specifically, the LISTENING, PROCESSING and SUSPENDED state cycle described in Section 3.3 and shown in Figure 6-1 follows the production of a result.
The transition of a Recognizer from the LISTENING state to the PROCESSING state with a RECOGNIZER_PROCESSING event indicates that a recognizer has started to produce a result. The RECOGNIZER_PROCESSING event is followed by the RESULT_CREATED event to ResultListeners.
The RESULT_UPDATED and GRAMMAR_FINALIZED events are issued to ResultListeners while the recognizer is in the PROCESSING state.
As soon as the recognizer completes recognition of a result, it makes a transition from the PROCESSING state to the SUSPENDED state with a RECOGNIZER_SUSPENDED event. Immediately following that recognizer event, the result finalization event (either RESULT_ACCEPTED or RESULT_REJECTED) is issued. While the result finalization event is processed, the recognizer remains suspended. Once processing of the result finalization event is complete, the recognizer automatically transitions from the SUSPENDED state back to the LISTENING state with a CHANGES_COMMITTED event. Once back in the LISTENING state the recognizer resumes processing of audio input with the grammars committed with the CHANGES_COMMITTED event.
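The interleaving of recognizer and result events described above can be checked with a small model. The RecognitionCycle class is a hypothetical stand-in that uses plain strings for event names; it covers only the cycle described in this section (one result, from LISTENING and back again) and not every transition a real recognizer can make.

```java
// Stand-in model of the LISTENING -> PROCESSING -> SUSPENDED -> LISTENING
// cycle for a single result. Event names are plain strings in this sketch.
enum RecognizerState { LISTENING, PROCESSING, SUSPENDED }

class RecognitionCycle {
    private RecognizerState state = RecognizerState.LISTENING;

    RecognizerState getState() { return state; }

    void onEvent(String event) {
        switch (event) {
            case "RECOGNIZER_PROCESSING":
                expect(RecognizerState.LISTENING, event);
                state = RecognizerState.PROCESSING;
                break;
            case "RESULT_CREATED":
            case "RESULT_UPDATED":
            case "GRAMMAR_FINALIZED":
                // Issued to ResultListeners while the recognizer is processing.
                expect(RecognizerState.PROCESSING, event);
                break;
            case "RECOGNIZER_SUSPENDED":
                expect(RecognizerState.PROCESSING, event);
                state = RecognizerState.SUSPENDED;
                break;
            case "RESULT_ACCEPTED":
            case "RESULT_REJECTED":
                // Finalization events are issued while the recognizer is suspended.
                expect(RecognizerState.SUSPENDED, event);
                break;
            case "CHANGES_COMMITTED":
                expect(RecognizerState.SUSPENDED, event);
                state = RecognizerState.LISTENING;
                break;
            default:
                throw new IllegalArgumentException("unknown event: " + event);
        }
    }

    private void expect(RecognizerState required, String event) {
        if (state != required) {
            throw new IllegalStateException(event + " not expected while " + state);
        }
    }
}
```

A model like this can be useful in tests of application code that logs events, since an out-of-order log entry shows up as an exception.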
7.6.1 Updating Grammars
In many applications, grammar definitions and grammar activation need to be updated in response to spoken input from a user. For example, if speech is added to a traditional email application, the command "save this message" might result in a window being opened in which a mail folder can be selected. While that window is open, the grammars that control that window need to be activated. Thus during the event processing for the "save this message" command, grammars may need to be created, updated and enabled. All this would happen during processing of the RESULT_ACCEPTED event.
For any grammar changes to take effect they must be committed (see Section 6.4.2). Because this form of grammar update is so common while processing the RESULT_ACCEPTED event (and sometimes the RESULT_REJECTED event), recognizers implicitly commit grammar changes after either result finalization event has been processed.
This implicit commit is indicated by the CHANGES_COMMITTED event that is issued when a Recognizer makes a transition from the SUSPENDED state to the LISTENING state following result finalization and the result finalization event processing.
One desirable effect of this form of commit becomes useful in component systems. If changes in multiple components are triggered by a finalized result event, and if many of those components change grammars, then they do not each need to call the commitChanges method. The downside of multiple calls to the commitChanges method is that a syntax check must be performed upon each. Checking syntax can be computationally expensive and so multiple checks are undesirable. With the implicit commit, which happens once all components have updated their grammars, computational costs are reduced.
7.7 Grammar Finalization
At any time during processing of a result, a GRAMMAR_FINALIZED event can be issued for that result, indicating that the Grammar matched by the result has been determined. This event is issued only once. It is required for any ACCEPTED result, but is optional for a result that is eventually rejected.
As Section 7.5 describes, the GRAMMAR_FINALIZED event is the first event received by a ResultListener attached to a Grammar.
The GRAMMAR_FINALIZED event behaves the same for results that match either a RuleGrammar or a DictationGrammar.
Following the GRAMMAR_FINALIZED event, the getGrammar method of the Result interface returns a non-null reference to the matched grammar. By issuing a GRAMMAR_FINALIZED event the Recognizer guarantees that the Grammar will not change.
Finally, the GRAMMAR_FINALIZED event does not change the result's state. A GRAMMAR_FINALIZED event is issued only when a result is in the UNFINALIZED state, and leaves the result in that state.
7.8 Token Finalization
A result is a dynamic object while it is being recognized. One way in which a result can be dynamic is that tokens are updated and finalized as recognition of speech proceeds. The result events allow a recognizer to inform an application of changes in either or both of the finalized and unfinalized tokens of a result.
The finalized and unfinalized tokens can be updated on any of the following result event types: RESULT_CREATED, RESULT_UPDATED, RESULT_ACCEPTED, RESULT_REJECTED.
Finalized tokens are accessed through the getBestTokens and getBestToken methods of the Result interface. The unfinalized tokens are accessed through the getUnfinalizedTokens method of the Result interface.
A finalized token is a ResultToken in a Result that has been recognized in the incoming speech as matching a grammar. Furthermore, when a recognizer finalizes a token it indicates that it will not change the token at any point in the future. The numTokens method returns the number of finalized tokens.
Many recognizers do not finalize tokens until recognition of an entire result is complete. For these recognizers, the numTokens method returns zero for a result in the UNFINALIZED state.
For recognizers that do finalize tokens while a Result is in the UNFINALIZED state, the following conditions apply:
The Result object may contain zero or more finalized tokens when the RESULT_CREATED event is issued.
The recognizer issues RESULT_UPDATED events to the ResultListener during recognition each time one or more tokens are finalized.
Tokens are finalized strictly in the order in which they are spoken (i.e., left to right in English text).
A result in the UNFINALIZED state may also have unfinalized tokens. An unfinalized token is a token that the recognizer has heard, but which it is not yet ready to finalize. Recognizers are not required to provide unfinalized tokens, and applications can safely choose to ignore unfinalized tokens.
For recognizers that provide unfinalized tokens, the following conditions apply:
The Result object may contain zero or more unfinalized tokens when the RESULT_CREATED event is issued.
The recognizer issues RESULT_UPDATED events to the ResultListener during recognition each time the unfinalized tokens change.
For an unfinalized result, unfinalized tokens may be updated at any time and in any way. Importantly, the number of unfinalized tokens may increase, decrease or return to zero and the values of those tokens may change in any way the recognizer chooses.
Unfinalized tokens always represent a guess for the speech following the finalized tokens.
Unfinalized tokens are highly changeable, so why are they useful? Many applications can provide users with visual feedback of unfinalized tokens, particularly for dictation results. This feedback informs users of the progress of the recognition and helps the user to know that something is happening. However, because these tokens may change and are more likely than finalized tokens to be incorrect, applications should visually distinguish the unfinalized tokens by using a different font, different color or even a different window.
The following is an example of finalized tokens and unfinalized tokens for the sentence "I come from Australia". The lines indicate the token values after the single RESULT_CREATED event, the multiple RESULT_UPDATED events and the final RESULT_ACCEPTED event. The finalized tokens are in bold, the unfinalized tokens are in italics.
RESULT_CREATED: I come
RESULT_UPDATED: I come from
RESULT_UPDATED: I come from
RESULT_UPDATED: I come from a strange land
RESULT_UPDATED: I come from Australia
RESULT_ACCEPTED: I come from Australia
Recognizers can vary in how they support finalized and unfinalized tokens in a number of ways. For an unfinalized result, a recognizer may provide finalized tokens, unfinalized tokens, both or neither. Furthermore, for a recognizer that does support finalized and unfinalized tokens during recognition, the behavior may depend upon the number of active grammars, upon whether the result is for a RuleGrammar or DictationGrammar, upon the length of spoken sentences, and upon other more complex factors. Fortunately, unless there is a functional requirement to display or otherwise process intermediate results, an application can safely ignore all but the RESULT_ACCEPTED event.
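For applications that do display intermediate results, the visual distinction recommended in this section can be sketched in plain text. The TokenFeedback class is hypothetical, string arrays stand in for the spoken text of ResultToken objects, and square brackets stand in for whatever font or color change a real user interface would use. The split between finalized and unfinalized tokens in the usage example is invented for illustration.

```java
// Hypothetical renderer: finalized tokens are shown as-is, unfinalized
// tokens are wrapped in square brackets to mark them as provisional.
class TokenFeedback {
    static String render(String[] finalized, String[] unfinalized) {
        StringBuilder sb = new StringBuilder();
        for (String t : finalized) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(t);
        }
        // A finalized result has no unfinalized tokens (null in this sketch).
        if (unfinalized != null) {
            for (String t : unfinalized) {
                if (sb.length() > 0) sb.append(' ');
                sb.append('[').append(t).append(']');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Invented split: three finalized tokens, three provisional guesses.
        System.out.println(render(new String[] { "I", "come", "from" },
                                  new String[] { "a", "strange", "land" }));
        // After finalization only finalized tokens remain.
        System.out.println(render(new String[] { "I", "come", "from", "Australia" }, null));
    }
}
```

Because unfinalized tokens always follow the finalized ones, the renderer can simply append them; it would be called again on every RESULT_UPDATED event to replace the previous display.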
7.9 Finalized Rule Results
There are some common design patterns for processing accepted finalized results that match a RuleGrammar. First we review what we know about these results.
It is safe to cast an accepted result that matches a RuleGrammar to the FinalRuleResult interface. It is safe to call any method of the FinalRuleResult interface or its parents: FinalResult and Result.
The getGrammar method of the Result interface returns a reference to the matched RuleGrammar. The getRuleGrammar method of the FinalRuleResult interface returns references to the RuleGrammars matched by the alternative guesses.
The getBestToken and getBestTokens methods of the Result interface return the recognizer's best guess of what a user said.
The getAlternativeTokens method returns alternative guesses for the entire result.
The tags for the best guess are available from the getTags method of the FinalRuleResult interface.
Result audio and training information are optionally available.
7.9.1 Result Tokens
A ResultToken in a result matching a RuleGrammar contains the same information as the RuleToken object in the RuleGrammar definition. This means that the tokenization of the result follows the tokenization of the grammar definition including compound tokens. For example, consider a grammar with the following Java Speech Grammar Format fragment which contains four tokens:
<rule> = I went to "San Francisco";
If the user says "I went to San Francisco" then the result will contain the four tokens defined by JSGF: "I", "went", "to", "San Francisco".
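The way a quoted phrase becomes a single compound token can be illustrated with a short tokenizer. This is a sketch only: SimpleTokenizer is a hypothetical class, it handles nothing more than a flat sequence of words and double-quoted phrases, and it is not how a recognizer or the JSGF parser is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer for a flat sequence of words and quoted phrases.
class SimpleTokenizer {
    // Either a double-quoted run (one compound token) or a bare word.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> tokens(String expansion) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(expansion);
        while (m.find()) {
            // Group 1 is set for a quoted phrase, group 2 for a bare word.
            out.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // The right-hand side of the rule shown above, without the semicolon.
        System.out.println(tokens("I went to \"San Francisco\""));
    }
}
```

The quoted phrase is kept together, so the list has four entries rather than five.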
The ResultToken interface defines more advanced information. Amongst that information the getStartTime and getEndTime methods may optionally return time-stamp values (or -1 if the recognizer does not provide time-alignment information).
The ResultToken interface also defines several methods for a recognizer to provide presentation hints. Those hints are ignored for RuleGrammar results; they are only used for dictation results.
Furthermore, the getSpokenText and getWrittenText methods will return an identical string which is equal to the string defined in the matched grammar.
7.9.2 Alternative Guesses
In a FinalRuleResult, alternative guesses are alternatives for the entire result, that is, for a complete utterance spoken by a user. (A FinalDictationResult can provide alternatives for single tokens or sequences of tokens.) Because more than one RuleGrammar can be active at a time, an alternative token sequence may match a rule in a different RuleGrammar than the best guess tokens, or may match a different rule in the same RuleGrammar as the best guess. Thus, when processing alternatives for a FinalRuleResult, an application should use the getRuleGrammar and getRuleName methods to ensure that it analyzes the alternatives correctly.
Alternatives are numbered from zero up. The 0th alternative is actually the best guess for the result so FinalRuleResult.getAlternativeTokens(0) returns the same array as Result.getBestTokens(). (The duplication is for programming convenience.) Likewise, the FinalRuleResult.getRuleGrammar(0) call will return the same result as Result.getGrammar().
The following code is an implementation of the ResultListener interface that processes the RESULT_ACCEPTED event. The implementation assumes that a Result being processed matches a RuleGrammar.
import java.io.PrintStream;
import javax.speech.recognition.*;

class MyRuleResultListener extends ResultAdapter
{
    public void resultAccepted(ResultEvent e)
    {
        // Assume that the result matches a RuleGrammar.
        // Cast the result (source of event) appropriately
        FinalRuleResult res = (FinalRuleResult) e.getSource();

        // Print out basic result information
        PrintStream out = System.out;
        out.println("Number guesses: " + res.getNumberGuesses());

        // Print out the best result and all alternatives
        for (int n=0; n < res.getNumberGuesses(); n++) {
            // Extract the n-best information
            String gname = res.getRuleGrammar(n).getName();
            String rname = res.getRuleName(n);
            ResultToken[] tokens = res.getAlternativeTokens(n);

            out.print("Alt " + n + ": ");
            out.print("<" + gname + "." + rname + ">:");
            for (int t=0; t < tokens.length; t++)
                out.print(" " + tokens[t].getSpokenText());
            out.println();
        }
    }
}
For a grammar with commands to control a windowing system (shown below), a result might look like:
Number guesses: 3
Alt 0: <com.acme.actions.command>: move the window to the back
Alt 1: <com.acme.actions.command>: move window to the back
Alt 2: <com.acme.actions.command>: open window to the front
If more than one grammar or more than one public rule was active, the <grammarName.ruleName> values could vary between the alternatives.
7.9.3 Result Tags
Processing commands generated from a RuleGrammar becomes increasingly difficult as the complexity of the grammar rises. With the Java Speech API, speech recognizers provide two mechanisms to simplify the processing of results: tags and parsing.
A tag is a label attached to an entity within a RuleGrammar. The Java Speech Grammar Format and the RuleTag class define how tags can be attached to a grammar. The following is a grammar for very simple control of windows which includes tags attached to the important words in the grammar.
grammar com.acme.actions;
public <command> = <action> <object> [<where>];
<action> = open {ACT_OP} | close {ACT_CL} | move {ACT_MV};
<object> = [a | an | the] (window {OBJ_WIN} | icon {OBJ_ICON});
<where> = [to the] (back {WH_BACK} | front {WH_FRONT});
This grammar allows users to speak commands such as
open window
move the icon
move the window to the back
move window back
The words that carry tags in the grammar (the action, the object and the optional location) are the ones that the application cares about. For example, in the third and fourth example commands, the spoken words are different but the tagged words are identical. Tags allow an application to ignore trivial words such as "the" and "to".
The com.acme.actions grammar can be loaded and enabled using the code in the "Hello World!" example. Since the grammar has a single public rule, <command>, the recognizer will listen for speech matching that rule, such as the example results given above.
The tags for the best result are available through the getTags method of the FinalRuleResult interface. This method returns an array of tags associated with the tokens (words) and other grammar entities matched by the result. If the best sequence of tokens is "move the window to the front", the list of tags is the following String array:
String tags[] = {"ACT_MV", "OBJ_WIN", "WH_FRONT"};
Note how the order of the tags in the result is preserved (forward in time). These tags are easier for most applications to interpret than the original text of what the user said.
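As a minimal sketch of how an application might act on such a tag array, the following Java fragment maps the tags of the com.acme.actions grammar to a short description of the command. The class TagInterpreter and its describe method are hypothetical names chosen for illustration, and the tag array is passed in directly rather than obtained from getTags so that the sketch runs without a recognizer.

```java
import java.util.Map;

// Hypothetical helper (not part of the Java Speech API): turns the tags
// of the com.acme.actions grammar into a readable command description.
final class TagInterpreter {
    // Meaning assigned to each tag by this sketch.
    private static final Map<String, String> MEANING = Map.of(
        "ACT_OP", "open", "ACT_CL", "close", "ACT_MV", "move",
        "OBJ_WIN", "window", "OBJ_ICON", "icon",
        "WH_BACK", "to back", "WH_FRONT", "to front");

    // Joins the meanings of the tags in the order they were spoken.
    static String describe(String[] tags) {
        StringBuilder sb = new StringBuilder();
        for (String tag : tags) {
            String meaning = MEANING.get(tag);
            if (meaning == null) {
                throw new IllegalArgumentException("Unknown tag: " + tag);
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(meaning);
        }
        return sb.toString();
    }
}
```

Because the untagged words never reach this code, "move the window to the front" and "move window front" are handled identically.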
Tags can also be used to handle synonyms - multiple ways of saying the same thing. For example, "programmer", "hacker", "application developer" and "computer dude" could all be given the same tag, say "DEV". An application that looks at the "DEV" tag will not care which way the user spoke the title.
Another use of tags is for internationalization of applications. Maintaining applications for multiple languages and locales is easier if the code is insensitive to the language being used. In the same way that the "DEV" tag isolated an application from different ways of saying "programmer", tags can be used to provide an application with similar input irrespective of the language being recognized.
The following is a grammar for French with the same functionality as the grammar for English shown above.
grammar com.acme.actions.fr;
public <command> = <action> <object> [<where>];
<action> = ouvrir {ACT_OP} | fermer {ACT_CL} | deplacer {ACT_MV};
<object> = fenetre {OBJ_WIN} | icone {OBJ_ICON};
<where> = au-dessous {WH_BACK} | au-dessus {WH_FRONT};
For this simple grammar, there are only minor differences in the structure of the grammar (e.g. the "[to the]" tokens in the <where> rule for English are absent in French). However, in more complex grammars the syntactic differences between languages become significant and tags provide a clearer improvement.
Tags do not completely solve internationalization problems. One issue to be considered is word ordering. A simple command like "open the window" can translate to the form "the window open" in some languages. More complex sentences can have more complex transformations. Thus, applications need to be aware of word ordering, and therefore of tag ordering, when developing international applications.
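One way an application might reduce its sensitivity to tag ordering is to sort tags into slots by their prefix instead of relying on their position. The sketch below assumes the ACT_, OBJ_ and WH_ prefixes used in the grammars above; the WindowCommand class is a hypothetical holder and is not part of the Java Speech API.

```java
// Hypothetical holder for one command; not part of the Java Speech API.
final class WindowCommand {
    String action;   // tag starting with "ACT_", or null if absent
    String object;   // tag starting with "OBJ_", or null if absent
    String where;    // tag starting with "WH_", or null if absent

    // Sorts tags into slots by prefix, so the outcome does not depend on
    // the order in which a language places the corresponding words.
    static WindowCommand fromTags(String[] tags) {
        WindowCommand c = new WindowCommand();
        for (String tag : tags) {
            if (tag.startsWith("ACT_")) {
                c.action = tag;
            } else if (tag.startsWith("OBJ_")) {
                c.object = tag;
            } else if (tag.startsWith("WH_")) {
                c.where = tag;
            }
        }
        return c;
    }
}
```

This only works when each slot appears at most once per command; grammars that allow repeated objects or actions would need a richer structure.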
7.9.4 Result Parsing
More advanced applications parse results to get even more information than is available with tags. Parsing is the capability to analyze how a sequence of tokens matches a RuleGrammar. Parsing of text against a RuleGrammar is discussed in Section 5.5.
Parsing a FinalRuleResult produces a RuleParse object. The getTags method of a RuleParse object provides the same tag information as the getTags method of a FinalRuleResult. However, the FinalRuleResult provides tag information for only the best-guess result, whereas parsing can be applied to the alternative guesses.
An API requirement that simplifies parsing of results that match a RuleGrammar is that for such a result to be ACCEPTED (not rejected) it must exactly match the grammar - technically speaking, it must be possible to parse a FinalRuleResult against the RuleGrammar it matches. This is not guaranteed, however, if the result was rejected or if the RuleGrammar has been modified since it was committed and produced the result.
7.10 Finalized Dictation Results
There are some common design patterns for processing accepted finalized results that match a DictationGrammar. First we review what we know about these results.
It is safe to cast an accepted result that matches a DictationGrammar to the FinalDictationResult interface. It is safe to call any method of the FinalDictationResult interface or its parents: FinalResult and Result.
The getGrammar method of the Result interface returns a reference to the matched DictationGrammar.
The getBestToken and getBestTokens methods of the Result interface return the recognizer's best guess of what a user said.
The getAlternativeTokens method of the FinalDictationResult interface returns alternative guesses for any token or sequence of tokens.
Result audio and training information are optionally available.
The ResultTokens provided in a FinalDictationResult contain specialized information that includes hints on textual presentation of tokens. Section 7.10.2 discusses the presentation hints in detail. In this section the methods for obtaining and using alternative tokens are described.
7.10.1 Alternative Guesses
Alternative tokens for a dictation result are most often used by an application for display to users for correction of dictated text. A typical scenario is that a user speaks some text - perhaps a few words, a few sentences, a few paragraphs or more. The user reviews the text and detects a recognition error. This means that the best guess token sequence is incorrect. However, very often the correct text is one of the top alternative guesses. Thus, an application will provide a user the ability to review a set of alternative guesses and to select one of them if it is the correct text. Such a correction mechanism is often more efficient than typing the correction or dictating the text again. If the correct text is not amongst the alternatives an application must support other means of entering the text.
The getAlternativeTokens method is passed a starting and an ending ResultToken. These tokens must have been obtained from the same result either through a call to getBestToken or getBestTokens in the Result interface, or through a previous call to getAlternativeTokens.
ResultToken[][] getAlternativeTokens(
ResultToken fromToken,
ResultToken toToken,
int max);
To obtain alternatives for a single token (rather than alternatives for a sequence), set toToken to null.
The int parameter allows the application to specify the number of alternatives it wants. The recognizer may choose to return any number of alternatives up to the maximum number, including just one alternative (the original token sequence). Applications can indicate in advance the number of alternatives they may request by setting the NumResultAlternatives parameter through the recognizer's RecognizerProperties object.
The two-dimensional array returned by the getAlternativeTokens method is the most difficult aspect of dictation alternatives to understand. The following example illustrates the major features of the return value.
Let's consider a dictation example where the user says "he felt alienated today" but the recognizer hears "he felt alien ate Ted today". The user says four words but the recognizer hears six words. In this example, the boundaries of the spoken words and best-guess align nicely: "alienated" aligns with "alien ate Ted" (incorrect tokens don't always align smoothly with the correct tokens).
Users are typically better at locating and fixing recognition errors than recognizers or applications - they provided the original speech. In this example, the user will likely identify the words "alien ate Ted" as incorrect (tokens 2 to 4 in the best-guess result). By an application-provided method such as selection by mouse and a pull-down menu, the user will request alternative guesses for the three incorrect tokens. The application calls the getAlternativeTokens method of the FinalDictationResult to obtain the recognizer's guess at the alternatives.
// Get 6 alternatives for tokens 2 through 4.
FinalDictationResult r = ...;
ResultToken tok2 = r.getBestToken(2);
ResultToken tok4 = r.getBestToken(4);
// getAlternativeTokens returns a two-dimensional array of ResultToken
ResultToken[][] alt = r.getAlternativeTokens(tok2, tok4, 6);
The return array might look like the following. Each line represents a sequence of alternative tokens to "alien ate Ted". Each word in each alternative sequence represents a ResultToken object in an array.
alt[0] = alien ate Ted // the best guess
alt[1] = alienate Ted // the 1st alternative
alt[2] = alienated // the 2nd alternative
alt[3] = alien hated // the 3rd alternative
alt[4] = a lion ate Ted // the 4th alternative
The points to note are:
The first alternative is the best guess. This is usually the case if the toToken and fromToken values are from the best-guess sequence. (From a user's perspective it's not really an alternative.)
Only five alternative sequences were returned even though six were requested. This is because a recognizer will only return alternatives it considers to be reasonable guesses. It is legal for this call to return only the best guess with no alternatives if it can't find any reasonable alternatives.
The number of tokens is not the same in all the alternative sequences (3, 2, 1, 2, 4 tokens respectively). This return array is known as a ragged array. From a speech perspective it is easy to see why different lengths are needed, but application developers do need to be careful when processing a ragged array.
The best-guess and the alternatives do not always make sense to humans.
A complex issue to understand is that the alternatives vary according to how the application (or user) requests them. The 1st alternative to "alien ate Ted" is "alienate Ted". However, the 1st alternative to "alien" might be "a lion", the 1st alternative to "alien ate" might be "alien eight", and the 1st alternative to "alien ate Ted today" might be "align ate Ted to day".
Fortunately for application developers, users learn to select sequences that are likely to give reasonable alternatives, and recognizers are developed to make the alternatives as useful and accurate as possible.
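The ragged array can be handled by treating each row independently and never assuming a fixed row length. The following sketch builds the entries of a correction menu from such an array. It uses a String[][] of spoken text as a stand-in for the ResultToken[][] returned by getAlternativeTokens, and the AlternativeMenu class and menuEntries method are hypothetical names. Dropping the row that repeats the text currently shown is a design choice of this sketch, based on the observation above that the best guess is not really an alternative for the user.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: converts a ragged array of alternatives into
// display strings for a correction menu.
final class AlternativeMenu {
    // alternatives: one row per alternative, rows may differ in length.
    // currentText: the text already shown to the user, which is skipped.
    static List<String> menuEntries(String[][] alternatives, String currentText) {
        List<String> entries = new ArrayList<>();
        for (String[] row : alternatives) {
            // String.join copes with any row length, including one token.
            String text = String.join(" ", row);
            if (!text.equals(currentText)) {
                entries.add(text);
            }
        }
        return entries;
    }
}
```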
7.10.2 Result Tokens
A ResultToken object represents a single token in a result. A token is most often a single word, but multi-word tokens are possible (e.g., "New York") as well as formatting characters and language-specific constructs. For a DictationGrammar the set of tokens is built into the recognizer.
Each ResultToken in a FinalDictationResult provides the following information.
The spoken form of the token which provides a transcript of what the user says (getSpokenText method). In a dictation system, the spoken form is typically used when displaying unfinalized tokens.
The written form of the token which indicates how to visually present the token (getWrittenText method). In a dictation system, the written form of finalized tokens is typically placed into the text edit window after applying the following presentation hints.
A capitalization hint indicating whether the written form of the following token should be capitalized (first letter only), all uppercase, all lowercase, or left as-is (getCapitalizationHint method).
A spacing hint indicating how the written form should be spaced with the previous and following tokens (getSpacingHint method).
The presentation hints in a ResultToken are important for the processing of dictation results. Dictation results are typically displayed to the user, so using the written form and the capitalization and spacing hints for formatting is important. For example, when dictation is used in word processing, the user will want the printed text to be correctly formatted.
The capitalization hint indicates how the written form of the following token should be formatted. The capitalization hint takes one of four mutually exclusive values. CAP_FIRST indicates that the first character of the following token should be capitalized. The UPPERCASE and LOWERCASE values indicate that the following token should be either all uppercase or lowercase. CAP_AS_IS indicates that there should be no change in capitalization of the following token.
The spacing hint deals with spacing around a token. It is an int value containing three flags which are or'ed together (using the '|' operator). If none of the three spacing hint flags is set, then the getSpacingHint method returns the value SEPARATE, which is the value zero.
The ATTACH_PREVIOUS bit is set if the token should be attached to the previous token: no space between this token and the previous token. In English, some punctuation characters have this flag set true. For example, periods, commas and colons are typically attached to the previous token.
The ATTACH_FOLLOWING bit is set if the token should be attached to the following token: no space between this token and the following token. For example, in English, opening quotes, opening parentheses and dollar signs typically attach to the following token.
The ATTACH_GROUP bit is set if the token should be attached to previous or following tokens if they also have the ATTACH_GROUP flag set to true. In other words, tokens in an attachment group should be attached together. In English, a common use of the group flag is for numbers, digits and currency amounts. For example, the sequence of four spoken-form tokens, "3" "point" "1" "4", should have the group flag set true, so the presentation form should not have separating spaces: "3.14".
Every language has conventions for textual representation of a spoken language. Since recognizers are language-specific and understand many of these presentation conventions, they provide the presentation hints (written form, capitalization hint and spacing hint) to simplify applications. However, applications may choose to override the recognizer's hints or may choose to do additional processing.
Table 6-6 shows examples of tokens in which the spoken and written forms are different:
Table 6-6 Spoken and written forms for some English tokens
Spoken Form        Written Form     Capitalization   Spacing
twenty             20               CAP_AS_IS        SEPARATE
new line           '\n' '\u000A'    CAP_FIRST        ATTACH_PREVIOUS & ATTACH_FOLLOWING
new paragraph      '\u2029'         CAP_FIRST        ATTACH_PREVIOUS & ATTACH_FOLLOWING
no space           null             CAP_AS_IS        ATTACH_PREVIOUS & ATTACH_FOLLOWING
space bar          ' ' '\u0020'     CAP_AS_IS        ATTACH_PREVIOUS & ATTACH_FOLLOWING
capitalize next    null             CAP_FIRST        SEPARATE
period             '.' '\u002E'     CAP_FIRST        ATTACH_PREVIOUS
comma              ',' '\u002C'     CAP_AS_IS        ATTACH_PREVIOUS
open parentheses   '(' '\u0028'     CAP_AS_IS        ATTACH_FOLLOWING
exclamation mark   '!' '\u0021'     CAP_FIRST        ATTACH_PREVIOUS
dollar sign        '$' '\u0024'     CAP_AS_IS        ATTACH_FOLLOWING & ATTACH_GROUP
pound sign         '£' '\u00A3'     CAP_AS_IS        ATTACH_FOLLOWING & ATTACH_GROUP
yen sign           '¥' '\u00A5'     CAP_AS_IS        ATTACH_PREVIOUS & ATTACH_GROUP
"New line", "new paragraph", "space bar", "no space" and "capitalize next" are all examples of conversion of an implicit command (e.g. "start a new paragraph"). For three of these, the written form is a single Unicode character. Most programmers are familiar with the new-line character '\n' and space ' ', but fewer are familiar with the Unicode character for new paragraph '\u2029'. For convenience and consistency, the ResultToken includes static variables called NEW_LINE and NEW_PARAGRAPH.
Some applications will treat a paragraph boundary as two new-line characters, others will treat it differently. Each of these commands provides hints for capitalization. For example, in English the first letter of the first word of a new paragraph is typically capitalized.
The punctuation characters, "period", "comma", "open parentheses", "exclamation mark" and the three currency symbols convert to a single Unicode character and have special presentation hints.
An important feature of the written form for most of the examples is that the application does not need to deal with synonyms (multiple ways of saying the same thing). For example, "open parentheses" may also be spoken as "open paren" or "begin paren" but in all cases the same written form is generated.
The following is an example sequence of result tokens.
Table 6-7 Sample sequence of result tokens
Spoken Form      Written Form   Capitalization   Spacing
new line         "\n"           CAP_FIRST        ATTACH_PREVIOUS & ATTACH_FOLLOWING
the              "the"          CAP_AS_IS        SEPARATE
uppercase next   null           UPPERCASE        SEPARATE
index            "index"        CAP_AS_IS        SEPARATE
is               "is"           CAP_AS_IS        SEPARATE
seven            "7"            CAP_AS_IS        ATTACH_GROUP
dash             "-"            CAP_AS_IS        ATTACH_GROUP
two              "2"            CAP_AS_IS        ATTACH_GROUP
period           "."            CAP_FIRST        ATTACH_PREVIOUS
This sequence of tokens should be converted to the following string:
"\nThe INDEX is 7-2."
Conversion of spoken text to a written form is a complex task and is complicated by the different conventions of different languages and often by different conventions for the same language. The spoken form, written form and presentation hints of the ResultToken interface handle most simple conversions. Advanced applications should consider filtering the results to process more complex patterns, particularly cross-token patterns. For example "nineteen twenty eight" is typically converted to "1928" and "twenty eight dollars" to "$28" (note the movement of the dollar sign to before the numbers).
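The simple, per-token part of this conversion can be sketched as follows. The Tok class is a stand-in for ResultToken so that the example runs without a recognizer: its field names and the numeric values of its constants are assumptions of this sketch (only SEPARATE being zero is stated above), and real code would read the hints through getWrittenText, getCapitalizationHint and getSpacingHint. Two choices are also assumptions: a token with a null written form passes its capitalization hint on only when that hint is not CAP_AS_IS, and it suppresses the next space only when its ATTACH_FOLLOWING flag is set.

```java
import java.util.Locale;

// Stand-in for a dictation token; names and constant values are
// assumptions for this sketch, not the Java Speech API definitions.
final class Tok {
    static final int SEPARATE = 0, ATTACH_PREVIOUS = 1,
                     ATTACH_FOLLOWING = 2, ATTACH_GROUP = 4;
    static final int CAP_AS_IS = 0, CAP_FIRST = 1, UPPERCASE = 2, LOWERCASE = 3;

    final String written;  // null when the token produces no text
    final int cap;         // capitalization hint for the FOLLOWING token
    final int spacing;     // or'ed ATTACH_* flags, or SEPARATE

    Tok(String written, int cap, int spacing) {
        this.written = written;
        this.cap = cap;
        this.spacing = spacing;
    }
}

final class TokenFormatter {
    // Concatenates written forms, applying capitalization and spacing hints.
    static String format(Tok[] tokens) {
        StringBuilder sb = new StringBuilder();
        int pendingCap = Tok.CAP_AS_IS; // hint left by the preceding token
        boolean attachNext = true;      // no space before the first token
        boolean prevGroup = false;      // previous text token was in a group
        for (Tok t : tokens) {
            if (t.written == null) {
                // Command-like token: emits nothing, only adjusts the state.
                if (t.cap != Tok.CAP_AS_IS) pendingCap = t.cap;
                if ((t.spacing & Tok.ATTACH_FOLLOWING) != 0) attachNext = true;
                continue;
            }
            boolean attach = attachNext
                || (t.spacing & Tok.ATTACH_PREVIOUS) != 0
                || (prevGroup && (t.spacing & Tok.ATTACH_GROUP) != 0);
            if (!attach) sb.append(' ');
            sb.append(applyCap(t.written, pendingCap));
            pendingCap = t.cap;
            attachNext = (t.spacing & Tok.ATTACH_FOLLOWING) != 0;
            prevGroup = (t.spacing & Tok.ATTACH_GROUP) != 0;
        }
        return sb.toString();
    }

    private static String applyCap(String s, int cap) {
        switch (cap) {
            case Tok.CAP_FIRST:
                return s.isEmpty() ? s
                    : s.substring(0, 1).toUpperCase(Locale.ROOT) + s.substring(1);
            case Tok.UPPERCASE:
                return s.toUpperCase(Locale.ROOT);
            case Tok.LOWERCASE:
                return s.toLowerCase(Locale.ROOT);
            default:
                return s;
        }
    }
}
```

Applied to the token sequence of Table 6-7, this formatter yields the string given above. Cross-token patterns such as "twenty eight dollars" are outside its scope and would need a separate filtering pass.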
7.11 Result Audio
If requested by an application, some recognizers can provide audio data for results. Audio data has a number of uses. In dictation applications, providing audio feedback to users aids correction of text because the audio reminds users of what they said (it's not always easy to remember exactly what you dictate, especially in long sessions). Audio data also allows storage for future evaluation and debugging.
Audio data is provided for finalized results through the following methods of the FinalResult interface.
Table 6-8 FinalResult interface: audio methods
Name               Description
getAudio           Get an AudioClip for a token, a sequence of tokens or for an entire result.
isAudioAvailable   Tests whether audio data is available for a result.
releaseAudio       Release audio data for a result.
There are two getAudio methods in the FinalResult interface. One method accepts no parameters and returns an AudioClip for an entire result or null if audio data is not available for this result. The other getAudio method takes a start and end ResultToken as input and returns an AudioClip for the segment of the result including the start and end token or null if audio data is not available.
In both forms of the getAudio method, the recognizer will attempt to return the specified audio data. However, it is not always possible to exactly determine the start and end of words or even complete results. Sometimes segments are "clipped" and sometimes surrounding audio is included in the AudioClip.
Not all recognizers provide access to audio for results. For recognizers that do provide audio data, it is not necessarily provided for all results. For example, a recognizer might only provide audio data for dictation results. Thus, applications should always check for a null return value on a getAudio call.
The storage of audio data for results potentially requires large amounts of memory, particularly for long sessions. Thus, result audio requires special management. An application that wishes to use result audio should:
Set the ResultAudioProvided parameter of RecognizerProperties to true. Recognizers that do not support audio data ignore this call.
Test the availability of audio for a result using the isAudioAvailable method of the FinalResult interface.
Use the getAudio methods to obtain audio data. These methods return null if audio data is not available.
Once the application has finished use of the audio for a Result, it should call the releaseAudio method of FinalResult to free up resources.
A recognizer may choose to release audio data for a result if it is necessary to reclaim memory or other system resources.
When audio is released, either by a call to releaseAudio or by the recognizer, an AUDIO_RELEASED event is issued to the audioReleased method of the ResultListener.
7.12 Result Correction
Recognition results are not always correct. Some recognizers can be trained by informing them of the correct tokens for a result - usually when a user corrects a result.
Recognizers are not required to support correction capabilities. If a recognizer does support correction, it does not need to support correction for every result. For example, some recognizers support correction only for dictation results.
Applications are not required to provide recognizers with correction information. However, if the information is available to an application and the recognizer supports correction then it is good practice to inform the recognizer of the correction so that it can improve its future recognition performance.
The FinalResult interface provides the methods that handle correction.
Table 6-9 FinalResult interface: correction methods
Name                                     Description
tokenCorrection                          Inform the recognizer of a correction in which zero or more tokens replace a token or sequence of tokens.
MISRECOGNITION, USER_CHANGE, DONT_KNOW   Indicate the type of correction.
isTrainingInfoAvailable                  Tests whether the recognizer has information available to allow it to learn from a correction.
releaseTrainingInfo                      Release training information for a result.
Often, but certainly not always, a correction is triggered when a user corrects a recognizer by selecting amongst the alternative guesses for a result. Other instances when an application is informed of the correct result are when the user types a correction to dictated text, or when a user corrects a misrecognized command with a follow-up command.
Once an application has obtained the correct result text, it should inform the recognizer. The correction information is provided by a call to the tokenCorrection method of the FinalResult interface. This method indicates a correction of one token sequence to another token sequence. Either token sequence may contain one or more tokens. Furthermore, the correct token sequence may contain zero tokens to indicate deletion of tokens.
The tokenCorrection method accepts a correctionType parameter that indicates the reason for the correction. The legal values are defined by constants of the FinalResult interface:
MISRECOGNITION indicates that the new tokens are known to be the tokens actually spoken by the user: a correction of a recognition error. Applications can be confident that a selection of an alternative token sequence implies a MISRECOGNITION correction.
USER_CHANGE indicates that the new tokens are not the tokens originally spoken by the user but instead the user has changed his/her mind. This is a "speako" (a spoken version of a "typo"). A USER_CHANGE may be indicated if a user types over the recognized result, but sometimes the user may choose to type in the correct result.
DONT_KNOW indicates that the application does not know whether the new tokens are correcting a recognition error or indicating a change by the user. Applications should indicate this type of correction whenever unsure of the type of correction.
Why is it useful to tell a recognizer about a USER_CHANGE? Recognizers adapt to both the sounds and the patterns of words of users. A USER_CHANGE correction allows the recognizer to learn about a user's word patterns. A MISRECOGNITION correction allows the recognizer to learn about both the user's voice and the word patterns. In both cases, correcting the recognizer requests it to re-train itself based on the new information.
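The choice between the three correction types can be sketched as a small decision function. Everything in this fragment is a stand-in: the real correction types are constants of the FinalResult interface, while the enums, the CorrectionKind class and the saidChangedMind flag are hypothetical and exist only to make the rule explicit. The rule itself follows the descriptions above: picking an alternative implies MISRECOGNITION, and a typed replacement is reported as DONT_KNOW unless the application has a clear reason to believe the user changed the wording.

```java
// Stand-ins for this sketch only; not the Java Speech API constants.
final class CorrectionKind {
    enum How { PICKED_ALTERNATIVE, TYPED_REPLACEMENT }
    enum Type { MISRECOGNITION, USER_CHANGE, DONT_KNOW }

    // saidChangedMind is true only when the application has explicit
    // evidence that the user intended new wording (an assumption here).
    static Type classify(How how, boolean saidChangedMind) {
        if (how == How.PICKED_ALTERNATIVE) {
            return Type.MISRECOGNITION;
        }
        return saidChangedMind ? Type.USER_CHANGE : Type.DONT_KNOW;
    }
}
```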
Training information needs to be managed because it requires substantial memory and possibly other system resources to maintain it for a result. For example, in long dictation sessions, correction data can begin to use excessive amounts of memory.
Recognizers maintain training information only when the recognizer's TrainingProvided parameter is set to true through the RecognizerProperties interface. Recognizers that do not support correction will ignore calls to the setTrainingProvided method.
If the TrainingProvided parameter is set to true, a result may include training information when it is finalized. Once an application believes the training information is no longer required for a specific FinalResult, it should call the releaseTrainingInfo method of FinalResult to indicate the recognizer can release the resources used to store the information.
At any time, the availability of training information for a result can be tested by calling the isTrainingInfoAvailable method.
Recognizers can choose to release training information even without a request to do so by the application. This does not substantially affect an application because performing correction on a result which does not have training information is not an error.
A TRAINING_INFO_RELEASED event is issued to the ResultListener when the training information is released. The event is issued identically whether the application or recognizer initiated the release.
7.13 Rejected Results
First, a warning: ignore rejected results unless you really understand them!
Like humans, recognizers don't have perfect hearing and so they make mistakes (recognizers still tend to make more mistakes than people). An application should never completely trust a recognition result. In particular, applications should treat important results carefully, for example, "delete all files".
Recognizers try to determine whether they have made a mistake. This process is known as rejection. But recognizers also make mistakes in rejection! In short, a recognizer cannot always tell whether or not it has made a mistake.
A recognizer may reject incoming speech for a number of reasons:
Detected a non-speech event (e.g. cough, laughter, microphone click).
Detected speech that only partially matched an active grammar (e.g. user spoke only half a command).
Speech contained "um", "ah", or some other speaking error that the recognizer could not ignore.
Speech matched an active grammar but the recognizer was not confident that it was an accurate match.
Rejection is controlled by the ConfidenceLevel parameter of RecognizerProperties (see Section 6.8). The confidence value is a floating point number between 0.0 and 1.0. A value of 0.0 indicates weak rejection - the recognizer doesn't need to be very confident to accept a result. A value of 1.0 indicates strongest rejection, implying that the recognizer will reject a result unless it is very confident that the result is correct. A value of 0.5 is the recognizer's default.
7.13.1 Rejection Timing
A result may be rejected with a RESULT_REJECTED event at any time while it is UNFINALIZED: that is, any time after a RESULT_CREATED event but without a RESULT_ACCEPTED event occurring.
This means that the sequence of result events that produces a REJECTED result is:
A single RESULT_CREATED event to issue a new result in the UNFINALIZED state.
While in the UNFINALIZED state, zero or more RESULT_UPDATED events may be issued to update finalized and/or unfinalized tokens. Also, a single optional GRAMMAR_FINALIZED event may be issued to indicate that the matched grammar has been identified.
A single RESULT_REJECTED event moves the result to the REJECTED state.
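The event sequence above can be expressed as a small check, which may be useful when logging or testing a listener. The enum below is a stand-in that reuses the event names from the text; the RejectedSequenceCheck class is hypothetical, and the sketch validates only the ordering rules listed above.

```java
import java.util.List;

// Hypothetical checker for the event order of a rejected result.
final class RejectedSequenceCheck {
    enum Event { RESULT_CREATED, RESULT_UPDATED, GRAMMAR_FINALIZED, RESULT_REJECTED }

    static boolean isValidRejectedSequence(List<Event> events) {
        // At minimum: one RESULT_CREATED followed by one RESULT_REJECTED.
        if (events.size() < 2) return false;
        if (events.get(0) != Event.RESULT_CREATED) return false;
        if (events.get(events.size() - 1) != Event.RESULT_REJECTED) return false;

        boolean grammarFinalized = false;
        // Events strictly between the first and the last one.
        for (Event e : events.subList(1, events.size() - 1)) {
            switch (e) {
                case RESULT_UPDATED:
                    break; // zero or more updates are allowed
                case GRAMMAR_FINALIZED:
                    if (grammarFinalized) return false; // at most one
                    grammarFinalized = true;
                    break;
                default:
                    return false; // a second create or an early reject
            }
        }
        return true;
    }
}
```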
When a result is rejected, there is a strong probability that the information about a result normally provided through Result, FinalResult, FinalRuleResult and FinalDictationResult interfaces is inaccurate, or more typically, not available.
Some possibilities that an application must consider:
There are no finalized tokens (numTokens returns 0).
The GRAMMAR_FINALIZED event was not issued, so the getGrammar method returns null. In this case, all the methods of the FinalRuleResult and FinalDictationResult interfaces throw exceptions.
Audio data and training information may be unavailable, even when requested.
All tokens provided as best guesses or alternative guesses may be incorrect.
If the result does match a RuleGrammar, there is not a guarantee that the tokens can be parsed successfully against the grammar.
Finally, a repeat of the warning. Only use rejected results if you really know what you are doing!
7.14 Result Timing
Recognition of speech is not an instant process. There are intrinsic delays between the time the user starts or ends speaking a word or sentence and the time at which the corresponding result event is issued by the speech recognizer.
The most significant delay for most applications is the time between when the user stops speaking and the RESULT_ACCEPTED or RESULT_REJECTED event that indicates the recognizer has finalized the result.
The minimum finalization time is determined by the CompleteTimeout parameter that is set through the RecognizerProperties interface. This time-out indicates the period of silence after speech that the recognizer should process before finalizing a result. If the time-out is too long, the response of the recognizer (and the application) is unnecessarily delayed. If the time-out is too short, the recognizer may inappropriately break up a result (e.g. finalize a result while the user is taking a quick breath). Typical values are less than a second, but not usually less than 0.3 seconds.
There is also an IncompleteTimeout parameter that indicates the period of silence a recognizer should process if the user has said something that may only partially match an active grammar. This time-out indicates how long a recognizer should wait before rejecting an incomplete sentence. This time-out also indicates how long a recognizer should wait mid-sentence if a result could be accepted, but could also be continued and accepted after more words. The IncompleteTimeout is usually longer than the complete time-out.
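One reading of how the two time-outs interact is sketched below. This is an illustration of the description above and not how any particular recognizer is implemented: the SilenceDecision class, its parameters and the Action values are hypothetical, and a real recognizer would also weigh confidence before accepting.

```java
// Hypothetical illustration of the CompleteTimeout / IncompleteTimeout rules.
final class SilenceDecision {
    enum Action { KEEP_LISTENING, FINALIZE_ACCEPT, FINALIZE_REJECT }

    // matchesGrammar: the speech so far fully matches an active grammar.
    // canContinue:    more words could still extend the speech into a match.
    // silenceSec:     silence observed since the last speech, in seconds.
    static Action decide(boolean matchesGrammar, boolean canContinue,
                         double silenceSec,
                         double completeTimeoutSec,
                         double incompleteTimeoutSec) {
        // The shorter complete time-out applies only when the utterance is
        // a full match that cannot be extended; otherwise wait longer.
        double timeout = (matchesGrammar && !canContinue)
            ? completeTimeoutSec
            : incompleteTimeoutSec;
        if (silenceSec < timeout) {
            return Action.KEEP_LISTENING;
        }
        return matchesGrammar ? Action.FINALIZE_ACCEPT : Action.FINALIZE_REJECT;
    }
}
```

With a complete time-out of 0.5 seconds and an incomplete time-out of 1.0 second, a finished command is accepted after 0.6 seconds of silence, while a command that could still be extended keeps the recognizer listening at that point.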
Latency is the overall delay between a user finishing speaking and a result being produced. There are many factors that can affect latency. Some effects are temporary, others reflect the underlying design of speech recognizers. Factors that can increase latency include:
The CompleteTimeout and IncompleteTimeout properties discussed above.
Computer power (especially CPU speed and memory): less powerful computers may process speech slower than real-time. Most systems try to catch up while listening to background silence (which is easier to process than real speech).
Grammar complexity: larger and more complex grammars tend to require more time to process. In most cases, rule grammars are processed more quickly than dictation grammars.
Suspending: while a recognizer is in the SUSPENDED state, it must buffer incoming audio. When it returns to the LISTENING state it must catch up by processing the buffered audio. The longer the recognizer is suspended, the longer it can take to catch up to real time and the more latency increases.
Client/server latencies: in client/server architectures, communication of the audio data, results, and other information between the client and server can introduce delays.
7.15 Storing Results
Result objects can be stored for future processing. This is particularly useful for dictation applications in which the correction information, audio data and alternative token information is required in future sessions on the same document because that stored information can assist document editing.
The Result object is recognizer-specific. This is because each recognizer provides an implementation of the Result interface. The implications are that (a) recognizers do not usually understand each other's results, and (b) a special mechanism is required to store and load result objects (standard Java object serialization is not sufficient).
The Recognizer interface defines the methods writeVendorResult and readVendorResult to perform this function. These methods write to an OutputStream and read from an InputStream respectively. If the correction information and audio data for a result are available, then they will be stored by this call. Applications that do not need to store this extra data should explicitly release it before storing a result.
{
    Recognizer rec;
    OutputStream stream;
    Result result;
    ...
    try {
        rec.writeVendorResult(stream, result);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
A limitation of storing vendor-specific results is that a compatible recognizer must be available to read the file. Applications that need to ensure a file containing a result can be read, even if no recognizer is available, should wrap the result data when storing it to the file. When re-loading the file at a later time, the application will unwrap the result data and provide it to a recognizer only if a suitable recognizer is available. One way to perform the wrapping is to provide the writeVendorResult method with a ByteArrayOutputStream to temporarily place the result in a byte array before storing to a file.
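The wrapping step can be sketched as follows. The byte array passed to wrap stands for the contents of a ByteArrayOutputStream that was handed to writeVendorResult; the ResultWrapper class, the vendor identifier string and the record layout (identifier, length, data) are assumptions of this sketch, not a format defined by the Java Speech API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical wrapper format: vendor id, data length, vendor data.
final class ResultWrapper {
    // vendorData: bytes captured from writeVendorResult via a
    // ByteArrayOutputStream (supplied directly in this sketch).
    static byte[] wrap(String vendorId, byte[] vendorData) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(vendorId);            // who can read the data
            out.writeInt(vendorData.length);   // lets a reader skip the data
            out.write(vendorData);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the vendor data when it was written by availableVendor,
    // or null when no suitable recognizer is available to read it.
    static byte[] unwrap(byte[] wrapped, String availableVendor) {
        try {
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(wrapped));
            String vendorId = in.readUTF();
            byte[] data = new byte[in.readInt()];
            in.readFully(data);
            return vendorId.equals(availableVendor) ? data : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

When unwrap returns data, the application could pass it to readVendorResult through a ByteArrayInputStream; when it returns null, the rest of the file remains readable because the stored length allowed the result data to be skipped.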