Useful for Artificial Intelligence, Part 1

Artificial Intelligence


Speech Recognition: javax.speech.recognition 


A speech recognizer is a speech engine that converts speech to text. The javax.speech.recognition package defines the Recognizer interface to support speech recognition plus a set of supporting classes and interfaces. The basic functional capabilities of speech recognizers, some of the uses of speech recognition and some of the limitations of speech recognizers are described in Section 2.2.

As a type of speech engine, much of the functionality of a Recognizer is inherited from the Engine interface in the javax.speech package and from other classes and interfaces in that package. The javax.speech package and generic speech engine functionality are described in Chapter 4.

The Java Speech API is designed to keep simple speech applications simple and to make advanced speech applications possible for non-specialist developers. This chapter covers both the simple and advanced capabilities of the javax.speech.recognition package. Where appropriate, some of the more advanced sections are marked so that you can choose to skip them. We begin with a simple code example, and then review the speech recognition capabilities of the API in more detail through the following sections:

"Hello World!": a simple example of speech recognition
Recognizer as an Engine
Recognizer State Systems
Recognition Grammars
Rule Grammars
Dictation Grammars
Recognition Results
Recognizer Properties
Speaker Management
Recognizer Audio




1. "Hello World!"

The following example shows a simple application that uses speech recognition. For this application we need to define a grammar of everything the user can say, and we need to write the Java software that performs the recognition task.

A grammar is provided by an application to a speech recognizer to define the words that a user can say, and the patterns in which those words can be spoken. In this example, we define a grammar that allows a user to say "Hello World" or a variant. The grammar is defined using the Java Speech Grammar Format. This format is documented in the Java Speech Grammar Format Specification.

Place this grammar into a file.


#JSGF V1.0;

grammar javax.speech.demo;

public <sentence> = hello world | good morning |
                    hello mighty computer;

This trivial grammar has a single public rule called "sentence". A rule defines what may be spoken by a user. A public rule is one that may be activated for recognition.
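To make concrete what this rule accepts, here is a minimal sketch in plain Java that checks a text utterance against the rule's three alternatives. It is only an illustration: the class SentenceGrammarSketch and its matches method are hypothetical names, not part of the Java Speech API, and a real recognizer compares incoming audio against the grammar rather than comparing strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Toy stand-in for the javax.speech.demo grammar; NOT part of the Java Speech API. */
final class SentenceGrammarSketch {
    // The three alternatives of the public <sentence> rule.
    private static final List<String> ALTERNATIVES = Arrays.asList(
            "hello world", "good morning", "hello mighty computer");

    /** Returns true if the utterance is exactly one of the rule's alternatives. */
    static boolean matches(String utterance) {
        // Normalize case and whitespace so that only the word sequence matters.
        String normalized = utterance.trim()
                .toLowerCase(Locale.ENGLISH)
                .replaceAll("\\s+", " ");
        return ALTERNATIVES.contains(normalized);
    }

    public static void main(String[] args) {
        System.out.println(matches("Hello world"));    // prints true
        System.out.println(matches("hello computer")); // prints false: not an alternative
    }
}
```

Note that "hello computer" is rejected: the rule lists whole phrases as alternatives, so only the complete sequence "hello mighty computer" matches.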

The following code shows how to create a recognizer, load the grammar, and then wait for the user to say something that matches the grammar. When it gets a match, it deallocates the engine and exits.


import javax.speech.*;
import javax.speech.recognition.*;
import java.io.FileReader;
import java.util.Locale;

public class HelloWorld extends ResultAdapter {
    static Recognizer rec;

    // Receives RESULT_ACCEPTED event: print it, clean up, exit
    public void resultAccepted(ResultEvent e) {
        Result r = (Result) e.getSource();
        ResultToken[] tokens = r.getBestTokens();

        // Print each recognized word, separated by spaces
        for (int i = 0; i < tokens.length; i++) {
            System.out.print(tokens[i].getSpokenText() + " ");
        }
        System.out.println();

        // Deallocate the recognizer and exit; deallocation can fail,
        // so report any error before exiting
        try {
            rec.deallocate();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        System.exit(0);
    }

    public static void main(String[] args) {
        try {
            // Create a recognizer that supports English.
            rec = Central.createRecognizer(
                    new EngineModeDesc(Locale.ENGLISH));

            // createRecognizer returns null when no suitable engine is installed
            if (rec == null) {
                System.err.println("No English recognizer available");
                return;
            }

            // Start up the recognizer
            rec.allocate();

            // Load the grammar from a file, and enable it
            FileReader reader = new FileReader(args[0]);
            RuleGrammar gram = rec.loadJSGF(reader);
            gram.setEnabled(true);

            // Add the listener to get results
            rec.addResultListener(new HelloWorld());

            // Commit the grammar
            rec.commitChanges();

            // Request focus and start listening
            rec.requestFocus();
            rec.resume();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

This example illustrates the basic steps that all speech recognition applications must perform. Let's examine each step in detail.

Create: The Central class of the javax.speech package is used to obtain a speech recognizer by calling the createRecognizer method. The EngineModeDesc argument provides the information needed to locate an appropriate recognizer. In this example we requested a recognizer that understands English (since the grammar is written for English).
Allocate: The allocate method requests that the Recognizer allocate all necessary resources.
Load and enable grammars: The loadJSGF method reads in a JSGF document from a reader created for the file that contains the javax.speech.demo grammar. (Alternatively, the loadJSGF method can load a grammar from a URL.) Next, the grammar is enabled. Once the recognizer receives focus (see below), an enabled grammar is activated for recognition: that is, the recognizer compares incoming audio to the active grammars and listens for speech that matches those grammars.
Attach a ResultListener: The HelloWorld class extends the ResultAdapter class which is a trivial implementation of the ResultListener interface. An instance of the HelloWorld class is attached to the Recognizer to receive result events. These events indicate progress as the recognition of speech takes place. In this implementation, we process the RESULT_ACCEPTED event, which is provided when the recognizer completes recognition of input speech that matches an active grammar.
Commit changes: Any changes to grammars or to their enabled status (including the creation of a new grammar) must be committed to take effect. The reasons for this are described in Section 6.4.2.
Request focus and resume: For recognition of the grammar to occur, the recognizer must be in the RESUMED state and must have the speech focus. The requestFocus and resume methods achieve this.
Process result: Once the main method is completed, the application waits until the user speaks. When the user speaks something that matches the loaded grammar, the recognizer issues a RESULT_ACCEPTED event to the listener we attached to the recognizer. The source of this event is a Result object that contains information about what the recognizer heard. The getBestTokens method returns an array of ResultTokens, each of which represents a single spoken word. These words are printed.
Deallocate: Before exiting we call deallocate to free up the recognizer's resources.
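
The ordering constraints in the steps above can be illustrated with a toy model in plain Java. This is a sketch under stated assumptions, not the javax.speech API: ToyRecognizer, AcceptedListener, and hear are hypothetical names, and the model assumes a freshly allocated recognizer starts paused and without focus, which is a simplification. It shows only that a result reaches the listener when the grammar is enabled and committed and the recognizer is both focused and resumed.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the steps above; these types are hypothetical, NOT javax.speech classes. */
final class ToyRecognizer {
    /** Hypothetical callback standing in for ResultListener.resultAccepted. */
    interface AcceptedListener {
        void accepted(String text);
    }

    private final List<AcceptedListener> listeners = new ArrayList<>();
    private boolean allocated;
    private boolean pendingEnabled;   // what the application has requested
    private boolean committedEnabled; // what the recognizer actually uses
    private boolean focused;
    private boolean resumed;

    void allocate() { allocated = true; }

    void addListener(AcceptedListener listener) { listeners.add(listener); }

    // Enabling a grammar only records the request; it takes effect on commit.
    void setGrammarEnabled(boolean enabled) { pendingEnabled = enabled; }

    void commitChanges() { requireAllocated(); committedEnabled = pendingEnabled; }

    void requestFocus() { requireAllocated(); focused = true; }

    void resume() { requireAllocated(); resumed = true; }

    void deallocate() {
        allocated = false;
        committedEnabled = false;
        focused = false;
        resumed = false;
    }

    /** The grammar is active only when every precondition holds. */
    boolean isGrammarActive() {
        return allocated && committedEnabled && focused && resumed;
    }

    /** Simulates matching speech; returns true if the listeners were notified. */
    boolean hear(String text) {
        if (!isGrammarActive()) {
            return false;
        }
        for (AcceptedListener listener : listeners) {
            listener.accepted(text);
        }
        return true;
    }

    private void requireAllocated() {
        if (!allocated) {
            throw new IllegalStateException("call allocate() first");
        }
    }

    public static void main(String[] args) {
        ToyRecognizer rec = new ToyRecognizer();
        rec.addListener(text -> System.out.println("accepted: " + text));
        rec.allocate();
        rec.setGrammarEnabled(true);
        System.out.println(rec.hear("hello world")); // prints false: not yet committed
        rec.commitChanges();
        rec.requestFocus();
        rec.resume();
        System.out.println(rec.hear("hello world")); // prints "accepted: hello world", then true
        rec.deallocate();
    }
}
```

Keeping the requested and committed enabled flags separate mirrors the commit step described above: a change is recorded first and applied only when commitChanges is called.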

