Speech dialogue systems offer the most natural human–machine interface: they allow users to control devices efficiently and intuitively. The discourse processing that these natural-language applications require is straightforward in principle but far from simple in practice. At the system’s front end, the speech recognizer, an entirely correct transcription of an utterance is the exception rather than the norm, especially when the recognizer has to handle a large vocabulary and a noisy environment complicates matters further. The recognizer therefore returns a number of hypotheses of what might have been said, which makes it very hard for the next component, the natural language understanding module, to identify the correct hypothesis and extract the utterance’s meaning.
The work presented here demonstrates how analyzing acoustic features that are usually ignored in the speech recognition process can help to find the correct hypotheses and improve language understanding. Neural networks analyze stress patterns to locate an utterance’s most important semantic objects. Spotting these content words and phrases then allows the natural language unit and the processing elements that follow it to understand a user’s intention faster and more robustly.
The chief topics of this book are:
- Dialogue systems: architecture and components, domain configuration with object/action ontologies to prepare dialogue systems for arbitrary tasks, evaluating dialogue hypotheses in a statistical framework, combining probability measures such as prosodic content-word prediction and speech recognition confidence
- Prosody and signal processing: accentuation, quantifying perceived speech loudness with a psychoacoustic intensity model, syllable segmentation, robust pitch tracking, vowel detection, dimensions of prosody
- Pattern recognition: neural networks, predicting the position of important semantic items, integrating content word prediction and natural language processing, simulated annealing for model output maximization, generating realistic training data with a Wizard of Oz experiment