Current commercially available automatic speech recognition (ASR) systems work adequately in quiet surroundings for medium-sized vocabularies, but performance plummets in noisy environments. Noise robustness is a prerequisite both for wide deployment of ASR by the Air Force and for acceptance in the consumer electronics market. Existing approaches to noise robustness modify traditional front ends and back ends, such as cepstra and hidden Markov models (HMMs), and have yielded only incremental performance gains. During Phase I, we demonstrated the feasibility of several novel, noise-robust approaches at every stage of the recognition pipeline. We will build and deliver a continuous-speech, noise-robust ASR system that operates in the presence of high levels of F-16 cockpit noise. The system will adapt to a new speaker after very short training sessions, targeting a vocabulary of 500-1000 words and a grammar perplexity of 20. Recognition performance will be evaluated in a noise room with talkers wearing oxygen masks. The system builds on highly successful Phase I results: several front-end preprocessors derived from the Auditory Image Model (a functional model of the human auditory system), a novel distance metric that outperforms the Euclidean (L2) norm, and a paradigm for combining neural networks with HMMs.
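The abstract does not specify the form of the non-Euclidean distance metric. As a purely illustrative sketch of why such a metric can outperform the L2 norm, the following compares Euclidean distance with a hypothetical variance-weighted (Mahalanobis-style) distance between feature vectors, in which dimensions corrupted by noise are downweighted; the vectors, weights, and function names here are invented for illustration and are not from the Phase I work.

```python
import numpy as np

def euclidean(x, y):
    # Standard L2 norm between two feature vectors.
    return np.sqrt(np.sum((x - y) ** 2))

def weighted_distance(x, y, weights):
    # Hypothetical variance-weighted (Mahalanobis-style) distance:
    # noisy feature dimensions get small weights, so they contribute
    # less to the total distance than in the plain L2 norm.
    d = x - y
    return np.sqrt(np.sum(weights * d * d))

# Toy 4-dimensional cepstral-like vectors (illustrative values only).
x = np.array([1.0, 0.5, -0.2, 0.1])
y = np.array([0.8, 0.4, 0.6, 0.0])
weights = np.array([1.0, 1.0, 0.1, 1.0])  # downweight the noisy 3rd bin

print(euclidean(x, y))                # dominated by the noisy dimension
print(weighted_distance(x, y, weights))  # noise contribution suppressed
```

Under this weighting, the large mismatch in the noisy third dimension no longer dominates the comparison, which is the intuition behind replacing the L2 norm with a noise-aware metric.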
Keywords: NEURAL NETWORKS, HIDDEN MARKOV MODELS, COCHLEAR MODELS, CONTINUOUS SPEECH, NOISE ROBUSTNESS, NON-EUCLIDEAN