SBIR-STTR Award

Noise-Robust Speech Recognition for the F-16 Cockpit
Award last edited on: 10/16/2002

Sponsored Program
SBIR
Awarding Agency
DOD : AF
Total Award Amount
$849,350
Award Phase
2
Solicitation Topic Code
AF97-012
Principal Investigator
Ojvind Bernander

Company Information

Vocal Point Inc

847 Howard Street
San Francisco, CA 94103
   (415) 563-5000
   contact@vocalpoint.com
   www.vocalpoint.com
Location: Single
Congr. District: 11
County: San Francisco

Phase I

Contract Number: F41624-97-C-6018
Start Date: 4/25/1997    Completed: 1/25/1998
Phase I year
1997
Phase I Amount
$99,933
Existing automatic speech recognition (ASR) systems work well in quiet surroundings for small- and medium-sized vocabularies, but performance degrades rapidly in noisy environments. Achieving noise robustness is a must before ASR is widely deployed by the Air Force or becomes accepted in the consumer electronics market. Current approaches to noise robustness modify traditional front ends and back ends, such as cepstra and hidden Markov models (HMMs); only incremental improvements in performance have been obtained. We will adapt two recent inventions for use with ASR. The first is a front end (preprocessor): the Stabilized Auditory Image (SAI) is a state-of-the-art model of the human auditory system that faithfully replicates human perception in many cases where traditional spectral models fail. It will be used as a basis for more robust feature extraction. The second is a back end (recognizer): the method of manifold linearization has been applied to handwritten character recognition (HCR), outperforming other methods while keeping the computational load low. A number of similarities between HCR and ASR make this approach very promising for ASR. The processors we develop will be evaluated against traditional implementations (in-house simulations) as well as against state-of-the-art systems described in the literature.
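
For reference, the sketch below illustrates the conventional cepstral front end that the abstract contrasts with the SAI approach: framed, windowed audio is converted to real-cepstrum features of the kind that degrade in noise. This is a minimal Python illustration of the traditional baseline only; it is not the SAI preprocessor or any code delivered under this award, and the function names, frame sizes, and test signal are illustrative assumptions.

    import numpy as np

    def real_cepstrum(frame, n_coeffs=13):
        # Real cepstrum of one windowed speech frame: FFT -> log magnitude -> inverse FFT.
        spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
        log_mag = np.log(np.abs(spectrum) + 1e-10)   # small offset avoids log(0)
        return np.fft.irfft(log_mag)[:n_coeffs]

    def cepstral_features(signal, sample_rate=16000, frame_ms=25, hop_ms=10, n_coeffs=13):
        # Frame the signal into 25 ms windows with a 10 ms hop, a common ASR setup.
        frame_len = int(sample_rate * frame_ms / 1000)
        hop_len = int(sample_rate * hop_ms / 1000)
        frames = [signal[i:i + frame_len]
                  for i in range(0, len(signal) - frame_len + 1, hop_len)]
        return np.stack([real_cepstrum(f, n_coeffs) for f in frames])

    if __name__ == "__main__":
        # Synthetic vowel-like tone buried in broadband noise, purely for illustration.
        t = np.arange(16000) / 16000.0
        clean = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
        noisy = clean + 0.8 * np.random.randn(len(t))
        print(cepstral_features(noisy).shape)        # -> (98, 13)

Because the log-magnitude spectrum mixes speech and additive noise before the cepstral transform, features of this kind shift substantially as noise levels rise, which is the weakness the SAI-based front end is intended to address.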

Phase II

Contract Number: F41624-98-C-6004
Start Date: 4/24/1998    Completed: 4/24/2000
Phase II year
1998
Phase II Amount
$749,417
New, commercially available automatic speech recognition (ASR) systems work adequately in quiet surroundings for medium-sized vocabularies, but performance plummets in noisy environments. Achieving noise robustness is a must before ASR is widely deployed by the Air Force or becomes accepted in the consumer electronics market. Current approaches to noise robustness modify traditional front ends and back ends, such as cepstra and hidden Markov models (HMMs); only incremental improvements in performance have been obtained. During Phase I, we proved the feasibility of several novel, noise-robust approaches at all stages of the recognition pipeline. We will build and deliver a continuous-speech, noise-robust ASR system that works in the presence of high levels of F-16 cockpit noise. The system will adapt to a new speaker using very short training sessions. It will target a vocabulary of 500-1000 words and a grammar perplexity of 20. Recognition performance will be evaluated in a noise room with talkers wearing oxygen masks. The system builds on the successful results of the Phase I effort: several front-end preprocessors derived from the Auditory Image Model (a functional model of the human auditory system), a novel distance metric that outperforms the Euclidean (L2) norm, and a paradigm for combining neural networks with HMMs.
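
As a side note on the "grammar perplexity of 20" target: perplexity is the exponentiated average per-word entropy of the recognition grammar, i.e. the effective number of word choices the recognizer faces at each step. The short Python sketch below shows the standard calculation on a toy sequence; the function name and probabilities are illustrative assumptions, not part of the delivered system.

    import math

    def perplexity(word_probs):
        # Perplexity = 2 ** (average negative log2 probability per word),
        # i.e. the grammar's effective branching factor at each word.
        entropy = -sum(math.log2(p) for p in word_probs) / len(word_probs)
        return 2 ** entropy

    # Toy example: if the grammar assigns each word of a test utterance a
    # conditional probability of 1/20, the perplexity is 20 -- on average the
    # recognizer faces a 20-way choice per word, even with a 500-1000 word
    # vocabulary.
    print(perplexity([1 / 20] * 8))   # ~20.0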

Keywords:
NEURAL NETWORKS, HIDDEN MARKOV MODELS, COCHLEAR MODELS, CONTINUOUS SPEECH, NOISE ROBUSTNESS, NON-EUCLIDE