SBIR-STTR Award

Advanced Speech Production Models
Award last edited on: 2/26/2007

Sponsored Program
SBIR
Awarding Agency
DOD : AF
Total Award Amount
$99,979
Award Phase
1
Solicitation Topic Code
AF01-065
Principal Investigator
Brian Womack

Company Information

Speech Technologies Corporation

2414 NE Java Way
Hillsboro, OR 97124
   (503) 648-9822
   N/A
   N/A
Location: Single
Congr. District: 01
County: Washington

Phase I

Contract Number: ----------
Start Date: ----    Completed: ----
Phase I year
2001
Phase I Amount
$99,979
Current speech processing systems address the effects of channel noise but assume that all speakers are always in the same stress state. Neutral speech occurs when the speaker has no task obligation other than to speak. Perceptually or physiologically induced speaker stress occurs in non-neutral conditions such as G-force, vibration, environmental noise, or emotion. Depending on the type and degree of perceptual or physiological stress, the glottal source, vocal tract frequency structure, fundamental frequency, and intensity or duration are all affected in different ways. In addition, within a given speech utterance, a speaker's production can shift dynamically from one type of stress state to another. There are two main approaches to addressing the problem of speaker-induced stress: (i) stress-robust speech production models and (ii) stress-robust speech processing algorithms. This study will focus on the first approach. By understanding the effects of speaker-induced stress, it is possible to create models that are less sensitive to stress-induced variability. The resulting features will then be applied, as an example, to the stressed speech recognition problem to determine whether they result in improved performance.
Current speech processing algorithms work best with neutral speech produced in quiet environments. In many settings, speech is produced in environments that cause perceptual speaker stress, such as emotion, task workload, or perceived background noise. Under perceptual stress, speakers modify their speech production to help the listener receive the intent of the speech or simply to express emotion. Alternatively, the human body can be exposed to physiological stress induced by vibration, acceleration, deceleration, or changes in the makeup of the air supply. Both types of stress significantly change the speech signal before it leaves the vocal tract.
With better speech production models that seek to minimize the variability introduced by stress and noise, a wide range of speech processing applications will benefit. This is a foundational piece of technology that is required to take speech processing to the next stage in performance and reliability. Commercial applications will benefit from more robust speech recognition, for example, because speakers will be able to express emotion naturally in the real-world environments that are part of their everyday experience. The inability to do so is an often-quoted complaint about speech recognition systems currently available commercially.

Phase II

Contract Number: ----------
Start Date: ----    Completed: ----
Phase II year
----
Phase II Amount
----