The greatest challenge to auditory communication is background noise, especially in the complex acoustic environments encountered in daily life. This project proposes to develop a novel speech-enhancement algorithm that is robust in the presence of everyday environmental interference. The algorithm is inspired by recent findings on the neural coding of vowels in the auditory midbrain. Recent work from our group has shown that the formant frequencies of voiced sounds are encoded by the brain on the basis of changes in low-frequency fluctuations related to voice pitch. These responses are established in the auditory periphery and are transformed into a robust representation of formant frequencies at the level of the auditory midbrain, where neurons are exquisitely sensitive to rate fluctuations in the voice-pitch frequency range. This representation of formants is degraded in background noise by the inherent fluctuations introduced by the noise masker. It is possible, however, to detect and track formants in noise using a strategy based on these pitch-related fluctuations, implemented with a streamlined auditory model.

We are taking advantage of this finding to develop an algorithm that restores the pattern of fluctuations across frequency channels, even in the presence of noise, for listeners with or without hearing loss. This restoration is accomplished by identifying and manipulating the rate fluctuations across the population of frequency channels in a manner that enhances the representation of speech. The strategy involves identifying the lowest formant frequencies (F1, F2, and F3), identifying the voice pitch (F0), and amplifying a single harmonic of F0 near each formant peak. This amplification restores the neural representation of noisy speech toward the response to speech in quiet. The speech-enhancement algorithm requires a pitch-extraction mechanism that operates reliably in background noise; for this project, the Carney Lab has teamed with OmniSpeech, Inc., which is developing such a mechanism.

This project focuses on 1) refinement of the formant-tracking and harmonic-identification algorithm (using speech with known F0) and testing of preference for, and intelligibility of, processed versus unprocessed speech in the presence of noise; 2) refinement of the pitch-extraction algorithm in the presence of noise; and 3) combination of the new pitch-extraction and speech-enhancement mechanisms. Tests will include listeners with normal hearing and listeners with mild to moderate sensorineural hearing loss. The overall goal of this Phase I project is a proof-of-concept test of the novel speech-enhancement algorithm. Phase II will leverage the experience OmniSpeech is gaining in bringing speech technology to market in both cloud and embedded devices.
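To make the harmonic-amplification strategy concrete, the following is a minimal sketch rather than the project's actual implementation: it assumes a known, fixed F0 and fixed formant estimates (in practice both would be tracked frame by frame with the noise-robust pitch extractor described above), and it simply boosts the STFT bin at the harmonic of F0 nearest each formant peak. The function name, default formant values, and gain are illustrative assumptions only.

```python
# Hypothetical sketch of the harmonic-amplification idea (not the authors' code):
# boost the single harmonic of F0 nearest each formant peak in an STFT of the
# noisy signal, then resynthesize.
import numpy as np
from scipy.signal import stft, istft

def enhance_harmonics(x, fs, f0, formants=(500.0, 1500.0, 2500.0),
                      gain_db=6.0, nperseg=512):
    """Amplify the harmonic of f0 closest to each formant frequency.

    x        : noisy speech signal (1-D array)
    fs       : sample rate in Hz
    f0       : assumed-known voice pitch in Hz (a real system would track it)
    formants : assumed formant estimates F1-F3 in Hz (a real system would track them)
    """
    f, t, X = stft(x, fs=fs, nperseg=nperseg)
    gain = 10.0 ** (gain_db / 20.0)
    df = f[1] - f[0]                       # STFT frequency resolution in Hz
    for fk in formants:
        harmonic = round(fk / f0) * f0     # harmonic of f0 nearest this formant
        bin_idx = int(round(harmonic / df))
        if 0 < bin_idx < len(f):
            X[bin_idx, :] *= gain          # boost that single harmonic
    _, y = istft(X, fs=fs, nperseg=nperseg)
    return y[:len(x)]
```

In this simplified form only one frequency bin per formant is boosted and the pitch and formants are held constant over the utterance; a practical system would update F0 and F1-F3 per frame and spread the gain over the bins spanning the selected harmonic.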
Public Health Relevance Statement: The public health significance of the proposed work is that it will develop a speech-enhancement algorithm that will assist listeners in challenging acoustic environments. Difficulty hearing in noise is the most significant problem for all listeners, including listeners with hearing loss. In this study, we will develop a novel speech-enhancement algorithm and test its feasibility in listeners with and without hearing loss.
Project Terms: Acoustics; Address; Algorithms; Auditory; base; Brain; Code; Collaborations; college; Communication; Communication Aids for Disabled; Complex; Computer software; Detection; Devices; Ensure; Environment; experience; Frequencies (time pattern); Goals; Hearing; hearing impairment; improved; innovation; Instruction; Life; Marketing; meetings; Memory; Midbrain structure; Modeling; Music; Neurons; Noise; novel; Pattern; Performance; Phase; Population; preference; Process; public health medicine (field); public health relevance; relating to nervous system; response; restoration; Running; Sensorineural Hearing Loss; signal processing; Signal Transduction; sound; Speech; System; Technology; Testing; Time; trafficking; Translating; Voice; Work