SBIR-STTR Award

Video-to-speech software application to provide real-time, noninvasive, natural voice restoration for voiceless individuals
Award last edited on: 3/10/23

Sponsored Program
SBIR
Awarding Agency
NSF
Total Award Amount
$255,997
Award Phase
1
Solicitation Topic Code
HC
Principal Investigator
Dina Yamaleyeva

Company Information

Lira Inc

119 Hollow Oak Drive
Durham, NC 27713
(215) 350-6614
Location: Single
Congr. District: 04
County: Durham

Phase I

Contract Number: 2136629
Start Date: 9/1/22    Completed: 5/31/23
Phase I year
2022
Phase I Amount
$255,997
The broader impact of this Small Business Innovation Research (SBIR) Phase I project is to help the one million Americans who have lost the ability to speak through disease of, or damage to, the larynx or mouth (aphonia). The inability to communicate fluently with other people has severe consequences: voiceless individuals are three times more likely than speaking patients to suffer a preventable adverse event in medical settings, which can lead to health problems and even life-threatening situations. Up to 50% of these adverse events could be avoided with adequate communication between patients and clinicians. The proposed solution is a video-to-speech software application that provides voiceless people with real-time communication assistance, geared especially toward medical settings. The technology could help prevent hundreds of thousands of adverse health events each year (which cost $6.8 billion annually), benefiting both the voiceless population and the healthcare system in general. The innovation may improve voice restoration by providing real-time translation that requires no user training and allows complex messages to be expressed while maintaining eye contact (an important part of human communication). Moreover, the technology requires no invasive installation or complex equipment, is readily accessible, and has minimal maintenance requirements.

This Small Business Innovation Research (SBIR) Phase I project aims to address the intellectual challenge of overcoming the ambiguity of visemes when automating lip-reading. Visemes (the gestures made when talking) and phonemes (the sounds produced with those gestures) do not share a one-to-one correspondence; for example, the phonemes /p/, /b/, and /m/ are all produced with nearly identical lip closures. This makes accurately predicting intended speech from visual information alone challenging. Previous researchers have failed to reach acceptable accuracy in interpreting visemes, while existing tools handle only a few dozen words that must be structured according to impractically rigid, pre-defined rules. The main goal of this effort is to develop a combination of convolutional neural networks and recurrent neural network transducers capable of accurately differentiating visemes and enabling real-time, reliable voice assistance for voiceless people. Project objectives include: (1) pre-training an algorithm to detect phonemes using publicly available speech video, (2) optimizing the phoneme-trained algorithm against healthcare-relevant vocabulary, and (3) alpha-testing the lip-reading algorithm on real-time speech.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
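To make the described architecture concrete, below is a minimal sketch of a transducer-style lip-reading model in PyTorch. Everything here is an illustrative assumption (the layer sizes, the 44-symbol phoneme inventory, and all class and variable names); it is not the awardee's actual implementation, only one plausible shape for a CNN visual encoder paired with an RNN-transducer-style prediction and joint network.

import torch
import torch.nn as nn

NUM_PHONEMES = 44   # illustrative English phoneme inventory; index 0 reserved as the RNN-T blank

class VisualEncoder(nn.Module):
    """3D-CNN front end: turns a lip-region video clip into a per-frame feature sequence."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(5, 5, 5), stride=(1, 2, 2), padding=2),
            nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),   # keep the time axis, pool away space
        )
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, video):                          # video: (B, 1, T, H, W)
        x = self.conv(video)                           # (B, 64, T, 1, 1)
        x = x.squeeze(-1).squeeze(-1).transpose(1, 2)  # (B, T, 64)
        return self.proj(x)                            # (B, T, feat_dim)

class Transducer(nn.Module):
    """RNN-T-style model: visual encoder + phoneme prediction network + joint network."""
    def __init__(self, feat_dim=256, hidden=256):
        super().__init__()
        self.encoder = VisualEncoder(feat_dim)
        self.predictor = nn.LSTM(NUM_PHONEMES, hidden, batch_first=True)
        self.joint = nn.Sequential(
            nn.Linear(feat_dim + hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, NUM_PHONEMES),
        )

    def forward(self, video, prev_phonemes_onehot):
        enc = self.encoder(video)                        # (B, T, F)
        pred, _ = self.predictor(prev_phonemes_onehot)   # (B, U, H)
        # Combine every encoder frame with every predictor step: (B, T, U, F+H)
        t, u = enc.size(1), pred.size(1)
        joint_in = torch.cat(
            [enc.unsqueeze(2).expand(-1, -1, u, -1),
             pred.unsqueeze(1).expand(-1, t, -1, -1)], dim=-1)
        return self.joint(joint_in)                      # logits: (B, T, U, NUM_PHONEMES)

model = Transducer()
video = torch.randn(2, 1, 30, 64, 64)       # 2 clips, 30 frames, 64x64 mouth crops
prev = torch.zeros(2, 10, NUM_PHONEMES)     # 10 previous phoneme steps, one-hot
logits = model(video, prev)                 # (2, 30, 10, 44)

During training, an RNN-T loss (for example, torchaudio.transforms.RNNTLoss) would be applied to these joint logits; at inference, greedy or beam decoding over the time axis yields a phoneme stream that a downstream synthesizer turns into audible speech. The transducer structure is what lets the model condition each phoneme prediction on both the visual frames and the phonemes already emitted, which is one standard way to disambiguate visemes that look alike in isolation.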

Phase II

Contract Number: ----------
Start Date: ----    Completed: ----
Phase II year
----
Phase II Amount
----