SBIR-STTR Award

VocaliD - Infusing Unique Vocal Identities into Synthesized Speech
Award last edited on: 6/16/2017

Sponsored Program
SBIR
Awarding Agency
NSF
Total Award Amount
$1,290,749
Award Phase
2
Solicitation Topic Code
-----

Principal Investigator
Rupal Patel

Company Information

VocaliD Inc

50 Leonard Street
Belmont, MA 02478
   (339) 368-0416
   hello@vocalid.co
   www.vocalid.ai
Location: Single
Congr. District: 05
County: Middlesex

Phase I

Contract Number: 1447995
Start Date: 1/1/2015    Completed: 6/30/2015
Phase I year
2015
Phase I Amount
$150,000
The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase I project is to create custom-crafted voices for text-to-speech applications that empower recipients to engage in conversation and be heard in their own voices. The company's technology blends a recipient's residual vocal abilities with a matched donor's speech database to craft a personalized voice that combines the recipient's vocal identity with the clarity of the donor's speech. In the United States alone, there are over 2.5 million individuals with speech impairment and 3-5 million individuals with low vision who rely on a limited set of generic, mechanical-sounding voices for assisted communication. It is not uncommon for several children in a classroom or adults in a workplace to use the same synthetic voice. Each one of us has a unique voiceprint that conveys our age, gender, race, size, and personality. Until now, this variety and flexibility of voice has not been afforded to users of speech synthesis technology. The company's goal is to give the gift of voice to all those who need and want it, to enhance how they learn, work and play.

This Small Business Innovation Research (SBIR) Phase I project aims to engineer personalized synthetic voices that convey the recipient's unique vocal identity. The company's innovation is grounded in the source-filter theory of speech production, which divides the speech signal into a source component (the vocal folds) and a filter component (the rest of the vocal tract) that are largely independent. Because source and filter characteristics both contribute to speaker identity, the key challenge is to create an authentic yet understandable voice by extracting as much identity information from recipient vocalizations as possible and combining it with speech clarity information from the donor. Standard voice conversion methods require large amounts of spoken data from both donors and recipients, as well as parallel corpora, none of which are available for the target applications. This Phase I work will make significant advances toward the design and implementation of a novel automated voice matching and transformation process that leverages a large database of healthy donors' speech to generate personalized synthetic voices from sparse samples of target recipients' vocalizations. Algorithms will be validated using both quantitative and perceptual metrics to assess intelligibility, similarity and naturalness.
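
The source-filter separation described above can be illustrated with a minimal textbook-style sketch. It is not the company's algorithm, which the abstract does not specify. The sketch assumes an impulse train as a crude stand-in for the vocal-fold source and a cascade of two-pole resonators as a stand-in for the vocal-tract filter; the function names and the formant values are hypothetical and chosen only for illustration.

```python
import math


def glottal_source(f0_hz, duration_s, sample_rate):
    """Impulse train with one pulse per pitch period (crude vocal-fold stand-in)."""
    n_samples = int(duration_s * sample_rate)
    period = round(sample_rate / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def formant_filter(signal, formant_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, < 1 so stable
    theta = 2.0 * math.pi * formant_hz / sample_rate     # pole angle
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize(f0_hz, formants, duration_s=0.05, sample_rate=16000):
    """Pass the source through each (frequency, bandwidth) resonator in turn."""
    signal = glottal_source(f0_hz, duration_s, sample_rate)
    for formant_hz, bandwidth_hz in formants:
        signal = formant_filter(signal, formant_hz, bandwidth_hz, sample_rate)
    return signal


# Assumed, illustrative filter settings (roughly an open vowel); not from the source.
shared_filter = [(700.0, 80.0), (1200.0, 90.0)]

# Same filter, two different sources: only the pitch-related source cue changes.
lower_pitch_voice = synthesize(120.0, shared_filter)
higher_pitch_voice = synthesize(220.0, shared_filter)
```

Because the source and the filter are set independently here, swapping one while holding the other fixed changes only that component's contribution, which is the property the abstract relies on when it pairs a recipient's source cues with a donor's filter characteristics.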

Phase II

Contract Number: 1555608
Start Date: 4/1/2016    Completed: 9/30/2018
Phase II year
2016
(last award dollars: 2019)
Phase II Amount
$1,140,749

The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase II project is to offer custom-crafted digital voices for text-to-speech applications. Each one of us has a unique voiceprint - an essential part of our self-identity. Though the quality of text-to-speech technology has improved, voice options remain limited. For the 2.5 million Americans (and tens of millions worldwide) living with voicelessness who rely on devices to talk, access to a custom digital voice is a game changer. It's the difference between a functional solution and being heard, uniquely, as oneself. Enhanced opportunities for social connection increase quality of life, independence, and access to educational and vocational resources that can narrow the gap between those with and without disability. This immediate unmet societal need, coupled with the increasing proliferation of devices that speak to us and for us, creates a compelling, timely and significant commercial opportunity for high-quality, personalized digital voices that can be produced at scale. By leveraging the company's crowdsourced human voicebank and proprietary voice matching and blending algorithms, the technology has the potential to empower everyone to express themselves through their own voice.

This Small Business Innovation Research Phase II project builds on the company's NSF-funded research and Phase I results that support feasibility and commercialization of a customized voice-building technology. The text-to-speech market, encompassing assistive technologies, enterprise and consumer applications, is currently valued at around $1B and is rapidly growing and ripe for innovation. To create custom voices, the company leverages the source-filter theory of speech production. From those who are unable or unwilling to record several hours of speech, the company extracts a brief vocal sample - even a single vowel contains enough 'vocal DNA' to seed the personalization process. Identity cues of the source are then combined with filter properties of a demographically and acoustically matched donor in the company's voicebank. The result is a voice that captures the vocal identity of the recipient and the clarity of the donor. Phase II technical objectives address the need for 1) customer-driven voice customization, 2) quality assurance of crowdsourced recordings, 3) voice aging algorithms, and 4) targeted donor recruitment algorithms. These advances will help secure the assistive technology beachhead and spur innovations for broader applications such as virtual reality, personal robotics, and digital personas for the Internet of Things.