SBIR-STTR Award

Robust Speech-to-Text Messaging
Award last edited on: 9/4/2007

Sponsored Program
SBIR
Awarding Agency
NSF
Total Award Amount
$600,000
Award Phase
2
Solicitation Topic Code
-----

Principal Investigator
Ashwin Rao

Company Information

TravellingWave Inc

216 Broadway East Suite 203
Seattle, WA 98102
   (206)-328-6431
   bizdev@travellingwave.com
   www.travellingwave.com
Location: Single
Congr. District: 07
County: King

Phase I

Contract Number: ----------
Start Date: ----    Completed: ----
Phase I year
2006
Phase I Amount
$100,000
This Small Business Innovation Research (SBIR) Phase I research project addresses the fundamental problem of inputting text, using speech, into embedded devices like cellular phones. This technology has immediate applications for text messaging (short messaging service, or SMS). Existing interfaces for text-messaging input broadly include the 9-digit keypad and miniature keyboards. It is widely acknowledged that these interfaces are clumsy and lack the speed and user friendliness of a full-size keyboard. This project's objective is to develop a highly robust, complementary speech-to-text messaging interface, with a goal of near 100% task-completion accuracy (TCA) in real-world noisy environments. Using this interface, a mobile user will be able to speak messages into a device and have the device transcribe them as text. TravellingWave (TW) has already developed speech-to-text messaging software based on the company's patent-pending predictive speech-to-text technologies; in clean environments, this product yields the desired 100% TCA. The proposed research involves developing novel front-end signal-processing algorithms (based on adaptive filter banks), optimized for TW's predictive speech-to-text technologies. Specifically, a bank of simple adaptive filters will be developed, each of which estimates and tracks the frequency location of a dominant spectral peak and its amplitude, while discriminating against background noise and interference. It is anticipated that the algorithms resulting from Phase I research will enable the company's current technology to work in real-world noisy environments and reduce the processing-power requirements of its overall software application, thereby increasing its adoption. The technology is relevant to speech-to-text messaging applications for mobile devices.
However, more broadly, the underlying technology may be viewed as an enhanced multi-modal user interface for the ever-shrinking mobile device: users can now input text using their own voice. The socioeconomic impact of such a rich user-interface technology may be envisioned using the following examples: (a) a user driving an automobile can dictate an email to a mobile device, which then sends it across a wireless network, (b) an enterprise executive can access the wealth of information residing on the Internet while on the go using a mobile device, (c) a disabled person may communicate in a hands-free, eyes-free mode using text messaging, and (d) a warehouse worker may input text into a remote database while working in a hands-busy, eyes-busy environment. When adopted in the consumer market, this technology will increase understanding of the language semantics people use, their expectations, and the overall use of this new mode of interface, and hence will broaden the understanding of several concepts underlying human-machine interface technology.
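
The Phase I abstract describes adaptive filters that each track the frequency and amplitude of a dominant spectral peak, but it does not give the algorithm. As an illustration only, the following minimal sketch shows one well-known way a single such tracker could work: a normalized-LMS update on the coefficient of a second-order predictor, using the fact that a pure sinusoid satisfies x[n] + x[n-2] = 2cos(w)x[n-1]. The function names (`make_tone`, `track_dominant_peak`), the step size `mu`, and the choice of this particular update rule are assumptions for illustration, not TravellingWave's method. A bank would presumably run several such trackers on band-limited versions of the signal.

```python
import math


def make_tone(freq_hz, amplitude, sample_rate, num_samples):
    """Generate a clean test sinusoid (hypothetical helper for this sketch)."""
    step = 2.0 * math.pi * freq_hz / sample_rate
    return [amplitude * math.sin(step * n) for n in range(num_samples)]


def track_dominant_peak(samples, sample_rate, mu=0.05):
    """Estimate the frequency (Hz) and amplitude of one dominant sinusoid.

    Assumed technique (not taken from the award abstract): adapt a single
    coefficient a ~= 2*cos(w) so that x[n] + x[n-2] - a*x[n-1] is driven
    toward zero, with the step normalized by a running power estimate.
    """
    a = 0.0          # running estimate of 2*cos(w)
    power = 0.0      # exponentially smoothed power of the delayed input
    x1 = x2 = 0.0    # previous two input samples
    eps = 1e-12      # guards against division by zero during silence
    for x0 in samples:
        # Smooth the power with the same constant as the step size, so the
        # normalized step mu * x1^2 / power never exceeds 1.
        power += mu * (x1 * x1 - power)
        err = x0 + x2 - a * x1
        a += mu * err * x1 / (power + eps)
        a = max(-2.0, min(2.0, a))   # 2*cos(w) always lies in [-2, 2]
        x2, x1 = x1, x0
    freq_hz = math.acos(a / 2.0) * sample_rate / (2.0 * math.pi)
    amplitude = math.sqrt(2.0 * power)   # sinusoid power is A^2 / 2
    return freq_hz, amplitude


if __name__ == "__main__":
    tone = make_tone(440.0, 0.5, 8000, 4000)
    print(track_dominant_peak(tone, 8000))
```

On a clean tone this estimate settles close to the true frequency. With additive noise, this simple predictor becomes biased, which is one reason a practical front end would need more than this sketch to discriminate against background noise and interference.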

Phase II

Contract Number: ----------
Start Date: ----    Completed: ----
Phase II year
2007
Phase II Amount
$500,000
This Small Business Innovation Research (SBIR) Phase II research project proposes to develop techniques for the hands-free input of text to mobile devices. Specifically, this project extends the results of the Phase I effort to produce a speech-recognition system for mobile devices and personal appliances that is robust in the presence of background noise. To increase speech-recognition accuracy, four techniques are employed: 1) Spellation, where users speak and partially spell the words as they dictate, 2) VoiceTap, where, for each character, the user says that character and the following character in the alphabet, 3) Voice Predict, where the user says the word and inputs the first character of the word using the keyboard or VoiceTap, and 4) multi-modal speech-to-text, where the user speaks and uses the keyboard simultaneously. The research effort will focus on developing modules that allow speech to be dictated using a combination of whole words and spelled words. The outcome of the proposed research has significant commercial potential. Because the front end, or client side, can be ported to a variety of operating systems and processors, the flexibility of this technology should enable wide licensing to telecommunication device manufacturers. The mobile wireless industry is a very large and growing industry, and multi-modal input technology is important to mobile customers who demand more efficient and accurate methods of communication. Improvements in accuracy could be very significant and would potentially have widespread applicability.
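
The VoiceTap scheme described above (say each character followed by the next character in the alphabet) can be illustrated with a short sketch. The abstract gives no implementation details, so everything here is an assumption for illustration: the function names, the wrap-around from Z to A, and the idea that a recognizer returns candidate letters for each of the two spoken sounds. The sketch shows why the redundancy may help: a letter is accepted only if its successor was also a plausible hearing of the second sound.

```python
import string

ALPHABET = string.ascii_uppercase


def next_letter(ch):
    """Return the letter after ch, wrapping Z to A (wrap-around is an assumption)."""
    index = ALPHABET.index(ch.upper())
    return ALPHABET[(index + 1) % len(ALPHABET)]


def voicetap_prompt(word):
    """List the (letter, following letter) pairs a user would speak for word."""
    letters = [ch.upper() for ch in word if ch.upper() in ALPHABET]
    return [(ch, next_letter(ch)) for ch in letters]


def decode_pair(first_candidates, second_candidates):
    """Keep only first-sound candidates whose successor is a second-sound candidate.

    Both arguments are hypothetical recognizer outputs: lists of letters
    that the recognizer considers plausible for each spoken sound.
    """
    return [c for c in first_candidates if next_letter(c) in second_candidates]


if __name__ == "__main__":
    print(voicetap_prompt("cat"))
    # "P" is easily confused with "B", "D", and "E"; the second sound narrows it down.
    print(decode_pair(["B", "D", "E", "P"], ["Q", "U"]))
```

In this toy example, the acoustically confusable candidates B, D, and E are rejected because their successors (C, E, F) do not match anything heard for the second sound, leaving only P.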