SBIR-STTR Award

Rapid Development Techniques for Spoken Language Translation
Award last edited on: 6/22/2012

Sponsored Program
SBIR
Awarding Agency
DOD : AF
Total Award Amount
$843,849
Award Phase
2
Solicitation Topic Code
AF071-041
Principal Investigator
Wei Wang

Company Information

Language Weaver Inc (AKA: SDL Language Weaver)

6060 Center Drive Suite 150
Los Angeles, CA 90045
(310) 437-7300
info@languageweaver.com
www.languageweaver.com
Location: Multiple
Congr. District: 36
County: Los Angeles

Phase I

Contract Number: ----------
Start Date: ----    Completed: ----
Phase I year
2007
Phase I Amount
$99,976
We propose to build on recent work that introduces syntactic processing into statistical machine translation models. The goal is to develop algorithms and techniques that improve performance for a given amount of training data and reduce the amount of training data required to reach a given performance level. The projected advances will enable development of translation capability for a much wider variety of languages, subject domains, and application areas than is currently feasible with data-intensive statistical approaches.

Keywords:
Statistical Machine Translation, Syntax-Based Machine Translation, Bilingual Data, Dictionary

Phase II

Contract Number: ----------
Start Date: ----    Completed: ----
Phase II year
2008
Phase II Amount
$743,873
The Phase I project demonstrated that syntactic information and bilingual dictionaries can yield high-quality translation systems in the Statistical Machine Translation (SMT) paradigm when only limited bilingual corpus data is available, using Chinese-English as the test case. Phase II extends this work with techniques that will enable similar success for morphologically complex languages and for a genuinely "resource-poor" language, Urdu. This work seeks to overcome a key commercial limitation of SMT: the requirement for large-scale data resources. Successful results will be immediately applicable to the many circumstances where only limited bilingual data is available: specialized domains, spoken translation applications, and the many resource-poor languages that are important for military, intelligence, and humanitarian operations.

Keywords:
Statistical Machine Translation, Chinese, Urdu, Arabic, Morphology, Syntax