SBIR-STTR Award

Near-Real Time Arabic/English Machine Translation by Integrated Statistical and Linguistic Learning Methods
Award last edited on: 2/9/2007

Sponsored Program
SBIR
Awarding Agency
DOD : Army
Total Award Amount
$69,998
Award Phase
1
Solicitation Topic Code
A03-088
Principal Investigator
Evelyne Tzoukermann

Company Information

StreamSage

1133 15th Street NW 10th Floor
Washington, DC 20005
(202) 722-2440
comments@streamsage.com
www.streamsage.com
Location: Single
Congr. District: 00
County: District of Columbia

Phase I

Contract Number: ----------
Start Date: ----    Completed: ----
Phase I year
2004
Phase I Amount
$69,998
StreamSage proposes an approach to the automatic translation of Arabic and Arabic dialect texts to and from English that significantly extends the state of the art in the integration of statistical and traditional machine translation techniques. This research will greatly increase translation accuracy while decreasing the need for domain-specific training. The proposed near-real-time translation system will use automatically induced transfer rules between English and Arabic syntactic structures that have been statistically trained on a feature set of unprecedented sophistication. This feature set will be automatically generated using tools that have not previously been applied to Arabic machine translation, such as language-wide noun and verb sense disambiguation, a TAG-based stochastic parser, and a hierarchical representation of Arabic dialect morphology, lexical features, and syntactic structures. Additional innovations include the application of state-of-the-art Arabic morphological analysis throughout the translation process, from word sense disambiguation to transfer rule induction to generation, and the automatic induction of rules for generating target-language text from syntactic structures. This research will make use of past work in machine translation, Arabic parsing, Arabic dialect analysis, and word sense disambiguation by StreamSage, Columbia University, and CoGenTex.

Benefits:
The proposed effort is designed to produce a dramatic leap in translation accuracy between Arabic and English while greatly reducing the training burden for new domains and new dialects. This will be of particular benefit to government and military applications, in which the current inadequacy of commercial-grade machine translation systems represents a tremendous inefficiency. StreamSage plans to market the resulting technology to government, military, and international development and aid organizations, as well as to commercial customers that conduct business in Arabic-speaking contexts.

Keywords:
machine translation, Arabic, Arabic dialects, word sense disambiguation, morphology, transfer rule induction

Phase II

Contract Number: ----------
Start Date: ----    Completed: ----
Phase II year
----
Phase II Amount
----