The Phase I project demonstrated that syntactic information and bilingual dictionaries can yield high-quality Chinese-English translation systems within the Statistical Machine Translation (SMT) paradigm when only limited bilingual corpus data is available. Phase II extends this work with techniques intended to enable similar success for morphologically complex languages and for a genuinely "resource-poor" language, Urdu. The work seeks to overcome a key limitation on the commercial use of SMT: its requirement for large-scale data resources. Results from this project will be immediately applicable in the many circumstances where only limited bilingual data is available, including many specialized domains, spoken translation applications, and the many resource-poor languages that are important for military, intelligence, and humanitarian operations.
Keywords: Statistical Machine Translation, Chinese, Urdu, Arabic, Morphology, Syntax