SBIR-STTR Award

Improving Recall in Domain Independent Information
Award last edited on: 4/2/2008

Sponsored Program
SBIR
Awarding Agency
DOD : DARPA
Total Award Amount
$834,847
Award Phase
2
Solicitation Topic Code
SB001-012
Principal Investigator
Svetlana Sheremetyeva

Company Information

Onyx Consulting Inc

1010 Edgewood Road Suite 107
Edgewood, MD 21040
   (410) 252-8969
   N/A
   N/A
Location: Single
Congr. District: 02
County: Harford

Phase I

Contract Number: ----------
Start Date: ----    Completed: ----
Phase I year
2000
Phase I Amount
$89,698
This project is devoted to enhancing the recall of general-purpose, domain-independent information retrieval systems. Its unique contribution is the incorporation of four different sources of knowledge for evaluating how well a particular document matches a query: broad-coverage lists of proper names ("onomastica"); knowledge of the syntax of the text in the documents; knowledge of the ontological-semantic properties of words in the text; and knowledge that helps resolve problems with anaphoric reference, as well as metonymy and other tropes, in the input text. These individual sources have been researched in academia and are available to Onyx Consulting for integration and incorporation in a working proof-of-concept system.
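
As a rough illustration of how one of these knowledge sources could raise recall, the sketch below expands a query with name variants from a toy onomasticon before matching. It is not the project's implementation: the `ONOMASTICON` entries, the sample documents, and the function names `expand_query`, `retrieve`, and `recall` are all assumptions made for this example, and matching is plain substring search.

```python
# Hypothetical toy onomasticon: canonical name -> known variants (illustrative data only).
ONOMASTICON = {
    "international business machines": {"ibm", "big blue"},
}

def expand_query(query):
    """Return the query plus any name variants listed in the onomasticon."""
    q = query.lower()
    terms = {q}
    for canonical, variants in ONOMASTICON.items():
        if q == canonical or q in variants:
            terms |= {canonical} | variants
    return terms

def retrieve(query_terms, docs):
    """Return ids of documents containing at least one query term as a substring."""
    return {
        doc_id
        for doc_id, text in docs.items()
        if any(term in text.lower() for term in query_terms)
    }

def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

# Made-up documents; d1 to d3 are assumed relevant to a query about IBM.
DOCS = {
    "d1": "IBM announced a new mainframe.",
    "d2": "International Business Machines reported earnings.",
    "d3": "Big Blue expands its research lab.",
    "d4": "A recipe for blueberry pie.",
}
RELEVANT = {"d1", "d2", "d3"}

baseline = retrieve({"ibm"}, DOCS)          # literal match only
expanded = retrieve(expand_query("IBM"), DOCS)  # match any known name variant

print("baseline recall:", recall(baseline, RELEVANT))
print("expanded recall:", recall(expanded, RELEVANT))
```

In this toy setting the literal query finds only the document that spells out "IBM", while the expanded query also finds the documents that use the other two names, and the irrelevant document is still excluded. A real system would need tokenization and disambiguation rather than substring matching.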

Phase II

Contract Number: ----------
Start Date: ----    Completed: ----
Phase II year
2001
Phase II Amount
$745,149
This project develops an information extraction system that demonstrates higher levels of recall than current systems without jeopardizing precision. Our recall-enhancing algorithms use more linguistic and world knowledge than most current systems. Four crucial avenues of work will lead to the improvement of recall: disambiguation of input text terms through ontological-semantic processing; processing reference; processing non-literal language; and assigning semantic features to new, unattested word and phrase occurrences. All of these activities rely on a unique battery of resources and processes developed by or available to Onyx. These include an ontological world model, a fact database, a comprehensive NLP lexicon of English, and an onomasticon, or lexicon of proper names. In addition, we use special routines for resolving reference, for processing non-literal language through controlled constraint relaxation, and for treating unattested inputs using expectations recorded in the ontology, in the fact database, and in special orthographic, morphological, and syntactic rules. Architecturally, we will combine the approaches and processes described above in a single system. Unlike most current systems, ours will be geared not only toward information extraction for a given set of templates (and therefore, typically, toward a single domain) but will also include facilities for modifying templates and defining new templates for new types of questions and, orthogonally, new domains. Thus, our product will be the first general-purpose, configurable information extraction system, which will work in multiple domains and with multiple text genres. Additional resources and linguistic expertise for this project are supplied by consultants at New Mexico State University's Computing Research Laboratory, a premier academic R&D institution.

Keywords:
Text Extraction, Onomastica, Semantics, Anaphoric Reference, Natural Language Processing, Syntax