Search and Indexing of Syntactically and Semantically Related Concepts
Award last edited on: 4/2/02

Sponsored Program
Awarding Agency
Total Award Amount
Award Phase
Solicitation Topic Code

Principal Investigator
Yves Schabes

Company Information

Teragram Corporation

10 Fawcett Street
Cambridge, MA 02138
(617) 576-6800
Location: Single
Congr. District: 05
County: Middlesex

Phase I

Contract Number: 9760986
Start Date: 00/00/00    Completed: 00/00/00
Phase I year
Phase I Amount
This Small Business Innovation Research (SBIR) Phase I project submitted by Teragram Corporation will design and prototype critical theoretical and algorithmic elements of linguistic indexing and search software based on finite-state technology. Prototyping the architecture of a search engine will enable searching and indexing of syntactically and semantically related concepts. The research applies linguistic and finite-state techniques to the search and indexing of textual data. The prototype could lead to a new generation of search and indexing technologies that improve search by accessing linguistically related concepts while remaining flexible, fast, and scalable to huge amounts of textual information. The company's linguistic technology is based on formal and algorithmic properties of finite-state automata and finite-state transducers, and it enables linguistic processing at very high speeds while achieving very high data compression rates. Finite-state technologies have tremendous potential in linguistic processing for indexing purposes. If the proposed research meets its goals, the resulting software will have numerous commercial applications, such as file system indexing, document indexing, document retrieval, and Internet search engines.
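
The abstract does not describe Teragram's implementation, so the following is only a minimal sketch of the general idea, assuming a toy lexicon: a deterministic finite-state transducer (stored here as an uncompressed trie) maps inflected word forms to a shared base form, and an inverted index is keyed on that base form so that related forms retrieve the same documents. The names LemmaTransducer, build_index, and search, and all lexicon entries, are hypothetical and chosen for illustration.

```python
from collections import defaultdict


class LemmaTransducer:
    """Toy deterministic finite-state transducer stored as a trie.

    Each state is a dict of character transitions; a final state carries
    the output string (the base form) for the word spelled by the path.
    A production system would minimize and compress this structure.
    """

    def __init__(self):
        self.transitions = [{}]  # state id -> {char: next state id}
        self.outputs = {}        # final state id -> base form

    def add(self, surface, lemma):
        state = 0
        for ch in surface:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][ch] = nxt
            state = nxt
        self.outputs[state] = lemma

    def lookup(self, word):
        state = 0
        for ch in word:
            state = self.transitions[state].get(ch)
            if state is None:
                return word  # unknown word: fall back to the surface form
        return self.outputs.get(state, word)


def build_index(documents, fst):
    """Map each normalized token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[fst.lookup(token)].add(doc_id)
    return index


def search(index, fst, query):
    """Return ids of documents containing every normalized query term."""
    postings = [index.get(fst.lookup(t), set()) for t in query.lower().split()]
    return set.intersection(*postings) if postings else set()


# Hypothetical lexicon and documents, for illustration only.
fst = LemmaTransducer()
for surface, lemma in [("indexes", "index"), ("indexing", "index"),
                       ("indexed", "index"), ("searches", "search"),
                       ("searching", "search")]:
    fst.add(surface, lemma)

docs = {1: "indexing large files",
        2: "the engine searches documents",
        3: "indexed and searching"}
index = build_index(docs, fst)
print(sorted(search(index, fst, "indexes")))  # documents 1 and 3 share the base form "index"
```

Because lookup walks one transition per character, normalization cost depends on word length rather than lexicon size, which is one reason finite-state lexicons are attractive for high-speed indexing.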

Phase II

Contract Number: 9901804
Start Date: 00/00/00    Completed: 00/00/00
Phase II year
Phase II Amount
This Small Business Innovation Research Phase II project from Teragram Corporation covers research and development in information retrieval and natural language processing to locate, within free text, the precise answers to direct English questions, without relying on the traditional and unnatural search methods of keywords and Boolean search strings. The project will yield a new generation of searching and indexing technologies, one that will empower users to target existing answers with intuitive full-sentence queries. In Phase II, Teragram will build a Question and Answer search engine whose performance will be inherently better than that of the generalized search engines in use today. Strategies to capture English question-and-answer pairings will be automatic, precise, flexible, and scalable, enabling mastery of the very large information resources now common on the Internet and on many intranets. The engine's indexing and matching processes will be uniquely sensitive to Q&A content so as to ensure the richest lookups possible. The engine will comb the entire World Wide Web or just one site of Frequently Asked Questions (FAQs) with equal ease. The effect will be to make the expert behind any good FAQ list, or similarly structured knowledge, come to life for the user's immediate benefit. Teragram Corporation has developed natural language search tools that have already yielded a large customer base. These functions have been successfully deployed by a major Internet search engine. The Phase II research and development of a Question-Answer search and indexing engine for fact retrieval will have enormous commercial implications for Internet search engines, corporate intranets, technical support systems, and other large-scale documentation environments.