Reading text-based communications and documents makes up a substantial portion of the daily activity of knowledge workers such as operations personnel, engineers, managers, and other NASA staff. A growing percentage of these documents is available in electronic form. This situation, coupled with recent advances in computational linguistics, presents an opportunity for major productivity gains for NASA knowledge workers: software tools that automatically extract information from text would let the user analyze, evaluate, and respond to that information more quickly. We propose to build such a tool, called Text Analyst, designed to minimize the user time required to train and customize the crucial information extraction subsystem. The system will employ a semi-automated approach to building a semantic lexicon of domain-relevant words and phrases through user seeding and feedback. The key innovation of Text Analyst is a new method of automatically creating information extraction templates. The method minimizes the user time required to train the information extraction system by eliminating the need to annotate or classify a training corpus of texts. Thus, Text Analyst will be readily applicable as a cognitive prosthetic, providing the major cognitive assistance addressed by Topic 24.02.
Potential Commercial Applications: The benefits to NASA and the commercial potential of Text Analyst are enormous, because Text Analyst is a major productivity booster for knowledge workers in general, regardless of industry or discipline. Text Analyst increases the number of texts within the user's reach and reduces reading time while still capturing all the information needed. This applies to high-technology industries, insurance (review of applications and claims), healthcare, financial services (Big 6 consulting firms), legal services, business intelligence gathering, and all levels of government (review of applications and reports), to name only a few. A conservative estimate is that three percent of the PCs sold each year in the United States go to potential buyers of Text Analyst. The market can be expanded further by either of two strategies, described in Section 10, both of which Naftware intends to pursue. With those strategies, the market for Text Analyst could grow to 5, 10, or even 25 percent of the PC market. Such numbers are plausible given that many millions of people regularly use web search engines.