In the broadest terms, the goal of the proposed work is to make it easier for researchers to apply robust, scalable, entity-centered access to heterogeneous data in the biomedical literature. "Entity-centered" means that information is indexed by the underlying entity, irrespective of what a surface mention looks like in any given data source. For example, there is a gene in FlyBase that appears in text under synonyms as diverse as "Foil" and "Mel(3)10", under generic nominal referring expressions like "The gene", and under pronouns like "it", and that also has the FlyBase database ID CG5490. The Phase I proposal breaks down into two major efforts. The first is to extend the existing LingPipe suite of linguistic processing tools to meet the challenges of bioinformatics, resulting in LingPipe-Bio. This will be distributed to the research and entrepreneurial community as an open-source suite of tools with dual open-source/commercial licensing. The second is to adapt a current interface for entity-centered data access (ThreatTracker, built for intelligence analysts) into BioTracker, based on the needs of biomedical researchers.
Thesaurus Terms: bioinformatics, computer program /software, computer system design /evaluation, indexing, information retrieval, nomenclature information system, publication