SBIR-STTR Award

Improved Manuscript Search Through Pubseq
Award last edited on: 6/7/11

Sponsored Program
SBIR
Awarding Agency
NIH : NLM
Total Award Amount
$97,198
Award Phase
1
Solicitation Topic Code
-----

Principal Investigator
Yana Bromberg

Company Information

Biosof LLC

138 West 25th Street 10th Floor
New York, NY 10001
   (917) 378-5026
   main@bio-sof.com
   www.bio-sof.com
Location: Single
Congr. District: 12
County: New York

Phase I

Contract Number: 1R43LM010156-01
Start Date: 9/15/09    Completed: 9/14/10
Phase I year
2009
Phase I Amount
$97,198
Most top-level searches of scientific literature query structured fields such as author, subject, or affiliation. A free-text search of abstracts or full-text entries would be more flexible, allowing queries with any word combination, including ranges of names and identifiers. Unfortunately, free-text searches usually yield incomplete and often erroneous results because the naming of biologically important molecules (genes, proteins, substrates) is not standardized. Unless a query issued to a retrieval service (e.g. PubMed) covers all possible aliases of a given protein or gene, the results may be insufficient or simply wrong. The system proposed here translates the problem of looking up literature pertaining to a certain protein to the sequence level. By correlating existing identifiers, names, and synonyms of proteins with their sequences, this lookup increases the accuracy and coverage of the results. Our system will also uniquely address a particular challenge: structural and functional genomics projects increasingly bring up proteins about which nothing is known. If someone publishes new experimental results that name such a protein, this important knowledge will likely be lost to the genomics investigator, because PubMed alerts need to be activated by keywords and names. Our system could fill this gap: users will be able to deposit sequences corresponding to proteins of unknown function or name. If experimental information is published for the same or a related sequence, the original investigator will be notified.
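The two ideas in the abstract, alias expansion through a shared sequence and notification on deposited sequences, can be illustrated with a minimal Python sketch. This is not the awardee's implementation: the class `SequenceIndex`, its method names, and the toy sequence and aliases are all hypothetical, and matching is by exact sequence only, whereas the abstract also calls for notification on related sequences, which would need a similarity search.

```python
from collections import defaultdict


class SequenceIndex:
    """Hypothetical index linking protein aliases to sequences (illustration only)."""

    def __init__(self):
        self._aliases_by_seq = defaultdict(set)  # sequence -> known aliases
        self._seq_by_alias = {}                  # lowercased alias -> sequence
        self._watchers = defaultdict(set)        # sequence -> users to notify

    def add_alias(self, sequence, alias):
        """Record that `alias` names the protein with this sequence."""
        seq = sequence.upper()
        self._aliases_by_seq[seq].add(alias)
        self._seq_by_alias[alias.lower()] = seq

    def expand_query(self, name):
        """Return every alias sharing a sequence with `name`, or just `name`."""
        seq = self._seq_by_alias.get(name.lower())
        if seq is None:
            return {name}
        return set(self._aliases_by_seq[seq])

    def watch(self, sequence, user):
        """Deposit a sequence of unknown name/function on behalf of `user`."""
        self._watchers[sequence.upper()].add(user)

    def publish(self, sequence, paper_id):
        """Return (user, paper_id) notifications for exact sequence matches.

        Assumption: exact matching only; related-sequence matching is omitted.
        """
        users = self._watchers.get(sequence.upper(), ())
        return sorted((user, paper_id) for user in users)


# Toy data: the sequence and both aliases are made up for this example.
index = SequenceIndex()
index.add_alias("MKTAYIAK", "protA")
index.add_alias("mktayiak", "GENE-A")
index.watch("MKTAYIAK", "alice")
```

A free-text query for either alias could then be expanded with `index.expand_query("prota")` before being sent to a retrieval service, and `index.publish("MKTAYIAK", "paper-1")` would yield the notification for the depositing user.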

Public Health Relevance:
The experimental and computational data appearing daily in publications are critical to the advancement of biological research. However, the sheer quantity of new data and the high frequency with which they are published turn bench scientists into research librarians, sifting through the flood of information in search of relevant and reliable data. Furthermore, as biological research is increasingly driven by the study of proteins and genes that mostly lack annotations, or even identifiers, there is a need to access the literature using sequence data alone. By automating the process of searching for and discovering relevant information as it becomes available, the proposed system promises to save time and to increase the coverage of relevant and reliable data retrieved by a given search, delivered in an intuitive and "easy to consume" format.

Thesaurus Terms:
There Are No Thesaurus Terms On File For This Project.

Phase II

Contract Number: ----------
Start Date: 00/00/00    Completed: 00/00/00
Phase II year
----
Phase II Amount
----