NIH 2009 Improved Manuscript Search Through Pubseq

Improved Manuscript Search Through Pubseq
Award last edited on: 6/7/11

Awarding Agency

NIH : NLM

Total Award Amount

$97,198

Award Phase

Solicitation Topic Code

-----

Principal Investigator

Yana Bromberg

Biosof LLC

138 West 25th Street 10th Floor
New York, NY 10001

(917) 378-5026

main@bio-sof.com

www.bio-sof.com

Location: Single
Congr. District: 12
County: New York

Phase I

Contract Number: 1R43LM010156-01
Start Date: 9/15/09 Completed: 9/14/10

Phase I year

2009

Phase I Amount

$97,198

Most top-level searches of the scientific literature query structured fields such as author, subject, or affiliation. A free-text search of abstracts or full-text entries would be more flexible, allowing queries with any word combination, including ranges of names and identifiers. Unfortunately, free-text searches usually yield incomplete and often erroneous results because the naming of biologically important molecules (genes, proteins, substrates) is not standardized. Unless a query issued to a retrieval service (e.g., PubMed) covers all possible aliases of a given protein or gene, the results may be insufficient or simply wrong. The system proposed here translates the problem of looking up literature pertaining to a certain protein to the sequence level. By correlating existing identifiers, names, and synonyms of proteins with their sequences, this lookup increases the accuracy and coverage of the results. Our system will also uniquely address a particular challenge: structural and functional genomics projects increasingly bring up proteins about which nothing is known. If someone publishes new experimental results that name such a protein, this important knowledge will likely be lost to the genomics investigator, because PubMed alerts need to be activated by keywords and names. Our system could fill this gap: users will be able to deposit sequences corresponding to proteins of unknown function or name. If experimental information is published for the same or a related sequence, the original investigator will be notified.
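The idea of keying names and alerts on sequences rather than on keywords can be illustrated with a minimal Python sketch. It is not the proposed system: the class `SequenceIndex`, its methods, and the example names and sequences are all hypothetical, and it matches sequences exactly, whereas the abstract also covers related sequences, which would require a similarity search that is out of scope here.

```python
from collections import defaultdict


class SequenceIndex:
    """Toy index that links protein names to sequences (hypothetical design)."""

    def __init__(self):
        self._names_by_seq = defaultdict(set)  # sequence -> all known names
        self._seq_by_name = {}                 # lower-cased name -> sequence
        self._watchers = defaultdict(set)      # sequence -> users to notify

    def add_alias(self, name, sequence):
        """Record that `name` is one of the names used for `sequence`."""
        key = name.lower()
        self._names_by_seq[sequence].add(key)
        self._seq_by_name[key] = sequence

    def expand_query(self, name):
        """Return every known alias that shares a sequence with `name`."""
        sequence = self._seq_by_name.get(name.lower())
        if sequence is None:
            # Unknown name: fall back to the plain keyword.
            return {name.lower()}
        return set(self._names_by_seq[sequence])

    def watch(self, user, sequence):
        """Deposit a sequence of unknown function or name on behalf of `user`."""
        self._watchers[sequence].add(user)

    def publish(self, sequence, paper_id):
        """Return (user, paper_id) alerts for users watching this exact sequence."""
        return sorted((user, paper_id) for user in self._watchers.get(sequence, ()))


# Hypothetical data: two names for the same made-up sequence.
index = SequenceIndex()
index.add_alias("ProtA", "MKTAYIAK")
index.add_alias("PA-1", "MKTAYIAK")
aliases = index.expand_query("prota")    # both names, lower-cased

index.watch("alice", "MSEQUNKNOWN")      # deposited before any name exists
alerts = index.publish("MSEQUNKNOWN", "paper-1")
```

In this sketch a keyword query is widened to all aliases sharing a sequence, and an alert fires on the deposited sequence itself, so no name needs to be known in advance.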

Public Health Relevance:
The experimental and computational data appearing daily in publications are critical to the advancement of biological research. However, the sheer quantity of new data and the high frequency with which it is published turn bench scientists into research librarians, sifting through a flood of information in search of relevant and reliable data. Furthermore, as biological research is increasingly driven by the study of proteins and genes that mostly lack annotations, or even identifiers, there is a need to access the literature using sequence data alone. By automating the process of searching for and discovering relevant information as it becomes available, the proposed system promises to save time and to increase the coverage of relevant and reliable data retrieved by a given search, presented in an intuitive and "easy to consume" format.

Thesaurus Terms:
There Are No Thesaurus Terms On File For This Project.

Phase II

Contract Number: ----------
Start Date: 00/00/00 Completed: 00/00/00

Phase II year

----

Phase II Amount

----
