SBIR-STTR Award

Second Generation DNA Sequence Management Tools
Award last edited on: 8/31/04

Sponsored Program
SBIR
Awarding Agency
NIH : NCHGR
Total Award Amount
$1,381,875
Award Phase
2
Solicitation Topic Code
-----

Principal Investigator
Todd M Smith

Company Information

Geospiza Inc

100 West Harrison North Tower Suite 330
Seattle, WA 98119
   (206) 633-4403
   info@geospiza.com
   www.geospiza.com
Location: Single
Congr. District: 07
County: King

Phase I

Contract Number: 1R43HG002244-01
Start Date: 00/00/00    Completed: 00/00/00
Phase I year
2000
Phase I Amount
$98,238
The tremendous amounts of sequence data made available in recent years have increased the need to re-engineer existing bioinformatics algorithms for better performance. Our ability to organize human and mouse genomic, cDNA, and EST (expressed sequence tag) data, rapidly assemble microbial genomes, and compare sequences within and between organisms depends on programs that can operate on large amounts of data and be easily incorporated into scientific applications. In the case of the popular assembly program Phrap (P. Green, unpublished), performance improvements include the ability to perform incremental assemblies, in which new sequence data are added to already assembled sequences; better memory management to accommodate larger data sets; and parallel execution of the algorithm to reduce assembly times. Further improvements include developing an API (Application Programming Interface) so that Phrap can be more easily incorporated into bioinformatics applications. In this project, a prototype of Phrap will be developed that performs incremental assemblies and has improved memory management. New versions of Phrap will be structured to run as parallel processes. Finally, we will develop specifications for an API and an XML DTD (eXtensible Markup Language Document Type Definition) that will allow Phrap to be incorporated more efficiently into bioinformatics applications.

PROPOSED COMMERCIAL APPLICATIONS: Phrap is widely used in industry and academia for applications involving DNA sequences. More than 100 commercial sites would benefit from new versions of Phrap that support incremental assemblies and make better use of computer resources. An API for Phrap will encourage application development, creating additional commercialization opportunities for algorithm and application developers.

Phase II

Contract Number: 2R44HG002244-02
Start Date: 00/00/00    Completed: 00/00/00
Phase II year
2002
(last award dollars: 2004)
Phase II Amount
$1,283,637

The Human Genome Project spurred the development of high-throughput technologies, especially in the area of DNA sequencing. Not only has this effort produced a draft of the human genome, it has also catalyzed the development of an entire industry based on DNA sequencing and genomics. Because these technologies produce enormous amounts of data, they depend on bioinformatics programs for data management. Phrap, Cross_Match, RepeatMasker, and Consed are four programs that played an integral role in the Human Genome Project and became accepted standards. However, as sequencing technology has evolved, so too have its applications. These new applications, which include sequencing additional genomes, EST cluster analysis, and genotyping, have highlighted the need to update standard bioinformatics programs to meet the current needs of a broader community. In this project, we will re-engineer Phrap, Cross_Match, and RepeatMasker to improve performance by optimizing their algorithms and by developing a hierarchical data file to store and manipulate assembled sequence data. Phrap and Cross_Match will also be modified to use XML-formatted data, allowing users to apply constraints to sequence assembly. Lastly, we will develop a new program to review, edit, and manipulate sequences, giving users unprecedented control over their data.

PROPOSED COMMERCIAL APPLICATION: Phrap is widely used in industry and academia for applications involving DNA sequences. More than 100 commercial sites would benefit from new versions of Phrap that support incremental assemblies and make better use of computer resources. An API for Phrap will encourage application development, creating additional commercialization opportunities for algorithm and application developers.

Thesaurus Terms:
computer program/software, computer system design/evaluation, data management, informatics, nucleic acid sequence, artificial intelligence, computer data analysis, genotype, mathematical model