SBIR-STTR Award

Computational Tools for Genome Analysis
Award last edited on: 6/6/08

Sponsored Program
SBIR
Awarding Agency
NIH : NCHGR
Total Award Amount
$943,290
Award Phase
2
Solicitation Topic Code
-----

Principal Investigator
David Kulp

Company Information

Neomorphic Software Inc

2612 B 8th Street
Berkeley, CA 94710
   N/A
   N/A
   www.neomorphic.com
Location: Single
Congr. District: 13
County: Alameda

Phase I

Contract Number: 1R43HG001801-01
Start Date: 00/00/00    Completed: 00/00/00
Phase I year
1998
Phase I Amount
$99,500
The primary goal of this application is to develop and market software that will employ hidden Markov models to predict gene structure from human EST sequence, genomic sequence, or a combination of both. The PI intends to use an iterative refinement approach loosely modeled after a maximization methodology called the EM algorithm. This application proposes two variations on that approach: first, the use of stochastic models of gene structure and Bayesian statistics to estimate parameters; second, the involvement of a user in the iteration process. In a strict EM approach, the algorithm itself iterates until it converges to a solution. The PI proposes to have a user involved in each step of the iteration, with the intent of ensuring that the algorithm converges on a more optimal solution. Whenever possible, this tool will also identify protein motifs and sequence similarity in other sequences. Finally, the application proposes to provide a Java-based graphical user interface to visualize and integrate the results of the gene structure analysis.
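For readers unfamiliar with how a hidden Markov model can label sequence, the following is a minimal sketch of Viterbi decoding over a toy two-state model. It is not the awardee's software or method: the state names, the probabilities in START, TRANS, and EMIT, and the function name viterbi are all hypothetical values chosen for illustration, and a real gene-structure model would use many more states (exons by frame, introns, splice sites, UTRs).

```python
import math

STATES = ("noncoding", "coding")

# Hypothetical parameters, chosen only for illustration (not from the award).
START = {"noncoding": 0.9, "coding": 0.1}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    "coding": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def viterbi(seq):
    """Return the most probable state path for seq under the toy HMM."""
    if not seq:
        return []
    # scores[s] = best log-probability of any path ending in state s.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for landing in s at this position.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    sequence = "ATATATATAT" + "GCGCGCGCGC" + "ATATATATAT"
    for base, label in zip(sequence, viterbi(sequence)):
        print(base, label)
```

In an EM-style refinement loop, a decoding step like this would alternate with re-estimating the model parameters from the labeled sequence; the abstract's proposal is to let a user review each such iteration.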

Phase II

Contract Number: 2R44HG001801-02
Start Date: 00/00/00    Completed: 00/00/00
Phase II year
1999
(last award dollars: 2000)
Phase II Amount
$843,790

Neomorphic Software, Inc. intends to develop new algorithmic analysis methods, software implementations, and annotated biosequence data sets for the elucidation of new genes and of genomic and proteomic relationships. The motivation behind this SBIR grant is to develop new methods for the improved annotation of genomic, cDNA, and protein data, given some or all of these types of data in concert. Faced with the massive sequencing efforts generating expressed sequence tag (EST) and genomic data in both the public and private sectors, Neomorphic's goal is to derive knowledge from data through the use of new statistical analysis techniques. The SBIR Phase II research project will build on the success of Phase I, in which new hidden Markov model (HMM) based algorithmic methods were invented for the alignment, error correction, and homology identification of ESTs and for the identification of genes in genomic DNA. Phase II research will focus on the annotation of nucleic acid sequences, with specific emphasis on: (1) the identification of protein motifs, domains, and remote homologies that would aid in the classification of EST sequences that include relatively high rates of indels and substitutions and are currently unclassified; and (2) the identification and functional characterization of new genes using EST and protein homology information from preliminary consensus genomic DNA obtained from low-coverage shotgun sequencing. The new analysis methods will aid scientists in assimilating evidence for the precise annotation of genomic or transcriptional (cDNA) data, including intron/exon boundaries, UTRs, transcription start sites and other regulatory elements, codon structure, frame shifts, base call corrections, single nucleotide polymorphisms (SNPs), alternative splicing, putative protein prediction, and associations with homologous protein sequences, families, and motifs.
PROPOSED COMMERCIAL APPLICATIONS: Our software will allow biotechnology and pharmaceutical companies to mine EST databases for critical new lead targets and, as further human genomic sequence becomes available and new functional genomics platforms are developed, to fully characterize human genes involved in critical disease pathways. We will contribute substantial value-added information to both public and private biosequence databases, greatly enhancing the value of this vital data.