SBIR-STTR Award

Learning Systems for Biological Sequence Analysis
Award last edited on: 6/2/09

Sponsored Program
SBIR
Awarding Agency
NIH : NIDDK
Total Award Amount
$565,225
Award Phase
2
Solicitation Topic Code
-----

Principal Investigator
Yves Chauvin

Company Information

NET ID Inc

4225 Via Arbolada Suite 500
Los Angeles, CA 90042
   (213) 222-1151
   N/A
   www.netid.com
Location: Single
Congr. District: 34
County: Los Angeles

Phase I

Contract Number: 1R43DK11499-01
Start Date: 00/00/00    Completed: 00/00/00
Phase I year
1994
Phase I Amount
$65,173
We will provide a computer software medium able to represent and analyze a number of problems in computational biology. The simulation tool will allow biologists to design models of protein families, compute multiple alignments, perform simulations of protein interactions, analyze DNA (or RNA) sequences, etc. The methodology uses recent progress in machine learning algorithms to extract pertinent information from biological data. This approach is attractive because sequence databases are rich and readily available, while no complete theory covers all the underlying biological mechanisms. Specifically, we use learning systems such as Neural Networks or Hidden Markov Models (HMMs) to parse DNA sequences and to construct models of protein families with clear medical interest. These families include immunoglobulins, kinases (involved in the regulation of basic cellular processes), G-protein-coupled receptors (involved in the transduction of signals carried by hormones and neurotransmitters), growth factors, and several retroviral proteins (such as HIV membrane proteins). These models provide new solutions for several computational problems such as multiple sequence alignment, motif detection, database searches, protein classification and genome parsing. Such software tools have immediate commercial applications in biological laboratories and the biotechnology industry.

Awardee's statement of the potential commercial applications of the research:
The product GnomeView Interface will provide access to work accomplished on the human genome project to researchers at genome centers, medical schools, pharmaceutical companies and universities. The emphasis on scientific visualization also gives GnomeView significant potential as a teaching tool.

National Library of Medicine (NLM)

Phase II

Contract Number: 9R44AA11499-02
Start Date: 00/00/00    Completed: 00/00/00
Phase II year
1996
(last award dollars: 1997)
Phase II Amount
$500,052

The long-term objective of this research is to provide computer systems that are useful for the analysis and solution of a number of problems in computational biology. The simulation tools will allow biologists to design models of protein families, compute multiple alignments, search large databases, and analyze DNA sequences and gene structure (exons/introns/promoters/...). Protein families under study, with clear medical interest, include immunoglobulins, kinases, G-protein-coupled receptors, growth factors and retroviral proteins. The software methodology takes advantage of recent progress in machine learning algorithms, such as Hidden Markov Models (HMMs), to automatically extract pertinent information from the massive amounts of biological data produced by genome and other sequencing efforts. At the same time, the research effort is aimed at optimizing the implementation of the software simulator developed during Phase I on various hardware architectures. Such systems have widespread commercial applications in the biotechnology industry. For instance, sequence analysis often represents a key step towards the systematic detection of genetic defects responsible for hereditary forms of cancer and other complex diseases. We intend to provide a line of fast and cost-efficient computational systems tailored to the various needs of both academic and industrial organizations.

Proposed Commercial Applications:
In the short term, the simulator will be licensed to biological laboratories as a tool to perform multiple alignments, motif detection, database searches, etc. Licensing will be made available under various system hardware configurations. In the longer term, a library of high-quality HMMs of protein families and genomic functional elements will be constructed and made available to clients through electronic networks.