Phase II year
1996
(last award dollars: 1997)
The long-term objective of this research is to provide computer systems that are useful for the analysis and solution of a number of problems in computational biology. The simulation tools will allow biologists to design models of protein families, compute multiple alignments, search large databases, and analyze DNA sequences and gene structure (exons/introns/promoters/...). Protein families under study, all of clear medical interest, include immunoglobulins, kinases, G-protein-coupled receptors, growth factors, and retroviral proteins. The software methodology takes advantage of recent progress in machine learning algorithms, such as Hidden Markov Models (HMMs), to automatically extract pertinent information from the massive amounts of biological data produced by genome and other sequencing efforts. At the same time, the research effort is aimed at optimizing the implementation of the software simulator developed during Phase I on various hardware architectures. Such systems have widespread commercial applications in the biotechnology industry. For instance, sequence analysis often represents a key step towards the systematic detection of genetic defects responsible for hereditary forms of cancer and other complex diseases. We intend to provide a line of fast and cost-efficient computational systems tailored to the various needs of both academic and industrial organizations.

Proposed Commercial Applications:
In the short term, the simulator will be licensed to biological laboratories as a tool to perform multiple alignments, motif detection, database searches, etc. Licenses will be made available for various system hardware configurations. In the longer term, a library of high-quality HMMs of protein families and genomic functional elements will be constructed and made available to clients through electronic networks.