This project will develop an integrated desktop application to combine data from expression array, RNA transcript array, proteomics, SNP array (for polymorphism analysis, as well as LOH and copy number determination), methylation array, histone modification array, promoter array, microRNA array, and metabolomics technologies. Current approaches to the analysis of individual 'omic' technologies suffer from fragmentation, which presents an incomplete view of the workings of the cell. However, effective integration into a single analytic platform is non-trivial. There is a need for a consistent approach, infrastructure, and interface between array types, to maximize ease of use while recognizing and accommodating the specific computational and statistical requirements, and biological context, of each array. A central challenge is the need to create and work with lists of genomic regions of interest (GROIs) for each sample: we propose three novel approaches to aid in identification of GROIs. These lists must then be integrated with rectangular (sample by feature) data arrays to facilitate statistical analysis. Integration between array types occurs computationally, through a unified software package; statistically, through tools that seek statistical relationships between features from different arrays; biologically, through use of annotations (particularly gene ontology, protein-protein and protein-DNA interactions, and pathway membership) that document functional relationships between features; and genomically, through interactions that suggest relationships between features that map to the same regions of the genome. The end product will support analysis of each platform separately, with a comprehensive suite of data management, statistical and heuristic analytic tools and the means to place findings of interest into a meaningful biological context through cross-reference to extensive biobases.
Beyond that, a range of methods - statistical, biological and genomic - will be available to explore interactions and associations between platforms.
Public Health Relevance: While large-scale array technologies have provided an unprecedented capability to model cellular processes, both in normal functioning and in disease states, this capability is utterly dependent on the availability of complex data management, computational, statistical and informatic software tools. The utility of the next generation of arrays - which focus on critical regulation and control functions of the cell - will be stymied by an initial lack of suitable bioinformatic tools. This proposal initiates the accelerated development of an integrated software package intended to empower biologists in the application and analysis of these powerful new technologies, with broad impact at all levels of biological and clinical research, and across every discipline.
Thesaurus Terms: Algorithms; Allelic Loss; Alternate Splicing; Alternative Splicing; Articulation; Binding; Binding (Molecular Function); Bio-Informatics; Bioinformatics; Biological; Biological Neural Networks; Bite; Cell Function; Cell Process; Cell Physiology; Cells; Cellular Function; Cellular Physiology; Cellular Process; Classification; Clinical Data; Clinical Research; Clinical Study; Complex; Computer Programs; Computer Software Tools; Computer Software; Dna Copy Number; Dna-Protein Interaction; Data; Data Linkages; Development; Discipline; Disease; Disorder; Documentation; Evaluation; Gwas; Gene Expression; Gene Products, Rna; Genes; Genome; Genomic Segment; Genomics; Goals; Heating; Imagery; Individual; Informatics; Infrastructure; Internet; Joints; Learning, Machine; Link; Linkages, Data; Loss Of Heterozygosity; Machine Learning; Maps; Methylation; Micro Rna; Micrornas; Modeling; Molecular Interaction; Ontology; Pathway Interactions; Phase; Polymorphism Analysis; Polymorphism Detection; Process; Promoter; Promoters (Genetics); Promotor; Promotor (Genetics); Protein Methylation; Proteins; Proteomics; Rna; Rna Splicing, Alternative; Rna, Non-Polyadenylated; Record Linkage Study; Regulation; Research Infrastructure; Research Resources; Resources; Ribonucleic Acid; Sampling; Software; Software Tools; Sorting - Cell Movement; Statistical Methods; Structure; Subcellular Process; Systematics; Systems Biology; Txt; Technology; Testing; Text; Tools, Software; Transcript; Visualization; Www; Work; Base; Computer Program/Software; Data Management; Disease/Disorder; Empowered; Gene Product; Genome Wide Association Scan; Genome Wide Association Studies; Genome Wide Association Study; Genome-Wide Analysis; Genome-Wide Scan; Genomewide Association Scan; Genomewide Association Studies; Genomewide Association Study; Genomewide Scan; Heuristics; High Throughput Technology; Histone Modification; Interest; Kernel Methods; Measurement Of Metabolism; Metabolomics; Mirna; 
Neural Network; New Approaches; New Technology; Next Generation; Novel; Novel Approaches; Novel Strategies; Novel Strategy; Pathway; Prognostic; Public Health Relevance; Sorting; Statistical Learning; Support Vector Machine; Tool; Tool Development; Web; Whole Genome Association Studies; Whole Genome Association Study; World Wide Web