Multiple sequence alignment (MSA) is a core element of bioinformatics, comparative genomics, phylogenetics, molecular biology, and structural biology. Several excellent software tools for MSA construction are heavily used by the biomedical community; however, essentially all automatically constructed MSAs require significant editing (error correction) before they can be analyzed or used by downstream applications. Editing and analysis are carried out with specialized software tools known as MSA editors. Although several MSA editors are currently available, none offers a comprehensive set of the features essential for effective MSA manipulation, editing, analysis, and export. The goal of the proposed research is to build a commercially viable MSA editor, termed AlignShop, that streamlines the production of high-quality MSAs and thus accelerates the derivation of biological knowledge from sequence data. The central idea is to integrate, in a single tool, several recently developed approaches for improving MSA quality together with advanced visualization and handling. This will be achieved by accomplishing the following Specific Aims: 1) Develop an approach to rapidly predict secondary structure for each protein sequence in an MSA. 2) Incorporate analytical capabilities that facilitate extracting the higher-order knowledge encoded within an MSA. 3) Develop a comprehensive, modern interface for interacting with MSAs. During Phase I we will develop AlignShop into a fully functional, robust MSA editor and analysis tool that outperforms existing free and commercial editors. AlignShop is not intended to be an MSA building tool or a comprehensive bioinformatics workbench. Rather, we are focusing solely on building the best stand-alone tool for editing, analyzing, utilizing, and publishing MSAs, one that can be used with all popular MSA building programs and downstream applications.
This program will be useful to the broad community of computational and experimental biomedical scientists.
Public Health Relevance: The results of this research will provide scientists with the most capable and effective tool for editing and analyzing multiple sequence alignments, which are foundational for understanding the human genome, genetic diseases, and the properties of important microbial organisms, including human pathogens and agents of infectious disease. Because multiple sequence alignments play a critical role in virtually every area of biomedical science, this project will significantly impact and accelerate biomedical discovery.
Project Terms: Accounting; Algorithms; Amino Acid Sequence; Amino Acids; Area; Bio-Informatics; Bioinformatics; Biological; Blast Cell; Blasts; Classification; Communicable Diseases; Communities; Computer Software Tools; Data; Data Banks; Data Bases; Databank, Electronic; Databanks; Database, Electronic; Databases; Derivation; Derivation procedure; Elements; Ferrata cell; Generations; Genetic Algorithm; Genetic Condition; Genetic Diseases; Genetic Programming; Goals; Graphical interface; Hematohistioblast; Hemocytoblast; Hemohistioblast; Hereditary Disease; Human; Human Genome; Human, General; Imagery; Infectious Disease Pathway; Infectious Diseases; Infectious Diseases and Manifestations; Infectious Disorder; Investigators; Knowledge; Learning; Man (Taxonomy); Man, Modern; Manuals; Methods; Molecular; Molecular Biology, Protein Sequencing; Molecular Disease; Organism; Peptide Sequence Determination; Phase; Phylogenetic Analysis; Phylogenetics; Position; Positioning Attribute; Process; Production; Programs (PT); Programs [Publication Type]; Property; Property, LOINC Axis 2; Protein Family; Protein Sequencing; Protein Structure, Primary; Publications; Publishing; Research; Research Design; Research Personnel; Research Resources; Researchers; Resources; Role; Running; Science; Scientific Publication; Scientist; Secondary Protein Structure; Sequence Alignment; Sequence Determinations, Amino Acid; Sequence Determinations, Protein; Software Tools; Solutions; Source; Stress; Structure; Study Type; Systematics; Time; Tools, Software; Translations; Trees; Update; Visualization; aminoacid; base; biomedical scientist; clinical data repository; clinical data warehouse; comparative genomics; computational tools; computerized tools; data repository; design; designing; genetic disorder; graphic user interface; graphical user interface; hereditary disorder; improved; instructor; living system; markov model; member; microbial; novel; operation; pathogen; programs; protein sequence; protein structure prediction; public health relevance; relational database; social role; structural biology; study design; tool