Identifying disease-causing genetic variants is a complex process, often involving analysis of the genomes of affected patients and their immediate families. Even with these state-of-the-art techniques, however, only approximately 30% of patients receive a successful diagnosis. One reason for this low diagnosis rate is the presence of localized data quality problems, which are often missed in otherwise high-quality data; for example, a single exon in a gene may be skipped in a sequencing experiment. This proposal aims to develop an easy-to-use, web-based application, built on the IOBIO platform, to assess the quality of sequencing data in localized regions (e.g. human genes). A prototype algorithm has already been developed to generate the quality information, and it has already identified a number of problems in high-quality sequencing data; these problems have led institutions to amend their bioinformatic pipelines and sequencing providers to resequence samples. The proposed application will be built on top of this algorithm and will complement other IOBIO quality control apps that assess large-scale data quality metrics; together, these apps will provide a comprehensive assessment of the data across multiple scales. The app will be designed to allow rapid and intuitive assessment of multiple samples simultaneously, since common analysis projects involve multiple related individuals, all of whom must have high-quality sequencing data available. Core IOBIO functionality to integrate IOBIO apps will be developed, so that users can easily jump between the apps most relevant to the problem at hand. As users move between apps, the necessary information will be shared between them to provide a seamless experience. The objective of this proposal is to develop a commercially viable product that significantly improves quality control of sequencing data, based on proven quality control metrics.
Ultimately, this will increase the rate at which patients are diagnosed and improve the cost and time efficiency of sequencing analysis projects.
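The kind of localized check described above can be illustrated with a minimal sketch. The prototype algorithm itself is not specified in this summary, so everything here is an assumption made for illustration: the `Exon` type, the `flag_low_coverage_exons` function, the use of mean read depth as the quality metric, and the threshold of 20 are all hypothetical. The sketch flags, for each sample in a family, any exon whose mean depth falls below the threshold.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Exon:
    """Hypothetical exon record; coordinates index into a per-base depth array."""
    gene: str
    number: int
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def flag_low_coverage_exons(
    depths_by_sample: Dict[str, Sequence[int]],
    exons: List[Exon],
    min_mean_depth: float = 20.0,  # assumed threshold, not taken from the proposal
) -> Dict[str, List[Exon]]:
    """Return, for each sample, the exons whose mean depth is below the threshold."""
    flagged: Dict[str, List[Exon]] = {}
    for sample, depths in depths_by_sample.items():
        low: List[Exon] = []
        for exon in exons:
            window = depths[exon.start:exon.end]
            # An empty window (exon outside the data) is treated as zero coverage.
            mean_depth = sum(window) / len(window) if len(window) > 0 else 0.0
            if mean_depth < min_mean_depth:
                low.append(exon)
        flagged[sample] = low
    return flagged


# Toy data: the proband has good overall coverage, but exon 2 was effectively skipped.
exons = [Exon("GENE1", 1, 0, 4), Exon("GENE1", 2, 4, 8)]
depths = {
    "proband": [30, 32, 31, 29, 0, 0, 1, 0],
    "mother": [30, 30, 30, 30, 30, 30, 30, 30],
}
result = flag_low_coverage_exons(depths, exons)
for sample, low_exons in result.items():
    labels = [f"{e.gene} exon {e.number}" for e in low_exons]
    print(sample, labels)
```

In this toy input the proband's mean depth across the whole gene is still above 15, which is why a gene-level or genome-level summary could miss the problem while a per-exon check does not.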
Public Health Relevance Statement: Project narrative Research into the genetic basis of disease and the successful diagnosis of sick patients are becoming increasingly dependent on DNA sequencing. This project will develop an intuitive web-based application to ensure that sequencing data in high-impact genomic regions (e.g. genes) is of high enough quality to enable successful diagnosis. This project will additionally reduce the cost of analysis by ensuring that problems with the data are identified at the outset of a project, without requiring experienced bioinformaticians, and by suggesting necessary actions before lengthy analyses begin.
Project Terms: Address; Affect; Algorithms; Award; Binding Sites; Bioinformatics; Biological; blind; Child; citizen science; Clinical; Clinical Data; clinical diagnostics; Complex; Computer software; cost; Cost Analysis; Data; Data Quality; Data Set; Databases; design; Development; Diagnosis; Diagnostic; Disease; DNA; DNA Library; DNA sequencing; Ensure; exome; exome sequencing; Exons; experience; experimental study; Family; Future; gene discovery; Gene Targeting; Genes; Genetic; genetic panel test; genetic pedigree; genetic variant; Genome; genome analysis; genome sequencing; genome-wide; genomic data; Genomic Segment; Genomics; Hand; Human; Imagery; improved; Incidental Findings; Individual; infrastructure development; Intuition; Investigation; laptop; Length; Medical; Mendelian disorder; Natural regeneration; novel; Nucleic Acid Regulatory Sequences; Online Systems; Parents; Pathway interactions; Patients; Pattern; Phase; Phenotype; Preparation; Process; prototype; Provider; Quality Control; Research; Research Infrastructure; Sampling; Small Business Technology Transfer Research; Specific qualifier value; statistics; Techniques; Technology; Time; tool; transcription factor; Variant; Vendor; web app; whole genome