In November 2008, The Scientist opened an online opinion piece with the following quote: "After tens of billions of US federal dollars (plus billions more from private sources) and nearly 40 years of aggressive research, the war on cancer is depressingly far from over. Cancer will soon become the leading cause of death in America, passing heart disease. At some point in their lives, 43% of the public will get some form of cancer." While much progress has been made over the years, effective treatments for many forms of cancer are still lacking. Until the many forms of cancer are better understood, treatment options will continue to lag behind.
Next-generation DNA sequencing (NGS) technologies hold great promise as tools for building a new understanding of cancer and its origins. Deep sequencing provides more sensitive ways to detect the germline and somatic mutations that cause different types of cancer, and to identify new mutations within small subpopulations of tumor cells that can be prognostic indicators of tumor growth or drug resistance. The ultimate goal is to use NGS technologies in the clinic. Before this vision can be realized, many obstacles must be overcome. Assay costs must be significantly lowered and sample throughput must be substantially increased relative to today's capabilities. Achieving this goal will require streamlined procedures for sample preparation and laboratory processes; a complete understanding of NGS systems, error profiles, and assay dynamics; and robust, validatable software systems to support diagnostic tests in the clinical enterprise. Geospiza's FinchLab software platform addresses a large number of issues related to operating NGS instruments and laboratory processes in clinical environments. However, our understanding of NGS errors, and of how to completely characterize NGS datasets with respect to their potential to deliver high-quality information, is incomplete.
Through the proposed research, Geospiza and collaborators at the Mayo Clinic will remove many of the obstacles that keep this vision of cancer diagnostics from becoming reality. In the Phase I project, we will test the feasibility of developing clinical systems by characterizing a limited number of NGS datasets for true variants, false-positive errors, and false-negative errors. To do so, we will catalog discrepant bases relative to control sequences with respect to sequence contexts, random noise, laboratory steps, and instrument artifacts. The catalogs will then be used to develop statistical algorithms that can analyze large numbers of aligned reads and assign variant detection probabilities to individual bases. The same algorithms will calculate summary statistics that assign descriptive values to datasets from individual samples, which can subsequently be used to identify sample artifacts and issues related to sample processing. Geospiza will incorporate the insights gained and the new software tools developed into the FinchLab system to give researchers better ways to work with NGS data and more clear-cut methods for visualizing genetic assay results presented in web-based interfaces. In addition, Geospiza will promote community involvement by making many of the core algorithms available through Bioconductor.
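The cataloging and probability-assignment steps described above can be illustrated with a minimal Python sketch. It is not the project's algorithm. It assumes gap-free alignments supplied as (start, sequence) pairs, a three-base sequence context, and a simple binomial error model. The function names (`catalog_discrepancies`, `context_error_rates`, `error_tail_probability`) and the pseudocount are hypothetical choices made only for this illustration.

```python
import math
from collections import Counter


def catalog_discrepancies(control, reads):
    """Count observed and discrepant bases per 3-base context of the control.

    `reads` is a list of (start, sequence) pairs, assumed to be aligned to
    `control` without gaps (an assumption of this sketch).
    """
    mismatches, totals = Counter(), Counter()
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos < 1 or pos >= len(control) - 1:
                continue  # no full 3-base context at the sequence ends
            context = control[pos - 1:pos + 2]
            totals[context] += 1
            if base != control[pos]:
                mismatches[context] += 1
    return mismatches, totals


def context_error_rates(mismatches, totals, pseudocount=1.0):
    """Smoothed per-context error rates, kept strictly between 0 and 1."""
    return {
        ctx: (mismatches[ctx] + pseudocount) / (totals[ctx] + 2 * pseudocount)
        for ctx in totals
    }


def _log_binom_pmf(k, n, p):
    # Log of the binomial probability mass, computed with lgamma so that
    # large read depths do not overflow.
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def error_tail_probability(alt_count, depth, error_rate):
    """P(X >= alt_count) for X ~ Binomial(depth, error_rate).

    A small value means the non-reference reads at a position are unlikely
    to be explained by the assumed error rate alone.
    """
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")
    if alt_count > depth:
        raise ValueError("alt_count cannot exceed depth")
    if alt_count <= 0:
        return 1.0
    logs = [_log_binom_pmf(k, depth, error_rate)
            for k in range(alt_count, depth + 1)]
    peak = max(logs)  # factor out the largest term for numerical stability
    return min(1.0, math.exp(peak) * sum(math.exp(v - peak) for v in logs))
```

In this sketch, reads from a control sample would feed `catalog_discrepancies`, the resulting rates would stand in for the expected error at each context, and `error_tail_probability` would then score each position in a test sample. A production system would also need to handle indels, base qualities, and strand effects, none of which are modeled here.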
Public Health Relevance: The SBIR project "Software Systems for Detecting Rare Mutations" will deliver new software technologies to further advance the applications for deep DNA sequencing in personalized medicine by improving methods for detecting rare mutations that define cancer types and determine how a cancer cell may grow and respond to, or resist, treatment. In addition to improving cancer research and diagnostics, the software developed will have general use for any application where DNA sequencing is used to understand the genetic basis of human health, disease, and response to drug therapies.
Project Narrative: Software Systems for Detecting Rare Mutations. T. Smith, Geospiza, Inc.
NIH Spending Category: Bioengineering; Cancer; Genetics; Human Genome; Networking and Information Technology R&D
Project Terms: Address; Algorithms; Americas; Artifacts; Assay; Basic Research; Basic Science; Bioassay; Bioconductor; Biologic Assays; Biological Assay; Biology; Cancer Diagnostics; Cancers; Cardiac Diseases; Cardiac Disorders; Cataloging; Catalogs; Cause of Death; Cells; Clinic; Clinical; Communities; Community Developments; Complex; Complex Mixtures; Computer Programs; Computer Software Tools; Computer software; DNA; DNA Sequence; Data; Data Set; Dataset; Deoxyribonucleic Acid; Detection; Development; Diagnostic; Diagnostic tests; Disease; Disorder; Drug Therapy; Drug resistance; Environment; Error Sources; Experimental Designs; Future; Gene variant; Genetic; Genetic Alteration; Genetic Change; Genetic Diversity; Genetic Variation; Genetic defect; Genome; Goals; Health; Heart Diseases; Human; Human, General; Individual; Investigators; Laboratories; Leadership; Life; Link; Malignant Cell; Malignant Neoplasms; Malignant Tumor; Man (Taxonomy); Man, Modern; Maps; Measures; Medicine; Methods; Methods and Techniques; Methods, Other; Morphologic artifacts; Mutation; Mutation Detection; Noise; On-Line Systems; Online Systems; Output; Pancreas Neoplasms; Pancreatic Tumor; Pharmacotherapy; Phase; Phenotype; Play; Population; Position; Positioning Attribute; Preparation; Probability; Procedures; Process; R01 Mechanism; R01 Program; RPG; Reading; Relative; Relative (related person); Research; Research Grants; Research Personnel; Research Project Grants; Research Projects; Research Projects, R-Series; Research Resources; Researchers; Resources; Role; SBIR; SBIRS (R43/44); Sampling; Sampling Biases; Sampling Studies; Science of Medicine; Science of Statistics; Scientist; Sensitivity and Specificity; Sequence Alignment; Sight; Small Business Innovation Research; Small Business Innovation Research Grant; Software; Software Tools; Somatic Mutation; Source; Statistics; System; System, LOINC Axis 4; Techniques; Technology; Testing; Time; Tools, Software; Tumor Cell; Tumor of the Pancreas; Variant; Variation; Variation (Genetics); Vision; War; Work; allelic variant; anticancer research; base; cancer cell; cancer research; cancer type; clinical applicability; clinical application; computational tools; computer program/software; computerized tools; cost; data management; design; designing; develop software; developing computer software; disease/disorder; drug resistant; effective therapy; experiment; experimental research; experimental study; genome mutation; heart disorder; improved; insight; instrument; malignancy; neoplasm/cancer; neoplastic cell; new technology; next generation; novel; online computer; open source; pancreatic neoplasm; prognostic; prognostic indicator; public health relevance; research study; resistance to Drug; resistant to Drug; response; single molecule; social role; software development; software systems; statistics; tool; trend; tumor; tumor growth; web based; web based interface