SBIR-STTR Award

Software Systems For Detecting Rare Mutations
Award last edited on: 10/9/12

Sponsored Program
SBIR
Awarding Agency
NIH : NHGRI
Total Award Amount
$1,275,396
Award Phase
2
Solicitation Topic Code
-----

Principal Investigator
Todd M Smith

Company Information

Geospiza Inc

100 West Harrison North Tower Suite 330
Seattle, WA 98119
   (206) 633-4403
   info@geospiza.com
   www.geospiza.com
Location: Single
Congr. District: 07
County: King

Phase I

Contract Number: 1R43HG005297-01
Start Date: 00/00/00    Completed: 00/00/00
Phase I year
2009
Phase I Amount
$110,000
In November 2008, The Scientist opened an on-line opinion piece with the following quote: "After tens of billions of US federal dollars (plus billions more from private sources) and nearly 40 years of aggressive research, the war on cancer is depressingly far from over. Cancer will soon become the leading cause of death in America, passing heart disease. At some point in their lives, 43% of the public will get some form of cancer." While much progress has been made over the years, effective treatments for many forms of cancer are still lacking. Until the many forms of cancer are better understood, treatment options will continue to lag behind. Next-generation DNA sequencing (NGS) technologies hold great promise as tools for building a new understanding of cancer and its origins. Deep sequencing provides more sensitive ways to detect the germline and somatic mutations that cause different types of cancer, as well as to identify new mutations within small subpopulations of tumor cells that can be prognostic indicators of tumor growth or drug resistance. The ultimate goal is to use NGS technologies in the clinic. Before this vision can be realized, many obstacles must be overcome. Assay costs must be significantly lowered and sample throughput must be substantially increased relative to today's capabilities. Achieving this goal will require streamlined procedures for sample preparation and laboratory processes; a complete understanding of NGS systems, error profiles, and assay dynamics; and robust, validatable software systems to support diagnostic tests in the clinical enterprise. Geospiza's FinchLab software platform addresses a large number of issues related to operating NGS instruments and laboratory processes in clinical environments. However, our understanding of NGS errors, and of how to characterize NGS datasets completely with respect to their potential to deliver high-quality information, remains incomplete.
Through the proposed research, Geospiza and collaborators at the Mayo Clinic will remove many of the obstacles that keep this vision of cancer diagnostics from becoming reality. In the Phase I project, we will test the feasibility of developing clinical systems by characterizing a limited number of NGS datasets for true variants, false-positive errors, and false-negative errors. To do so, we will catalog bases that are discrepant relative to control sequences with respect to sequence context, random noise, laboratory steps, and instrument artifacts. The catalogs will then be used to develop statistical algorithms that can analyze large numbers of aligned reads and assign variant detection probabilities to individual bases. The same algorithms will calculate summary statistics that assign descriptive values to datasets from individual samples and thereby identify sample artifacts and issues related to sample processing. Geospiza will combine the insights gained, and the new software tools developed, into the FinchLab system to give researchers better ways to work with NGS data and clearer methods for visualizing genetic assay results presented in web-based interfaces. In addition, Geospiza will promote community involvement by making many of the core algorithms available through Bioconductor.
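As an illustration only, the idea of assigning a variant detection probability to an individual base can be sketched with a simple binomial noise model. This is not the project's algorithm: the function names, the single per-base error rate, and the independence of reads are all assumptions made for the example. The sketch asks how likely it is that sequencing errors alone would produce at least the observed number of discrepant reads at one aligned position.

```python
from math import exp, lgamma, log


def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability of exactly k successes (0 < p < 1)."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def noise_p_value(depth, alt_reads, error_rate):
    """Probability of seeing >= alt_reads discrepant reads from errors alone.

    Hypothetical helper: assumes every read at the position errs
    independently with the same probability `error_rate`.
    """
    if alt_reads <= 0:
        return 1.0
    below = sum(exp(log_binom_pmf(k, depth, error_rate))
                for k in range(alt_reads))
    return max(0.0, 1.0 - below)


# At 10,000x coverage with a 0.1% error rate, about 10 discrepant reads
# are expected from noise, so 10 reads are unremarkable but 30 are not.
print(noise_p_value(10_000, 10, 0.001))
print(noise_p_value(10_000, 30, 0.001))
```

A small value suggests the discrepant reads are unlikely to be noise. A real system would presumably replace the single error rate with context-specific rates of the kind the error catalogs are meant to supply.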

Public Health Relevance:
The SBIR project "Software Systems for Detecting Rare Mutations" will deliver new software technologies to further advance the applications for deep DNA sequencing in personalized medicine by improving methods for detecting rare mutations that define cancer types and determine how a cancer cell may grow and respond to, or resist, treatment. In addition to improving cancer research and diagnostics, the software developed will have general use for any application where DNA sequencing is used to understand the genetic basis of human health, disease, and response to drug therapies.

NIH Spending Category:
Bioengineering; Cancer; Genetics; Human Genome; Networking and Information Technology R&D

Project Terms:
Address; Algorithms; Americas; Artifacts; Assay; Basic Research; Basic Science; Bioassay; Bioconductor; Biologic Assays; Biological Assay; Biology; Cancer Diagnostics; Cancers; Cardiac Diseases; Cardiac Disorders; Cataloging; Catalogs; Cause of Death; Cells; Clinic; Clinical; Communities; Community Developments; Complex; Complex Mixtures; Computer Programs; Computer Software Tools; Computer software; DNA; DNA Sequence; Data; Data Set; Dataset; Deoxyribonucleic Acid; Detection; Development; Diagnostic; Diagnostic tests; Disease; Disorder; Drug Therapy; Drug resistance; Environment; Error Sources; Experimental Designs; Future; Gene variant; Genetic; Genetic Alteration; Genetic Change; Genetic Diversity; Genetic Variation; Genetic defect; Genome; Goals; Health; Heart Diseases; Human; Human, General; Individual; Investigators; Laboratories; Leadership; Life; Link; Malignant Cell; Malignant Neoplasms; Malignant Tumor; Man (Taxonomy); Man, Modern; Maps; Measures; Medicine; Methods; Methods and Techniques; Methods, Other; Morphologic artifacts; Mutation; Mutation Detection; Noise; On-Line Systems; Online Systems; Output; Pancreas Neoplasms; Pancreatic Tumor; Pharmacotherapy; Phase; Phenotype; Play; Population; Position; Positioning Attribute; Preparation; Probability; Procedures; Process; R01 Mechanism; R01 Program; RPG; Reading; Relative; Relative (related person); Research; Research Grants; Research Personnel; Research Project Grants; Research Projects; Research Projects, R-Series; Research Resources; Researchers; Resources; Role; SBIR; SBIRS (R43/44); Sampling; Sampling Biases; Sampling Studies; Science of Medicine; Science of Statistics; Scientist; Sensitivity and Specificity; Sequence Alignment; Sight; Small Business Innovation Research; Small Business Innovation Research Grant; Software; Software Tools; Somatic Mutation; Source; Statistics; System; System, LOINC Axis 4; Techniques; Technology; Testing; Time; Tools, Software; Tumor Cell; Tumor of the Pancreas; 
Variant; Variation; Variation (Genetics); Vision; War; Work; allelic variant; anticancer research; base; cancer cell; cancer research; cancer type; clinical applicability; clinical application; computational tools; computer program/software; computerized tools; cost; data management; design; designing; develop software; developing computer software; disease/disorder; drug resistant; effective therapy; experiment; experimental research; experimental study; genome mutation; heart disorder; improved; insight; instrument; malignancy; neoplasm/cancer; neoplastic cell; new technology; next generation; novel; online computer; open source; pancreatic neoplasm; prognostic; prognostic indicator; public health relevance; research study; resistance to Drug; resistant to Drug; response; single molecule; social role; software development; software systems; statistics; tool; trend; tumor; tumor growth; web based; web based interface

Phase II

Contract Number: 2R44HG005297-02
Start Date: 9/30/09    Completed: 12/31/12
Phase II year
2011
(last award dollars: 2012)
Phase II Amount
$1,165,396

Next-generation DNA sequencing (NGS) technologies hold great promise as tools for building a new understanding of health and disease. In the case of cancer, deep sequencing provides more sensitive ways to detect the germline and somatic mutations that cause different types of cancer, as well as to identify new mutations within small subpopulations of tumor cells that can be prognostic indicators of tumor growth or drug resistance. Completing the transition from proof-of-principle applications to practical applications, however, requires that many basic and clinical research groups be able to use NGS effectively. Ongoing technical developments and intense vendor competition among NGS platform and service providers are commoditizing data collection costs, making systems more accessible. However, the single greatest impediment to the adoption of NGS technology is the lack of systems that provide easy access to the immense bioinformatics and IT infrastructures needed to work with the data. In the case of variant analysis, such systems will need to process very large datasets and accurately predict common, rare, and de novo levels of variation. Genetic variation must be presented in an annotation-rich biological context to determine its clinical utility, frequency, and putative biological impact. Software systems used for this work must integrate data from many samples together with resources ranging from core analysis algorithms to application-specific datasets to annotations, all woven into computational systems with interactive user interfaces (UIs). Such end-to-end systems do not currently exist. In this project, Geospiza will create integrated methods for robust detection and rich contextualization of genetic variants.
Using variation analysis in cancer genomics as a model system, we will conduct research to improve assay sensitivity by deeply characterizing data from existing and emerging NGS platforms, quality value (QV) recalibration tools, and alignment algorithms to understand the systematic artifacts that create errors in the data. To improve how researchers understand a variant's biological context, function, and potential clinical utility, we will develop methods to combine assay results from many samples with de novo NGS datasets for assays like RNA-Seq, existing data such as those in GEO and SRA, and information resources from dbSNP, cancer genome databases, and ENCODE. Finally, we will develop the scalable computing infrastructure and novel UIs needed to organize and process the data and to explore and annotate the results. Through this work and follow-on product development, we will produce integrated, sensitive assay systems that harness NGS to identify very low (1:1000) levels of change between DNA sequences, in order to detect cancerous mutations and emerging drug resistance. Our tools and infrastructure can later be applied in assays designed to follow viral epidemics and to understand autoimmune disorders.
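To make the 1:1000 target concrete, the short sketch below converts a quality value to an error probability using the standard Phred relation, p = 10^(-QV/10), and compares the expected number of variant-supporting reads with the expected number of erroneous reads at one position. The function names and the example depth are assumptions chosen for illustration, and the model is deliberately crude: it counts every error against the variant, although a real error can produce any of three incorrect bases.

```python
def phred_to_error_prob(qv):
    """Standard Phred relation: error probability for a quality value."""
    return 10 ** (-qv / 10)


def expected_counts(depth, variant_fraction, qv):
    """Expected (variant reads, erroneous reads) at a single position.

    Hypothetical helper: treats all reads as sharing one quality value.
    """
    return depth * variant_fraction, depth * phred_to_error_prob(qv)


# Q30 corresponds to an error probability of about 1 in 1000, so at
# 10,000x coverage a 1:1000 variant and raw sequencing error contribute
# a similar number of discrepant reads (roughly 10 each).
variant_reads, error_reads = expected_counts(10_000, 1 / 1000, 30)
print(variant_reads, error_reads)
```

Under these assumptions the signal and the raw noise are of the same magnitude, which is one way to see why characterizing systematic artifacts and recalibrating quality values matter for detection at this level.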

Public Health Relevance:
The SBIR project "Software Systems for Detecting Rare Mutations" will deliver new software technologies to further advance the applications for deep DNA sequencing in personalized medicine by improving methods for detecting rare mutations that define cancer types and determine how a cancer cell may grow and respond to, or resist, treatment. In addition to improving cancer research and diagnostics, the software developed will have general use for any application where DNA sequencing is used to understand the genetic basis of human health, disease, and response to drug therapies.

Thesaurus Terms:
Adoption;Algorithms;Artifacts;Assay;Autoimmune Diseases;Base Sequence;Basic Research;Basic Science;Bio-Informatics;Bioassay;Bioinformatics;Biologic Assays;Biologic Models;Biological;Biological Assay;Biological Models;Birth;Cancer Diagnostics;Cancerous;Cancers;Clinical;Clinical Research;Clinical Study;Collection;Computer Software Tools;Computer Software;DNA Sequence;Data;Data Banks;Data Bases;Data Collection;Data Set;Databanks;Databases;Dataset;Detection;Development;Diagnostic;Disease;Disorder;Documentation;Drug Therapy;Drug Resistance;Electronic Databank;Electronic Database;Environment;Epidemic;Frequencies (Time Pattern);Frequency;Future;Gene Variant;Genetic;Genetic Alteration;Genetic Change;Genetic Diversity;Genetic Variation;Genetic Defect;Genome;Genomics;Germ-Line Mutation;Germline Mutation;Goals;Health;Hereditary Mutation;Human;Imagery;Individual;Informatics;Information Resources;Infrastructure;Investigators;Knowledge;LOINC Axis 4 System;Laboratories;Life;Malignant Cell;Malignant Neoplasms;Malignant Tumor;Man (Taxonomy);Marketing;Measurement;Measures;Medicine;Methods;Model System;Modeling;Modern Man;Morphologic Artifacts;Mutation;Non-Polyadenylated RNA;Nucleotide Sequence;Nucleotides;Parturition;Patients;Pharmacotherapy;Process;Proteins;Provider;RNA;RNA Gene Products;Reporting;Research;Research Infrastructure;Research Personnel;Research Resources;Researchers;Resources;Ribonucleic Acid;SBIR;SBIRS (R43/44);Sampling;Sensitivity And Specificity;Services;Sight;Small Business Innovation Research;Small Business Innovation Research Grant;Software;Software Tools;Somatic Mutation;System;Systematic Bias;Technology;Translating;Tumor Cell;Variant;Variation;Variation (Genetics);Vendor;Viral;Vision;Visualization;Work;Allelic Variant;Anticancer Research;Application In Practice;Autoimmune Disorder;Base;Cancer Cell;Cancer Genome;Cancer Genomics;Cancer Research;Cancer Type;Clinical Data Repository;Clinical Relevance;Clinically Relevant;Commercialization;Computer Program/Software;Computerized Data Processing;Cost;Data Integration;Data Processing;Data Reduction;Data Repository;Design;Designing;Develop Software;Developing Computer Software;Developmental;Disease/Disorder;Drug Resistant;Functional Genomics;Gene Product;Genetic Variant;Genome Database;Genome Mutation;Improved;Information Resource;Innovate;Innovation;Innovative;Insight;Knowledge Resource;Knowledge Resources;Malignancy;Neoplasm/Cancer;Neoplastic Cell;Next Generation;Novel;Nucleic Acid Sequence;Oncogenomics;Pathogen;Practical Application;Product Development;Prognostic Indicator;Prototype;Resistance To Drug;Resistant To Drug;Response;Signal Processing;Software Development;Software Systems;Tool;Tumor Growth;User-Friendly;Visual Function;Web-Enabled