We propose to develop a novel, cost-effective, cloud-based data and analytics platform that will provide efficient data storage solutions and enhanced analytics, annotation and reporting capabilities for supporting and accelerating clinical and molecular research in the treatment of substance use disorders (SUD). This open-source platform, which leverages existing BioDX technology, will provide a centralized, multi-user environment that enables and encourages collaborative research and information dissemination among team members.

One of the unmet infrastructural challenges of modern molecular research is the lack of computational platforms that combine management of large databases, easy access to data, powerful customizable tools for data mining, analysis and visualization, and integration of different data sources to allow successful analysis of complex data problems. Such problems are commonplace in high-throughput molecular research. This proposal aims to fill this gap by developing a robust platform that integrates state-of-the-art open-source technologies for data storage, data access, data mining and analysis, annotation, visualization and reporting.

We previously developed BioDX, a cloud-based BioDatomics platform for Next Generation Sequencing (NGS), which has been successful and has been used commercially by several clients. This proposal aims to develop a new platform, leveraging our experience with BioDX, that integrates: data storage and real-time data querying using Cloudera Impala; powerful and customizable analytics tools using R and its derivative Bioconductor suite of programs for bioinformatics; annotation integration and reporting, which are existing features of BioDX; and a visual programming interface that will simplify and enhance the development and maintenance of reproducible analytics workflows.
We believe this powerful integrated data platform, if successful, will enable real-time collaboration, dramatically reduce data repository costs, and increase the efficiency and efficacy of data analyses for translating experimental data into actionable research products.

We are committed to analyzing stakeholder needs and optimizing hardware, software and information technology systems to meet their demands. This platform will enhance stakeholder capabilities for developing, implementing and testing various models for substance addiction, risky behavior, discovery of molecular targets for treatment, genomic profiling of patients and other relevant scientific questions. Users will have access to modern statistical, machine learning, data mining and visualization tools.

The initial phase of work will involve developing the platform, optimizing its performance on the cloud and testing the integration of new technology. BioDatomics is committed to funding the next phase of work, which will include usability testing and finalizing a commercial product, following which full commercialization will proceed. Preliminary commercialization plans have demonstrated that the project has the capacity to generate a million dollars in revenue during the first full year after commercial release.

The ultimate beneficiaries of this platform will be government agencies, academic researchers and pharmaceutical companies pursuing collaborative projects to discover treatments for substance abuse disorders. This open-source platform will enable significant savings for end users in data storage and analytic capabilities, and promises to have a major impact in increasing the success of molecular, clinical and translational research for substance abuse disorders.
Thesaurus Terms: Academia; Apache Indians; Base; Beneficiary; Bioconductor; Bioinformatics; Biological; Businesses; Centers For Disease Control And Prevention (U.S.); Client; Clinical; Clinical Research; Cloud Based; Collaborations; Commercialization; Commit; Complex; Computer Infrastructure; Computer Software; Cost; Cost Effective; Cost Savings; Custom; Data; Data Analyses; Data Mining; Data Set; Data Sources; Data Storage And Retrieval; Databases; Development; Disease; Distributed Databases; Drug Discovery; Drug Industry; Environment; Expenditure; Experience; Funding; Genomics; Government Agencies; Growth; Imagery; Improved; Industry; Information Dissemination; Information Systems; Java; Language; Licensing; Link; Machine Learning; Maintenance; Marketing; Mathematical Model; Meetings; Member; Messenger RNA; Modeling; Models And Simulation; Molecular; Molecular Target; Mutation; New Technology; Next Generation Sequencing; Novel; Online Systems; Open Source; Patients; Performance; Pharmacologic Substance; Phase; Programming Languages; Programs; Public Health Relevance; Reporting; Research; Research Infrastructure; Research Personnel; Risk Behaviors; Savings; Services; Software Engineering; Software Tools; Solutions; Speed (Motion); Statistical Data Interpretation; Statistical Models; Substance Abuse Problem; Substance Abuse Treatment; Substance Addiction; Substance Use Disorder; Success; System; Technology; Testing; Time; Tool; Training; Translating; Translational Research; United States National Institutes Of Health; Usability; User-Friendly; Visual; Work