Together with the ability to measure genome-wide expression of millions of individual cells, single-cell technologies have also brought the challenge of translating such data into a better understanding of the underlying biological phenomena. Existing computational methods and software for single-cell data analysis have critical limitations related to scalability, accuracy, usability, and interpretation capabilities. The main goal of this project is to pioneer a new platform for the analysis of single-cell data that is capable of: i) accurately identifying cell types and their composition in complex tissues, ii) inferring cell developmental stages and pseudo-time trajectories, and iii) identifying cell-type-specific pathways and putative mechanisms in a phenotype comparison. The proposed platform will also be able to deconvolve bulk expression data to identify the cell type composition of each bulk sample. The significance of the proposed work lies in its potential to provide new methodologies for single-cell data analysis that far exceed the performance of current state-of-the-art techniques. Accurate deconvolution will also allow researchers to extract more information from the vast repositories of existing bulk data, including GDC/TCGA, NCBI SRA, GEO, and ArrayExpress, which currently contain data from bulk experiments that collectively cost over a billion dollars. The hypothesis driving this work is that single-cell data analysis and cellular deconvolution of bulk data can greatly benefit from: i) systems-level knowledge that captures key characteristics of cellular development, and ii) the valuable information contained in the validated cell types and reference single-cell datasets of single-cell atlases. Indeed, our preliminary work shows that single-cell data analysis and cellular deconvolution can achieve an accuracy of approximately 90-100% if reference single-cell datasets and pathway knowledge are properly utilized.
The proposed platform will be extensively validated by comparing its capabilities against state-of-the-art software in both single-cell data analysis (cell type identification, developmental states and time-trajectory inference, systems-level analysis) and cellular deconvolution of bulk expression data. This will be done using 663 datasets representing 279 cell types and 116 human organ parts (including bulk data, single-cell data, and matched cell flow cytometry). The pathway analysis and mechanism inference capabilities will be further validated using real knock-out datasets (in which the true cause of the phenotype is known). The company, Advaita, has a strong IP portfolio, an experienced team, and a proven track record in this area, having developed and commercialized similar analysis platforms. Advaita's existing products are currently used by top principal investigators, core facilities, and pharmaceutical companies around the world.
Public Health Relevance Statement: Single-cell technologies are revolutionizing biomedical research because they allow researchers to monitor biological systems at single-cell resolution. This project will develop a complete analysis pipeline and web-based platform able to accurately identify cell type composition from both single-cell and bulk data, infer the developmental stages and trajectories of cells, and perform cell-type-specific, systems-level analysis. This platform is expected to have a major impact in many areas where single-cell technologies are currently used, including diagnostics, drug discovery, immunology, neurobiology, and cancer research.
Project Terms: Atlases; Automobile Driving; driving; Biological Monitoring; Biologic Monitoring; biomonitoring; Biological Phenomena; Biologic Phenomena; Biomedical Research; Cell Extracts; Cells; Cell Body; Computing Methodologies; computational methodology; computational methods; computer based method; computer methods; computing method; Data Analyses; Data Analysis; data interpretation; Feedback; Flow Cytometry; Flow Cytofluorometries; Flow Cytofluorometry; Flow Microfluorimetry; Flow Microfluorometry; flow cytophotometry; Gene Expression; Genes; Goals; Human; Modern Man; Industry; Methods; Methodology; Mission; Music; Neurobiology; neurobiological; Phenotype; Research Personnel; Investigators; Researchers; Computer software; Software; Technology; Time; Tissues; Body Tissues; Translating; Work; Measures; Data Set; Immunology; bases; base; Organ; Area; Phase; biologic; Biological; Individual; Measurement; tool; Diagnostic; Knowledge; Life; Scientist; Complex; cell type; Techniques; System; 3-Dimensional; 3-D; 3D; three dimensional; experience; Performance; single cell analysis; user friendly software; user friendly computer software; Pathway Analysis; Network Analysis; Sampling; repository; depository; software development; develop software; developing computer software; drug discovery; Bio-Informatics; Bioinformatics; Pharmaceutical Agent; Pharmaceuticals; Pharmacological Substance; pharmaceutical; Pharmacologic Substance; Academia; Core Facility; Data; Resolution; resolutions; Whole Organism; Principal Investigator; Characteristics; Knock-out; Knockout; Development; developmental; Pathway interactions; pathway; cost; anti-cancer research; cancer research; anticancer research; designing; design; usability; comparative; commercialization; genome scale; genomewide; genome-wide; cellular development; biological systems; TCGA; The Cancer Genome Atlas; RNA Seq; RNA sequencing; RNAseq; transcriptomic sequencing; transcriptome sequencing; single cell sequencing; global gene expression; global transcription profile; transcriptome; phenotypic data; single cell technology; experiment; experimental research; experiments; experimental study; high dimensionality; scRNA-seq; single cell RNA-seq; single cell RNAseq; single cell expression profiling; single cell transcriptomic profiling; single-cell RNA sequencing; deep learning; unsupervised machine learning; unsupervised learning; random forest; analysis pipeline; learning algorithm; internet based platform; internet platform; web based platform; web based system; web enabled platform; web platform; gradient boosting; transfer learning