The human transcriptome is significantly more complex than its cognate genome, owing to the hundreds of thousands of possible isoforms, allele-specific expression, variable RNA editing, and differential expression patterns spanning cell types, developmental stages, and physiological stresses. Next-generation sequencing (NGS) platforms are fundamentally altering genetic and genomic research by providing massive amounts of data in a low-cost, high-throughput format. The main drawbacks of existing technologies are the short read lengths they produce (Illumina) or their high error rates (PacBio). Identifying single nucleotide variations is problematic with long read technology, and de novo assembly of most transcripts is compromised with short read NGS technologies alone. Even with a high quality reference human genome (which is a mosaic of the parental alleles), transcriptome sequencing and assembly remain a significant challenge. Haplotyping across an entire mRNA is critical for understanding the full extent of RNA editing and is not readily achieved without resorting to cloned DNA. New tools are clearly needed to bridge the gap between massively parallel short read sequencing technologies and the need to assemble complete mRNA molecules. Phase I of this SBIR grant proposes to develop short read NGS technology that accurately sequences mRNAs along their entire length, regardless of size. This technology will enable the accurate assembly of complex transcriptomes without cDNA cloning and primer walking using Sanger sequencing based strategies. The development of these tools could enable the de novo sequencing of daunting transcriptomes, significantly reduce the computational costs of transcriptome assembly, produce more complete and accurate catalogs of RNA edited transcripts, and make personal transcriptome resequencing tractable.
Public Health Relevance Statement: RNA editing is a cellular mechanism that changes genomically encoded information in the expressed RNA transcripts. Dysregulation of RNA editing is implicated in a number of human disorders, primarily neurological and behavioral diseases. We will develop new technologies to decipher where and how RNA editing events are distributed in human transcripts, to determine which regulatory sequences control the RNA editing enzymes, and to identify other proteins involved in editing. These technologies can unlock the genetic basis of gene regulation in healthy and diseased states.
Project Terms: Affect; Alleles; Alternative Splicing; Animals; base; Behavioral; Benchmarking; Biochemical; Bioinformatics; Biology; Brain; Cataloging; Catalogs; cDNA Library; cell type; Cells; Cellular biology; Chimera organism; Cloning; Code; Complementary DNA; Complex; Computer software; computerized data processing; computerized tools; cost; Data; Data Analyses; Development; Dideoxy Chain Termination DNA Sequencing; Disease; DNA; DNA Resequencing; Enzymes; Epigenetic Process; Event; Exons; Face; Gene Expression Profile; Gene Expression Regulation; Generations; Genes; Genetic; Genome; genome sequencing; Genomics; Goals; Grant; Haplotypes; Health; Human; Individual; innovation; Length; Libraries; Malignant Neoplasms; Maps; Messenger RNA; Methods; Modification; Natural regeneration; Neurologic; neurological pathology; new technology; next generation sequencing; novel; Nucleotides; open source; Pattern; Phase; Physiological; Plants; Process; programs; Protein Isoforms; Proteins; Protocols documentation; public health relevance; Reading; Regulation; Research; Resort; Reverse Transcription; RNA; RNA Editing; RNA Sequences; RNA Splice Sites; RNA Splicing; Sampling; Scientist; Sequence Analysis; single molecule; Site; Small Business Innovation Research Grant; Staging; Stress; success; Technology; Time; tool; tool development; Transcript; transcriptome sequencing; transcriptomics; Universities; Variant; Walking; Wisconsin