Proteins are responsible for much of the structure and function of all cells. Subtle changes in the expression of various protein forms are critical for proper growth and development, but irregularities can cause deleterious cellular effects or large-scale biological dysfunction. Sequencing samples that contain both high- and low-abundance proteins could greatly accelerate research into protein function and biology, but there is currently no efficient and cost-effective strategy to sequence mixtures of unknown protein molecules at single-amino-acid resolution. Two methods are commercially available for protein sequencing. The first method, "Edman degradation", requires purification of the individual target protein. Bulk quantities of whole protein or purified fragments are sequenced by cleaving off the first (N-terminal) amino acid and chemically identifying it. The second method, based on mass spectrometry, requires enzymatically degrading a single protein or mixture of proteins into small fragments, then analyzing the molecular mass and charge of each fragment. This information is compared to that of known protein sequences to infer the identity of the input proteins. Both of these methods require ~1 million molecules of each protein for detection. Currently, Edman degradation cannot be used on heterogeneous protein mixtures, further limiting its utility. Single-molecule protein sequencing is hindered by the number and diversity of amino acids, as well as by interactions between amino acids that interfere with chemical identification of their side chains. Identifying an N-terminal amino acid that is still attached to the rest of the protein is hindered by the N-1 (and N-2) amino acids, in proportion to the bulk of their side chains. Harsh denaturation agents can mitigate some of these issues.
However, these reagents can compromise the biomolecule-based identification systems themselves and do not fully remove the steric hindrance that limits access to the N-terminal amino acid. Glyphic Biotechnologies has developed a novel "Next-Generation" protein sequencing strategy, in which DNA barcodes associate successive rounds of cleaved N-terminal amino acids with a protein-specific barcode. Each of the 20 different amino acids will first be cleaved (circumventing the steric hindrance of the N-1 amino acid) and then captured by specific antibodies. Each amino acid will then be associated with two barcodes, indicating (1) the originating protein and (2) the sequential position at which this amino acid is found. After next-generation DNA sequencing of all conjugated barcodes, this information can be deconvoluted, placing each amino acid into the correct position within the correct protein. This approach has the potential to be scaled to sequence millions to billions of single molecules simultaneously in hours. Developing this technology will revolutionize protein analysis by making large-scale protein sequencing feasible, inexpensive, and routine.
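The deconvolution step described above can be illustrated with a minimal Python sketch. It is not Glyphic's actual pipeline: the read format (a protein barcode, a 1-based position, and a one-letter amino acid call), the majority vote among conflicting reads, the "X" placeholder for positions with no reads, and the function name `deconvolute` are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def deconvolute(reads):
    """Rebuild per-protein sequences from barcode reads.

    Each read is a hypothetical (protein_barcode, position, amino_acid)
    tuple; this layout is an assumption, not a documented data format.
    """
    # votes[protein][position] counts how often each amino acid was called.
    votes = defaultdict(lambda: defaultdict(Counter))
    for protein_bc, position, amino_acid in reads:
        votes[protein_bc][position][amino_acid] += 1

    sequences = {}
    for protein_bc, by_position in votes.items():
        residues = []
        # Walk positions 1..last; mark positions without reads as "X".
        for pos in range(1, max(by_position) + 1):
            calls = by_position.get(pos)
            residues.append(calls.most_common(1)[0][0] if calls else "X")
        sequences[protein_bc] = "".join(residues)
    return sequences


# Made-up reads: unordered, with one conflicting call and one missing position.
reads = [
    ("P1", 1, "M"), ("P1", 2, "K"), ("P1", 3, "V"),
    ("P2", 2, "A"), ("P2", 1, "G"),
    ("P1", 2, "K"), ("P1", 2, "R"),
    ("P2", 4, "L"),
]
result = deconvolute(reads)
print(result)
```

Because every read carries both barcodes, the reads can arrive in any order and from any mixture of proteins; grouping by protein barcode and sorting by position is enough to place each amino acid.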
Public Health Relevance Statement: NARRATIVE No current technology is capable of sequencing individual proteins from beginning to end. Glyphic Biotechnologies plans to develop a novel method of single-molecule protein sequencing, which will bring improvements analogous to those of Next-Generation Sequencing of DNA. The Glyphic protein sequencing technology will allow high-throughput, simultaneous sequencing of millions of proteins from samples as small as single cells.
Project Terms: Data; Detection; Protein Analysis; Resolution; resolutions; Process; protein function; next generation; nano pore; nanopore; cost effective; pathogen; NH2-terminal; N-terminal; Coupled; innovate; innovative; innovation; C-terminal; commercialization; bio-markers; biologic marker; biomarker; Biological Markers; NGS Method; NGS system; next gen sequencing; nextgen sequencing; next generation sequencing; High-Throughput Sequencing; High-Throughput Nucleotide Sequencing; global gene expression; global transcription profile; transcriptome; medical diagnostic; clinical diagnostics; DNA seq; DNAseq; DNA sequencing; Acceleration; Acids; Affect; Primary Protein Structure; protein sequence; Amino Acid Sequence; aminoacid; Amino Acids; Antibodies; Bar Codes; barcode; Biology; Biotechnology; Biotech; Blood; Blood Reticuloendothelial System; Buffers; Cells; Cell Body; Charge; Chemistry; High Pressure Liquid Chromatography; HPLC; High Performance Liquid Chromatography; High Speed Liquid Chromatography; Disease; Disorder; DNA; Deoxyribonucleic Acid; Face; faces; facial; Future; Gel; Genome; Growth and Development function; Growth and Development; Libraries; Light; Photoradiation; Methods; Parents; parent; Peptide Mapping; Peptide Fingerprinting; Peptides; Proteins; Reagent; Research; Rest; RNA; Non-Polyadenylated RNA; RNA Gene Products; Ribonucleic Acid; Signal Transduction; Cell Communication and Signaling; Cell Signaling; Intracellular Communication and Signaling; Signal Transduction Systems; Signaling; biological signal transduction; Specificity; Mass Spectrum Analysis; Mass Photometry/Spectrum Analysis; Mass Spectrometry; Mass Spectroscopy; Mass Spectrum; Mass Spectrum Analyses; Technology; bases; base; improved; biologic; Biological; Link; Chemicals; Individual; Dysfunction; Physiopathology; pathophysiology; Functional disorder; Hour; Complex; Side; Reaction; System; gel electrophoresis; molecular mass; Structure; novel; Basic Science; Basic Research; 
Positioning Attribute; Position; Protein Sequence Analysis; Amino Acid Sequence Analyses; Amino Acid Sequence Analysis; Peptide Sequence Analyses; Peptide Sequence Analysis; Protein Sequence Analyses; Peptide Sequence Determination; Amino Acid Sequence Determinations; Protein Sequence Determinations; Protein Sequencing; Protein Sequencing Molecular Biology; Proteome; Sampling; drug development; Proteomics; single molecule; Protein Secretion; Molecular Interaction; Binding; Complex Mixtures; magnetic beads; Polymerase; Address;