Peptides have attractive therapeutic properties, including high potency, high selectivity, and low toxicity. Their main limitations are short half-life and poor oral bioavailability. These have been addressed through strategies such as incorporating unnatural amino acids, forming conjugates, and cyclization. However, peptide datasets have grown more complex, which complicates drug discovery. Machine learning (ML) has shown promise here, with a variety of algorithms demonstrated across numerous applications. Yet current approaches lack encodings for non-canonical amino acids (NCAAs) and struggle with small datasets. This proposal aims to develop methods for encoding NCAAs and cyclic peptides. It also aims to demonstrate high-performance ML on commercial sequence/activity datasets and on stability and permeability datasets. In addition, we will develop methods to assess data diversity and to estimate minimum dataset size requirements. The resulting platform will be commercialized as browser-based software. It will enable wet-lab researchers to train their own potency models and to use pre-trained solubility, stability, and permeability models. The platform is expected to shorten development timelines in two ways: by extracting useful insights from limited datasets, and by using property predictions to anticipate development challenges before they arise. Most importantly, this work could greatly benefit patients by enabling new synthetic peptide treatments that are safe, effective, and more affordable.
Public Health Relevance Statement: The development of peptide therapeutics holds significant promise for a wide range of medical conditions, from diabetes and obesity to cancer and pain management. Advancing research and technology in this field can open new routes to more effective and affordable treatments and improve patient outcomes.