C2PO: an ML-powered optimizer of the membrane permeability of cyclic peptides through chemical modification

Roy Aerts, Joris Tavernier, Alan Kerstjens, Mazen Ahmad, Jose Carlos Gómez-Tamayo, Gary Tresadern, Hans De Winter

Journal of Cheminformatics

DOI: https://doi.org/10.1186/s13321-025-01109-x

Abstract

Peptide drug development is currently receiving due attention as a modality between small and large molecules. Therapeutic peptides offer an opportunity to achieve high potency and selectivity and to reach intracellular targets. A new era in the development of therapeutic peptides emerged with the arrival of cyclic peptides, which avoid the limitations of parenteral administration by achieving sufficient oral bioavailability. However, improving the membrane permeability of cyclic peptides remains one of the principal bottlenecks. Here, we introduce a deep learning regression model of cyclic peptide membrane permeability based on publicly available data. The model starts from a chemical structure and goes beyond limited-vocabulary language models, generalizing to monomers outside the training dataset. Moreover, we introduce an efficient estimator2generative wrapper that enables the model to be used directly for molecular optimization of membrane permeability via chemical modification. We name our application C2PO (Cyclic Peptide Permeability Optimizer). Lastly, we demonstrate how a molecule correction tool can be used to limit the presence of unfamiliar chemistry in the generated molecules.

Summary

This study presents C2PO (Cyclic Peptide Permeability Optimizer), a novel machine learning-driven application that improves the membrane permeability of cyclic peptides through chemical structure modification. The core of C2PO is a Graph Transformer deep learning model trained on the CycPeptMPDB dataset (7,451 permeability measurements), achieving state-of-the-art performance (R² = 0.61, Pearson r = 0.78, MAE = 0.37 on the test set). Unlike conventional generative models, C2PO follows an estimator2generative approach: it applies gradient-based optimization derived from the HotFlip algorithm to the trained estimator in order to suggest structural modifications. The framework operates in two stages. First, it generates permeability-optimized peptide analogs by mutating side chains while preserving the macrocycle backbone. Second, it automatically corrects chemically invalid structures using a dictionary-based correction tool that references ChEMBL31. A case study on 700 low-permeability cyclic peptides showed that 76.86% of optimization campaigns produced at least one offspring with improved predicted permeability (logPapp > -6.0), and that 42.05% of all 13,043 generated molecules crossed this threshold. The system gives users flexible control over modification scope, elemental composition, and optimization parameters, making it a practical tool for medicinal chemists to generate ideas for improving peptide drug candidates.
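The HotFlip idea mentioned above can be illustrated with a minimal sketch. HotFlip ranks single substitutions by a first-order estimate: the predicted change from replacing token a with token b at position i is approximated by the gradient entry for b minus the gradient entry for a. Everything concrete below is an assumption made for illustration only: the flat position/token encoding, the hard-coded gradient values, and the names `hotflip_scores`, `best_flip`, `GRAD`, `TOKENS`, and `MUTABLE` are hypothetical and are not taken from C2PO, where the gradients would come from the Graph Transformer via automatic differentiation over a molecular graph.

```python
def hotflip_scores(grad, tokens):
    """First-order estimate of the prediction change for every single flip.

    grad[i][b] : gradient of the prediction w.r.t. the one-hot entry
                 (position i, token b); hard-coded toy values here.
    tokens[i]  : index of the token currently at position i.
    Returns scores[i][b] = grad[i][b] - grad[i][tokens[i]], or None for
    the token that is already present at that position.
    """
    scores = []
    for row, current in zip(grad, tokens):
        scores.append(
            [None if b == current else g - row[current] for b, g in enumerate(row)]
        )
    return scores


def best_flip(grad, tokens, mutable):
    """Return (position, new_token, score) for the highest-scoring flip
    among positions flagged as mutable, or None if no flip is allowed."""
    best = None
    for i, row in enumerate(hotflip_scores(grad, tokens)):
        if not mutable[i]:
            continue  # protected position, e.g. a backbone atom (assumption)
        for b, score in enumerate(row):
            if score is not None and (best is None or score > best[2]):
                best = (i, b, score)
    return best


# Toy data (hypothetical): 3 positions, 3 candidate tokens each.
GRAD = [[0.1, 0.5, -0.2],
        [0.3, 0.0, 0.9],
        [-0.4, 0.2, 0.1]]
TOKENS = [0, 1, 2]               # current token index at each position
MUTABLE = [True, False, True]    # position 1 is protected

flip = best_flip(GRAD, TOKENS, MUTABLE)
print(flip)  # position 0 flipped to token 1
```

In this toy case the largest raw gain (0.9 at position 1) is ignored because that position is protected, which mirrors the side-chain-only restriction described in the text. In practice the first-order score is only an estimate, so a candidate would normally be re-scored with the full model before being accepted.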

Highlights

  • First-in-class application converting a machine learning model into a generative optimizer specifically for cyclic peptide permeability improvement
  • Estimator2generative paradigm that decouples property estimation from structure generation, enabling broader chemical space exploration beyond training vocabulary
  • State-of-the-art Graph Transformer model (based on the GraphGPS framework) trained on the comprehensive CycPeptMPDB dataset with robust cross-validation performance
  • Automated chemistry correction workflow using a dictionary-based tool to validate and fix chemically unrealistic structures post-optimization, preserving 78% of successful optimizations
  • Demonstrated effectiveness in a large-scale case study: 76.86% success rate for campaigns and 42.05% of offspring molecules achieving high permeability
  • High flexibility allowing user-defined constraints on backbone protection, elemental modifications, molecular size changes (±5 atoms), and optimization parameters
  • Beyond-vocabulary generalization capability to handle monomers not present in training data, overcoming limitations of language model-based approaches
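The two case-study figures quoted above measure different things: the campaign success rate counts starting peptides with at least one offspring above the threshold, while the offspring fraction counts individual generated molecules. A minimal sketch of how both could be computed follows. The function name `summarize`, the dictionary layout, and the toy values are assumptions for illustration; only the -6.0 logPapp threshold comes from the text.

```python
THRESHOLD = -6.0  # logPapp cutoff quoted in the text


def summarize(campaigns, threshold=THRESHOLD):
    """Compute per-campaign and per-molecule success statistics.

    campaigns: dict mapping a campaign id to the list of predicted
               logPapp values of its offspring (hypothetical layout).
    """
    all_preds = [p for preds in campaigns.values() for p in preds]
    # A campaign succeeds if at least one offspring exceeds the threshold.
    successful = sum(
        1 for preds in campaigns.values() if any(p > threshold for p in preds)
    )
    above = sum(1 for p in all_preds if p > threshold)
    return {
        "campaign_success_rate": successful / len(campaigns),
        "offspring_fraction_above": above / len(all_preds),
    }


# Toy data (made up): four campaigns with ten offspring in total.
toy = {
    "pep_A": [-6.4, -5.8, -5.5],
    "pep_B": [-6.9, -6.2],
    "pep_C": [-5.9],
    "pep_D": [-6.1, -6.3, -6.5, -5.7],
}
stats = summarize(toy)
print(stats)
```

With the toy data, three of four campaigns succeed while only four of ten offspring exceed the threshold, which shows why the campaign-level rate can sit well above the molecule-level fraction.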

Conclusion

Generally, cyclic peptides lack adequate membrane permeability to be developed into medicines. We propose C2PO (Cyclic Peptide Permeability Optimizer), an application that improves permeability by modifying the chemical structure of a given cyclic peptide. C2PO is ML-driven, trained on the experimental CycPeptMPDB dataset, and belongs to the estimator2generative optimization paradigm. However, ML-based applications that output chemical structures tend to occasionally propose strange chemistry, attributable to a loss of chemical knowledge, even though such knowledge is generally assumed to be learned implicitly. Therefore, we opted to check and correct the outcomes of C2PO in a subsequent step using a chemistry library-based autocorrection application. This contribution provides insights into what one can expect when applying these two applications. Seven hundred permeability optimization campaigns were launched in which only peptide side chains were allowed to be modified. In general, we observed optimization for many campaigns, meaning that starting points with poor permeability were optimized to structures with estimated permeability above the threshold of -6.0 logPapp. In the chemical correctness check, we found that a substantial portion (22.9%) of output structures needed correction. The autocorrection tool modified these, and we tracked how the optimized permeability changed upon chemical correction. Various scenarios occurred, but the most important finding was that, for many campaigns, the second step did not counteract the initial permeability optimization. We discussed in detail how to properly use our model and workflow, noting its flexibility for user customization. We focused on providing insights into basic capabilities rather than pursuing optimal performance, while pointing out ways to improve both permeability optimization and molecular autocorrection.
Finally, we hope to raise general interest in adopting estimator2generative optimizer strategies for chemical problems and deploying chemistry-library-driven applications for post-correcting ML-generated structures.
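The tracking of permeability before and after chemical correction described above can be sketched as a simple bookkeeping step. The outcome labels, the function name `correction_outcome`, and the toy prediction pairs are hypothetical and chosen only for illustration; the source states that various scenarios occurred but this sketch does not reproduce its actual categories or results.

```python
from collections import Counter

THRESHOLD = -6.0  # logPapp cutoff quoted in the text


def correction_outcome(before, after, threshold=THRESHOLD):
    """Label what happened to one optimized molecule after autocorrection.

    before: predicted logPapp of the optimized structure.
    after : predicted logPapp of the corrected structure, or None when
            the structure needed no correction.
    """
    if after is None:
        return "no_correction_needed"
    was_above = before > threshold
    is_above = after > threshold
    if was_above and is_above:
        return "optimization_kept"
    if was_above:
        return "optimization_lost"
    if is_above:
        return "optimization_gained"
    return "still_below"


# Toy (before, after) prediction pairs; values are made up.
toy_pairs = [
    (-5.5, None),   # chemically fine, left untouched
    (-5.7, -5.8),   # corrected, still above the threshold
    (-5.9, -6.3),   # corrected, dropped below the threshold
    (-6.2, -5.9),   # corrected, moved above the threshold
    (-6.4, -6.6),   # corrected, below before and after
]
counts = Counter(correction_outcome(b, a) for b, a in toy_pairs)
print(counts)
```

Aggregating such labels per campaign would show whether the correction step preserved or counteracted the initial optimization.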
