Cyclic peptidomimetics (CPMs) have attracted growing attention in drug discovery because they combine the developability of small molecules with the target-recognition capability of larger biomolecules. Yet their complex macrocyclic topologies, noncanonical amino acids, and diverse cross-linking chemistries continue to challenge conventional AI-based molecular modeling methods.
Recently, the Computational Chemistry team at HitGen introduced CycWeave, a token-free dual-view coarse-grained graph neural framework designed for complex modular molecular systems. By adopting a representation strategy that better matches the modular nature of CPMs and DNA-encoded library (DEL) compounds, CycWeave demonstrated robust and competitive performance in both CPM membrane permeability prediction and large-scale DEL enrichment modeling. (Preprint available on ChemRxiv, https://chemrxiv.org/doi/full/10.26434/chemrxiv.15001512/v1)
In AI-driven drug discovery, computational modeling of CPMs and structurally complex DEL compounds faces two major limitations:
1. Atom-level graphs often fail to capture global topology
Conventional graph neural networks (GNNs) primarily focus on local atoms and bonds, but often struggle to effectively represent the higher-order topological organization characteristic of cyclic peptide-like systems, such as scaffold architecture, branch placement, and connection patterns.
2. Vocabulary-dependent token models have limited generalization
Many existing peptide or fragment-based modeling methods rely on predefined vocabularies or tokenization schemes. In realistic CPM-oriented DEL settings, however, noncanonical monomers and open-ended chemical modifications are common. As a result, such methods can suffer from out-of-vocabulary limitations and reduced generalizability in open chemical space.
Figure 1. Summary of existing molecular modeling strategies for CPMs.
To address these challenges, CycWeave introduces a new representation framework specifically designed for structurally complex and modular molecules.
1. Dual-view graph architecture
CycWeave represents each molecule simultaneously as an atom-level graph and a fragment-level coarse-grained graph. The atom-level view captures local chemical environments, while the coarse-grained view explicitly preserves modular structure by decomposing molecules into scaffold, branch, and connection-level components, including key chemical relations such as amide linkages, ring connection sites, and disulfide bonds. The two views are coupled and fused within a unified neural architecture, enabling coordinated modeling of both local detail and global topology.
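The dual-view idea can be illustrated with a minimal sketch. This is not CycWeave's implementation: the class name `DualViewGraph`, its fields, and the toy two-residue molecule are all hypothetical, and the fragment assignment is supplied by hand rather than computed by a decomposition rule. The sketch only shows how a coarse-grained edge, labeled with its chemical relation, can be derived from atom-level bonds that cross fragment boundaries.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DualViewGraph:
    """Toy container for one molecule seen at two granularities (hypothetical)."""
    atoms: List[str]                           # atom-level nodes (element symbols)
    bonds: List[Tuple[int, int]]               # atom-level edges (index pairs)
    fragments: Dict[str, List[int]]            # coarse node id -> member atom indices
    roles: Dict[str, str]                      # coarse node id -> "scaffold" / "branch"
    linkage_types: Dict[Tuple[int, int], str]  # crossing bond -> relation label

    def atom_to_fragment(self) -> Dict[int, str]:
        # Coupling between the views: which coarse node owns each atom.
        return {a: fid for fid, idxs in self.fragments.items() for a in idxs}

    def coarse_edges(self) -> List[Tuple[str, str, str]]:
        # A coarse edge exists wherever an atom-level bond crosses fragments.
        owner = self.atom_to_fragment()
        edges = []
        for i, j in self.bonds:
            if owner[i] != owner[j]:
                label = self.linkage_types.get((i, j), "unspecified")
                edges.append((owner[i], owner[j], label))
        return edges


# Two residue-like fragments joined by one amide bond (atoms 2 and 4).
toy = DualViewGraph(
    atoms=["N", "C", "C", "O", "N", "C", "C", "O"],
    bonds=[(0, 1), (1, 2), (2, 3), (2, 4), (4, 5), (5, 6), (6, 7)],
    fragments={"res1": [0, 1, 2, 3], "res2": [4, 5, 6, 7]},
    roles={"res1": "scaffold", "res2": "branch"},
    linkage_types={(2, 4): "amide"},
)
print(toy.coarse_edges())  # [('res1', 'res2', 'amide')]
```

Keeping the atom-to-fragment mapping explicit is what would let a model pass information between the two views; how CycWeave actually fuses them is described in the preprint, not here.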
2. Token-free continuous fragment embeddings
A central innovation of CycWeave is its token-free design. Instead of mapping fragments into discrete symbolic tokens, the framework uses continuous ECFP-based fragment embeddings to initialize coarse-grained nodes. This avoids dependence on a fixed vocabulary and enables the model to generalize more naturally to novel noncanonical monomers and open-ended chemical modifications.
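The contrast between a fixed vocabulary and a token-free embedding can be sketched as follows. The hashing scheme below is a deliberately simplified stand-in and is not ECFP: real ECFP fingerprints hash circular atom environments computed by a cheminformatics toolkit, whereas this toy hashes short substrings of a SMILES string. The vocabulary `VOCAB` and the function `hashed_embedding` are hypothetical names used only for illustration.

```python
import hashlib

# Hypothetical fixed vocabulary, as a token-based model might use.
VOCAB = {"Ala": 0, "Gly": 1, "Phe": 2}


def hashed_embedding(fragment_smiles, n_bits=64, max_len=3):
    """Map any fragment string to a fixed-length vector with no vocabulary.

    Toy stand-in for an ECFP-style fingerprint: every substring of length
    1..max_len is hashed to one of n_bits positions, which is set to 1.0.
    """
    vec = [0.0] * n_bits
    for k in range(1, max_len + 1):
        for start in range(len(fragment_smiles) - k + 1):
            sub = fragment_smiles[start:start + k]
            # hashlib gives a hash that is stable across Python processes.
            digest = hashlib.sha256(sub.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:4], "big") % n_bits] = 1.0
    return vec


# A noncanonical monomer (N-methylphenylalanine, stereochemistry omitted).
novel = "CNC(Cc1ccccc1)C(=O)O"
print("in vocabulary:", "NMePhe" in VOCAB)      # False: out of vocabulary
print("embedding length:", len(hashed_embedding(novel)))
```

The token lookup has no entry for the unseen monomer, while the hashed representation still yields a usable vector, which is the property the token-free design relies on.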
3. Support for self-supervised pretraining
CycWeave also supports a self-supervised pretraining–fine-tuning paradigm. Through a masked fragment recovery task, the model learns to reconstruct original continuous fragment fingerprints from surrounding structural context. This allows CycWeave to learn transferable structural priors from large unlabeled DEL-related CPM chemical spaces and improves its applicability to downstream tasks with limited labeled data.
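A masked fragment recovery objective can be sketched in a few lines. Several things here are assumptions rather than details from the source: the mask ratio, the use of a zero vector as the mask, the mean squared error loss, and above all the predictor, where a simple neighbour average stands in for the neural network. The function names are hypothetical.

```python
import random


def mask_fragments(fingerprints, mask_ratio=0.3, seed=0):
    """Replace a random subset of fragment vectors with zeros (assumed mask)."""
    rng = random.Random(seed)
    n = len(fingerprints)
    n_mask = max(1, round(mask_ratio * n))
    masked_idx = sorted(rng.sample(range(n), n_mask))
    zero = [0.0] * len(fingerprints[0])
    corrupted = [zero[:] if i in masked_idx else fp[:]
                 for i, fp in enumerate(fingerprints)]
    return corrupted, masked_idx


def neighbour_mean_predictor(corrupted, neighbours, idx):
    """Stand-in for the model: average the vectors of adjacent fragments."""
    nbrs = neighbours[idx]
    dim = len(corrupted[0])
    return [sum(corrupted[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]


def reconstruction_loss(pred, target):
    """Mean squared error between predicted and original fingerprint."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


# Four fragments arranged in a ring, each with a 4-dimensional fingerprint.
fps = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0],
       [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

corrupted, masked_idx = mask_fragments(fps)
loss = sum(
    reconstruction_loss(neighbour_mean_predictor(corrupted, ring, i), fps[i])
    for i in masked_idx
) / len(masked_idx)
print("masked fragments:", masked_idx, "loss:", round(loss, 3))
```

In an actual pretraining run the predictor's parameters would be updated to reduce this loss over many unlabeled molecules; the loss is computed only at masked positions, so the model must rely on surrounding structural context.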
Figure 2. Schematic overview of the token-free coarse-grained dual-view framework of CycWeave.
The research team systematically evaluated CycWeave in two practically important application scenarios.
1. CPM membrane permeability prediction
Membrane permeability is jointly influenced by local physicochemical features and higher-order structural organization. On public benchmark datasets including PAMPA, Caco-2, MDCK, and RRCK, CycWeave achieved the strongest overall performance on the major benchmarks after pretraining and fine-tuning. Notably, it reached an R² of 0.728 on Caco-2 and 0.701 on the aggregated dataset, outperforming representative intermediate-granularity baselines such as PepLand and PeptideCLM. These results support the value of token-free dual-view representation for developability-related property prediction.
2. DEL enrichment modeling against TfR1
The team further applied CycWeave to DEL enrichment modeling against transferrin receptor 1 (TfR1), a biologically and translationally relevant target in drug delivery research. Because DEL enrichment signals are count-derived and typically overdispersed (their variance exceeds their mean), the model used a negative binomial negative log-likelihood loss rather than a simple mean squared error objective. Under 10-fold scaffold-split evaluation, CycWeave outperformed both the general-purpose graph learning baseline Chemprop and the classical ECFP-MLP baseline. It achieved R² = 0.596, AUC-ROC = 0.962, and an average precision (AP) of 0.764, demonstrating strong regression fit as well as effective prioritization of enriched compounds under class imbalance.
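For readers unfamiliar with the loss, a negative binomial negative log-likelihood for a single count can be written as below. The parameterization is an assumption: the source states only that a negative binomial loss was used, so the mean `mu` and dispersion `alpha` (with variance `mu + alpha * mu**2`) are one common convention, not necessarily the one in CycWeave. The function name `nb_nll` is hypothetical.

```python
import math


def nb_nll(y, mu, alpha):
    """Negative log-likelihood of count y under a negative binomial.

    Assumed parameterization: mean mu > 0, dispersion alpha > 0,
    variance = mu + alpha * mu**2. As alpha shrinks toward 0 the
    distribution approaches a Poisson with mean mu.
    """
    r = 1.0 / alpha  # "number of failures" parameter
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return -log_p


# Same predicted mean, two dispersion settings, one unusually high count.
for alpha in (0.01, 1.0):
    print(f"alpha={alpha}: nll(y=30, mu=5) = {nb_nll(30, 5.0, alpha):.2f}")
```

With a larger dispersion, a high count far from the predicted mean is penalized less severely than under a near-Poisson setting, which is why this family of losses suits overdispersed sequencing counts better than a squared-error objective.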
In addition, latent-space visualization using t-SNE showed that enriched DEL compounds were organized into multiple separated yet internally compact clusters, suggesting that CycWeave not only improves predictive performance but may also help reveal distinct latent chemotypes or scaffold series for downstream hit triaging and series analysis.
The CycWeave results suggest that, for complex modular molecular systems such as cyclic peptidomimetics and DEL compounds, chemically meaningful coarse-grained decomposition combined with a token-free open representation can substantially improve computational modeling performance.
As a unified molecular representation backbone, CycWeave is expected to support not only CPM property prediction, but also a broader range of AI-for-chemistry applications, including DEL activity modeling, selectivity analysis, pharmacokinetic property prediction, and multi-objective molecular optimization.