Protein–ligand interaction (PLI) prediction is a central task in computational drug discovery. Existing public affinity datasets such as BindingDB and ChEMBL are highly heterogeneous in origin, having been aggregated from thousands of laboratories using many different experimental protocols. As a result, they suffer from systematic bias and substantial standardization challenges, which in turn limit model generalization. In contrast, DNA-encoded library (DEL) technology enables ultrahigh-throughput screening of billions of compounds under unified experimental protocols, providing a new source of large-scale, high-quality training data for PLI modeling.
Hermes adopts a lightweight Transformer-based architecture; its main components are summarized in Figure 1.
For final inference, the authors used an ensemble of nine checkpoints trained with different hyperparameter settings and training-sampling strategies, and averaged their predictions.
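The ensembling step is conceptually simple: run all nine checkpoints and average their scores. A minimal sketch (the `predict`-style callables and the toy scores below are illustrative, not from the paper):

```python
import numpy as np

def ensemble_predict(models, protein, ligand):
    """Average binding scores over an ensemble of model checkpoints.

    `models` is a list of callables mapping (protein, ligand) -> float;
    the real Hermes checkpoints differ in hyperparameters and
    training-sampling strategy, but inference is a plain mean.
    """
    scores = np.array([m(protein, ligand) for m in models])
    return scores.mean()

# Toy usage: nine stand-in "checkpoints" that each return a fixed score.
models = [lambda p, l, s=s: s for s in np.linspace(0.1, 0.9, 9)]
print(ensemble_predict(models, "MKT...", "CCO"))  # mean of the nine scores
```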
Figure 1. Hermes architecture diagram.
Hermes was trained on DEL screening data generated from the Kin0 chemical library, a three-cycle library of roughly 6.5 million members built by combining 38 cores with 384 and 446 building blocks at the remaining two cycles (38 × 384 × 446 ≈ 6.5 million). The dataset covers 239 unique protein targets, approximately two-thirds of which are kinases.
Labels were generated through a binarized hit-calling procedure based on enrichment relative to control screens, including DEL-only, bead-only/no-target, and proprietary controls. To manage class imbalance, the training procedure capped the number of positive samples per protein target, retained the highest-enrichment hits, and paired each positive example with a fixed number of negatives, drawn from both random negatives and hard negatives.
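The per-target sampling described above (cap positives by enrichment, then pair each positive with a fixed mix of random and hard negatives) can be sketched as follows. All names, field layouts, and parameter values here are illustrative assumptions, not the paper's actual implementation:

```python
import random

def build_training_pairs(hits, max_pos_per_target=1000, negs_per_pos=4,
                         hard_frac=0.5, seed=0):
    """Sketch of the per-target sampling scheme described in the text.

    `hits` maps target -> {"pos": [(compound, enrichment), ...],
                           "rand_neg": [...], "hard_neg": [...]}.
    Positives are capped per target, keeping the highest-enrichment
    hits; each positive is paired with a fixed number of negatives
    drawn from both random and hard negative pools.
    """
    rng = random.Random(seed)
    pairs = []
    for target, pools in hits.items():
        # Cap positives, retaining the highest-enrichment hits.
        pos = sorted(pools["pos"], key=lambda x: x[1], reverse=True)
        pos = pos[:max_pos_per_target]
        for compound, _ in pos:
            pairs.append((target, compound, 1))
            # Mix hard and random negatives for each positive.
            n_hard = int(negs_per_pos * hard_frac)
            negs = (rng.sample(pools["hard_neg"], n_hard) +
                    rng.sample(pools["rand_neg"], negs_per_pos - n_hard))
            pairs.extend((target, n, 0) for n in negs)
    return pairs
```

With `negs_per_pos=4` each retained positive yields a fixed 1:4 positive-to-negative ratio, which keeps the per-target class balance constant regardless of how many raw hits a screen produced.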
Table 1. Training and evaluation dataset statistics.
Hermes was evaluated on four benchmark datasets designed to test different forms of generalization:
1. DEL Protein Split: 164 protein targets not seen during training, screened against the same Kin0 library. This benchmark tests cold-target generalization.
2. DEL Chemical Library Split (STRELKA): 59 protein targets seen in training, but screened against a different 1-million-member benzimidazole library (AMA020). This benchmark tests cold-ligand generalization.
3. Public Binders/Decoys: 403 protein targets, with positives from Papyrus++ and negatives from GuacaMol property-matched synthetic decoys. This benchmark evaluates generalization to external public binding data.
4. MF-PCBA: 26 protein targets from PubChem BioAssay high-throughput screening data, where confirmed dose-response actives are positives and primary-screen inactives are negatives. This benchmark tests performance on heterogeneous public screening data.
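The per-protein AUROC used in these comparisons (Table 2) scores each target independently and then looks at the distribution across targets. A minimal self-contained sketch, assuming evaluation records arrive as flat `(target, score, label)` tuples (a format chosen here for illustration):

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney U) formulation."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    # Average ranks over tied scores.
    for s in np.unique(scores):
        mask = scores == s
        ranks[mask] = ranks[mask].mean()
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def per_protein_auroc(records):
    """records: iterable of (target, score, label) -> {target: AUROC}."""
    by_target = {}
    for t, s, y in records:
        by_target.setdefault(t, ([], []))
        by_target[t][0].append(s)
        by_target[t][1].append(y)
    return {t: auroc(s, y) for t, (s, y) in by_target.items()}
```

Grouping by target before computing AUROC matters: pooling scores across proteins would let between-target score offsets dominate the metric.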
The authors compared Hermes against two baselines (see Table 2).
Table 2. Hermes vs benchmarks per-protein AUROC comparison.
Key Findings
The results show clear variation across benchmarks, but several important patterns emerge.
The paper also reports that Hermes performs better on kinase targets than on non-kinases in most benchmarks, which is consistent with the kinase-enriched composition of the training set. This suggests that targeted DEL data generation can improve generalization within a protein family.
Table 3. Inference speed comparison.
Hermes is designed for efficient inference. After correcting for hardware differences, the authors estimate that Hermes is approximately 500–700 times faster than Boltz-2. Its sequence-only design allows protein embeddings to be cached, making it well suited for billion-scale virtual screening campaigns.
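The caching point can be made concrete: because the protein branch depends only on the sequence, its (expensive) embedding can be computed once per target and reused across billions of ligands, so only the cheap ligand branch runs per compound. A sketch under stated assumptions (`embed_protein` is a stand-in for the model's protein encoder; the placeholder body is not the real model):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def embed_protein(sequence: str) -> tuple:
    # Placeholder: in a real pipeline this would be a Transformer
    # forward pass over the sequence, the dominant per-target cost.
    return tuple(float(ord(c)) for c in sequence)

def score_ligands(sequence, ligands, score_fn):
    """Score many ligands against one target, reusing the cached
    protein embedding; only the ligand branch runs per compound."""
    emb = embed_protein(sequence)  # computed once per unique sequence
    return [score_fn(emb, lig) for lig in ligands]
```

In a billion-scale campaign the embedding cache turns the protein encoder into a one-time setup cost per target rather than a per-pair cost.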
This study shows that a model trained exclusively on DEL screening data can learn transferable representations of protein–ligand interactions. Without ever being trained on traditional affinity measurements, Hermes generalizes to unseen protein targets, unseen chemical scaffolds, and external datasets derived from different experimental systems. This provides a strong proof of concept for the use of DEL data in PLI modeling.
At the same time, the study identifies several limitations. First, the kinase-heavy training distribution leads to weaker performance on non-kinase proteins. Second, binarizing the labels likely restricts model expressiveness. Third, performance on data similar to the training set still appears to be influenced by memorization effects. Future work may benefit from incorporating structure-prediction outputs, modeling continuous enrichment scores instead of binary labels, and improving sampling strategies to better support generalization to unseen protein families.
Overall, as DEL technology continues to mature and generate data at a faster pace than public affinity databases, DEL-trained models such as Hermes are likely to become an important methodology for the next generation of PLI prediction.
Kleinsasser M, Halverson BJ, Kraft E, et al. Hermes: Large DEL Datasets Train Generalizable Protein-Ligand Binding Prediction Models. 2026.