25 June 2025
HitGen
China
James Wellnitz, Shabbir Ahmad, Nabin Bagale, Xuemin Cheng, Jermiah Joseph, Hong Zeng, Albina Bolotokova, Aiping Dong, Shaghayegh Reza, Pegah Ghiabi, Elisa Gibson, Guiping Tu, Xianyang Li, Jian Liu, Dengfeng Dou, Jin Li, Timothy L. Foley, Anthony R. Harris, Jacquelyn L. Klug-McLeod, Jisun Lee, Zsofia Lengyel-Zhand, Justin I. Montgomery, Sylvie Sakata, Jinzhi Zhang, Hongyao Zhu, Dafydd R. Owen, Rachel J. Harding, Aled M. Edwards, Benjamin Haibe-Kains, Levon Halabelian, Alexander Tropsha, Rafael M. Couñago
Journal of Medicinal Chemistry
DOI: 10.1021/acs.jmedchem.5c01972
Abstract
Machine learning (ML) is increasingly used in DNA-encoded library (DEL) screening for ligand discovery, but its success depends on access to suitable data sets, which are often proprietary and costly. To overcome this, we present the first fully open, automated DEL-ML framework using public DEL data sets and chemical fingerprints to enable reproducible, accessible drug discovery. Our workflow, from model training to virtual screening and compound selection, requires no human intervention. As a proof of concept, we identified binders for WDR91 by training ML models on the HitGen OpenDEL library (3B molecules) and screening the Enamine REAL Space library (37B molecules), yielding 50 candidates. Experimental testing confirmed seven novel binders with dissociation constants between 2.7 and 21 μM. Our open-source approach matches the performance of proprietary methods, demonstrating that public DEL data can support robust ML-driven ligand discovery and fostering transparency and broader community participation in drug development.
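To make the fingerprint, screen, and select steps of such a workflow concrete, here is a minimal Python sketch. It is not the authors' pipeline: the abstract does not name the fingerprints or models used, so a hashed character n-gram fingerprint (`toy_fingerprint`, not chemically meaningful) stands in for a real chemical fingerprint, and a nearest-neighbour Tanimoto score stands in for a trained ML model. All function names and the `N_BITS` value are hypothetical.

```python
from __future__ import annotations

from hashlib import blake2b

N_BITS = 1024  # fingerprint length; an arbitrary choice for this sketch


def toy_fingerprint(smiles: str, n: int = 3) -> frozenset[int]:
    """Hash character n-grams of a SMILES string into a set of 'on' bit indices.

    Assumption: this is only a placeholder for a real chemical fingerprint
    computed by a cheminformatics toolkit.
    """
    grams = {smiles[i:i + n] for i in range(max(1, len(smiles) - n + 1))}
    return frozenset(
        int.from_bytes(blake2b(g.encode(), digest_size=4).digest(), "big") % N_BITS
        for g in grams
    )


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def score(candidate_fp: frozenset[int], enriched_fps: list[frozenset[int]]) -> float:
    """Highest similarity to any enriched DEL compound (stand-in for a model)."""
    return max(tanimoto(candidate_fp, fp) for fp in enriched_fps)


def select_top_k(library: list[str], enriched: list[str], k: int) -> list[str]:
    """Rank a screening library by score and return the k best candidates."""
    enriched_fps = [toy_fingerprint(s) for s in enriched]
    scored = [(score(toy_fingerprint(s), enriched_fps), s) for s in library]
    # Sort by descending score; break ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [s for _, s in scored[:k]]
```

For example, `select_top_k(["CCO", "c1ccccc1O", "CCN"], enriched=["c1ccccc1O"], k=1)` ranks the library compound identical to the enriched one first. At the scale described in the abstract (billions of molecules), a real pipeline would need batched, vectorized fingerprinting and scoring rather than Python lists.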