Enabling Open Machine Learning of Deoxyribonucleic Acid-Encoded Library Selections to Accelerate the Discovery of Small Molecule Protein Binders

James Wellnitz, Shabbir Ahmad, Nabin Bagale, Xuemin Cheng, Jermiah Joseph, Hong Zeng, Albina Bolotokova, Aiping Dong, Shaghayegh Reza, Pegah Ghiabi, Elisa Gibson, Guiping Tu, Xianyang Li, Jian Liu, Dengfeng Dou, Jin Li, Timothy L. Foley, Anthony R. Harris, Jacquelyn L. Klug-McLeod, Jisun Lee, Zsofia Lengyel-Zhand, Justin I. Montgomery, Sylvie Sakata, Jinzhi Zhang, Hongyao Zhu, Dafydd R. Owen, Rachel J. Harding, Aled M. Edwards, Benjamin Haibe-Kains, Levon Halabelian, Alexander Tropsha, Rafael M. Couñago

Journal of Medicinal Chemistry

DOI: 10.1021/acs.jmedchem.5c01972

Abstract

Machine learning (ML) is increasingly used in DNA-encoded library (DEL) screening for ligand discovery, but its success depends on access to suitable data sets, which are often proprietary and costly. To overcome this limitation, we present the first fully open, automated DEL-ML framework, built on public DEL data sets and chemical fingerprints, to enable reproducible and accessible drug discovery. Our workflow, from model training through virtual screening to compound selection, requires no human intervention. As a proof of concept, we identified binders for WDR91 by training ML models on the HitGen OpenDEL library (3B molecules) and screening the Enamine REAL Space library (37B molecules), which yielded 50 candidates. Experimental testing confirmed seven novel binders with dissociation constants between 2.7 and 21 μM. Our open-source approach matches the performance of proprietary methods, demonstrating that public DEL data can support robust ML-driven ligand discovery while fostering transparency and broader community participation in drug development.
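The abstract describes the pipeline (train on DEL data encoded as chemical fingerprints, score a virtual library, select top candidates) only at a high level. The sketch below illustrates that general pattern under stated assumptions; it is not the authors' implementation. Fingerprints are assumed to be sets of "on" bit indices, the model is a plain logistic regression trained with stochastic gradient descent as a stand-in for whatever models the framework actually uses, and all names (`train_logistic`, `select_top_k`, `cand_A`, etc.) and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical representation: a fingerprint is the set of indices of its "on" bits,
# standing in for a binary vector such as an ECFP/Morgan fingerprint.
Fingerprint = Set[int]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(
    fps: Sequence[Fingerprint],
    labels: Sequence[int],
    n_bits: int,
    epochs: int = 200,
    lr: float = 0.1,
) -> Tuple[List[float], float]:
    """Fit logistic regression on sparse binary fingerprints with per-sample SGD."""
    w = [0.0] * n_bits
    b = 0.0
    for _ in range(epochs):
        for fp, y in zip(fps, labels):
            g = _sigmoid(b + sum(w[i] for i in fp)) - y  # d(log-loss)/dz
            for i in fp:  # only "on" bits have nonzero input, so only they are updated
                w[i] -= lr * g
            b -= lr * g
    return w, b


def score(fp: Fingerprint, w: List[float], b: float) -> float:
    """Predicted probability that a compound is a binder."""
    return _sigmoid(b + sum(w[i] for i in fp))


def select_top_k(library: Dict[str, Fingerprint], w: List[float], b: float, k: int) -> List[str]:
    """Rank a virtual library by predicted binding probability and keep the top k."""
    ranked = sorted(library, key=lambda name: score(library[name], w, b), reverse=True)
    return ranked[:k]


# Toy data (invented for illustration): DEL-derived binders (1) and non-binders (0).
train_fps = [{0, 1, 2}, {0, 1, 3}, {0, 2, 3}, {4, 5, 6}, {5, 6, 7}, {4, 6, 7}]
train_labels = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(train_fps, train_labels, n_bits=8)

# Toy "virtual library" to screen (names and fingerprints are hypothetical).
library = {"cand_A": {0, 1, 2, 3}, "cand_B": {4, 5, 6, 7}, "cand_C": {1, 5}, "cand_D": {0, 6}}
shortlist = select_top_k(library, w, b, k=2)
```

Storing fingerprints as sparse sets keeps each score proportional to the number of set bits. At the scale quoted in the abstract (billions of compounds), one would more plausibly use vectorized or batched libraries rather than pure Python loops.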
