Riya Singh, Aryan Amit Barsainyan, Abhiraj Pravin Mengade, Rida Irfan, Bharath Ramsundar
ChemRxiv
DOI: 10.26434/chemrxiv-2025-f11mk
Abstract
DNA-encoded libraries (DELs) have emerged as a powerful platform for screening ultra-large chemical spaces, using DNA barcodes to tag and track individual small molecules. Recent work has shown that machine learning can enhance DEL-based hit discovery by denoising sequencing artifacts and improving binder identification. However, existing tools for DEL modeling remain fragmented, which limits reproducibility and scalability. To address these challenges, we introduce Deepchem-DEL, an open-source suite of workflows built on top of the DeepChem ecosystem. Deepchem-DEL integrates (i) a configurable denoising pipeline and (ii) modular DeepChem workflows for enrichment/hit prediction and benchmarking. We evaluated Deepchem-DEL on the KinDEL dataset and reproduced key baselines across diverse model architectures. Our experiments demonstrate that Deepchem-DEL enables reproducible and scalable machine learning workflows for DEL modeling, reducing the engineering overhead of hit discovery.