A Hybrid Unsupervised Methodology on Artificial Intelligence Filtering for automatically processing cellular DNA-Encoded Library (DEL) Datasets.

Yiran Huang, Xiao Tan, Xiaoyu Li, Feng Xiong, Siu Ming Yiu

Bioinformatics (Oxford, England)

DOI: 10.1093/bioinformatics/btag001

Abstract

Motivation

DNA-encoded library (DEL) technology has been developed into a powerful platform for drug development. Live cell-based selection methods were recently developed to accelerate the discovery of drug candidates with higher biological relevance. However, hit characterization is hampered by the strong background signal of cell-based selections. An automated data-processing pipeline that can handle noisy sequencing output is therefore highly desirable.

Results

Herein we report an automated method that identifies the most promising hits from large cell-based DEL datasets with improved accuracy and efficiency. The workflow is built on a comprehensive unsupervised algorithm comprising data pre-processing, feature extraction and outlier filtering, descriptor-based classification, similarity score ranking and active compound prediction. We developed the method using two DEL selection datasets targeting the insulin receptor (INSR) on live cells, from ~30-million- and 1.033-billion-membered libraries. The automated scheme showed high consistency with experimental results and adapted itself to on-cell DEL datasets of different library scales. Applying the method to the cellular thrombopoietin receptor (TPOR) further supported its ability to generalize across target proteins. This approach can therefore serve as a widely applicable workflow that automatically distinguishes hit compounds, facilitating drug development from the candidate-discovery stage onward.
