
ChEMBL-Likeness Score and Database GDBChEMBL

Type of publication Peer-reviewed
Form of publication Original article (peer-reviewed)
Author Bühlmann Sven, Reymond Jean-Louis
Project Chemical Space Design of Small Molecules and Peptides


Journal Frontiers in Chemistry
Volume (Issue) 8
Page(s) 46
Title of proceedings Frontiers in Chemistry
DOI 10.3389/fchem.2020.00046

Open Access

Type of Open Access Publisher (Gold Open Access)


The generated database GDB17 enumerates 166.4 billion molecules containing up to 17 atoms of C, N, O, S and halogens, following simple rules of chemical stability and synthetic feasibility. However, most molecules in GDB17 are too complex to be considered for chemical synthesis. To address this limitation, we report GDBChEMBL, a subset of GDB17 featuring 10 million molecules selected according to a ChEMBL-likeness score (CLscore), calculated from the frequency of occurrence of circular substructures in ChEMBL, followed by uniform sampling across molecular size, stereocenters and heteroatoms. Compared to the previously reported subsets FDB17 and GDBMedChem, selected from GDB17 by fragment-likeness and medicinal chemistry criteria, respectively, our new subset features molecules with higher synthetic accessibility and possibly higher bioactivity, yet retains the broad and continuous coverage of chemical space typical of the entire GDB17. GDBChEMBL is accessible at for download and for browsing using an interactive chemical space map at
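The two-stage selection described in the abstract (score each molecule by how common its circular substructures are in a reference set, then sample evenly across property bins) can be sketched as follows. This is a minimal illustration under stated assumptions, not the published CLscore: molecules are assumed to be already reduced to sets of circular-substructure identifiers (for example, hashed atom environments), and the scoring formula (mean log relative frequency with a pseudocount) and all function names are hypothetical choices made here for illustration. The bin key stands in for the abstract's molecular size, stereocenter and heteroatom counts.

```python
import math
import random
from collections import Counter, defaultdict


def build_reference_counts(reference_molecules):
    """Count in how many reference molecules each substructure ID occurs.

    Each molecule is (hypothetically) an iterable of circular-substructure
    identifiers; duplicates within one molecule are counted once.
    """
    counts = Counter()
    for substructures in reference_molecules:
        counts.update(set(substructures))
    return counts


def likeness_score(substructures, ref_counts, n_ref):
    """Hypothetical frequency-based likeness score (not the published CLscore).

    Averages log((count + 1) / (n_ref + 1)) over the molecule's unique
    substructures, so rare or unseen environments pull the score down.
    """
    unique = set(substructures)
    if not unique:
        return float("-inf")
    total = sum(math.log((ref_counts.get(s, 0) + 1) / (n_ref + 1)) for s in unique)
    return total / len(unique)


def stratified_sample(candidates, key, per_bin, seed=0):
    """Draw up to per_bin items from each bin defined by key(item).

    Using (size, stereocenters, heteroatoms) as the key gives an even
    spread across those properties instead of favoring crowded bins.
    """
    bins = defaultdict(list)
    for item in candidates:
        bins[key(item)].append(item)
    rng = random.Random(seed)
    sample = []
    for k in sorted(bins):
        members = bins[k]
        sample.extend(rng.sample(members, min(per_bin, len(members))))
    return sample


if __name__ == "__main__":
    # Toy reference set standing in for ChEMBL (hypothetical IDs).
    reference = [["a", "b"], ["a", "c"], ["a", "b", "d"]]
    counts = build_reference_counts(reference)
    print(likeness_score(["a", "b"], counts, len(reference)))
    print(likeness_score(["x", "y"], counts, len(reference)))

    # Toy candidates: (name, heavy atoms, stereocenters, heteroatoms).
    mols = [("m1", 10, 0, 2), ("m2", 10, 0, 2), ("m3", 10, 0, 2), ("m4", 12, 1, 3)]
    print(stratified_sample(mols, key=lambda m: (m[1], m[2], m[3]), per_bin=2))
```

In this sketch, a molecule built from substructures that are frequent in the reference set scores higher than one built from unseen substructures. The stratified step caps each bin at the same count, which is one simple way to realize "uniform sampling" across property bins; the actual study may have used a different binning or sampling scheme.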