Identifying Chirality in Line Drawings of Molecules Using Imbalanced Dataset Sampler for a Multilabel Classification Task

Kok, Yong En; Woodward, Simon; Özcan, Ender; Torres Torres, Mercedes

Authors

Yong En Kok

SIMON WOODWARD simon.woodward@nottingham.ac.uk
Professor of Synthetic Organic Chemistry

ENDER OZCAN ender.ozcan@nottingham.ac.uk
Professor of Computer Science and Operational Research

Mercedes Torres Torres



Abstract

Chirality, the ability of some molecules to exist as two non-superimposable mirror images, profoundly influences both chemistry and biology. Advances in deep learning enable the automatic recognition of chemical structure diagrams; however, studies on identifying molecular chirality are scarce, and machine-readable molecular representations are not always sufficient to fully encode this important property. Here, we pretrained networks on a ChEMBL+ dataset (79641 molecules) and fine-tuned them for either binary classification of chirality (achiral/chiral) or multilabel classification of chirality type (none/centre/axial/planar). To address the imbalance among label combinations in the multilabel task, we propose a Formulated Imbalanced Dataset Sampler (FIDS), which samples a formulated amount of minority label combinations on top of the training set. In a 10-fold cross-validation experiment on our CHIRAL dataset (1142 manually curated molecules), our models achieved an accuracy of up to 90 % in the binary task. In the multilabel task with FIDS, the overall performance increased from 87 % to 89 %, and the accuracy per label combination improved by up to 50 %. Through the study of heatmaps, our work also illustrates the potential of deep neural networks to make predictions based on the actual location of chirality elements.
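
The abstract does not state the formula FIDS uses, so the following is only a minimal sketch of the general idea of adding extra samples of minority label combinations on top of a training set. The function name, the target_ratio parameter, and the top-up rule (raise each combination to a fraction of the largest combination's count) are illustrative assumptions, not the authors' method.

```python
import random
from collections import Counter


def oversample_label_combinations(labels, target_ratio=0.5, seed=0):
    """Return extra training indices for minority label combinations.

    labels: one tuple of labels per sample, e.g. ("centre", "axial").
    target_ratio: hypothetical knob (not from the paper); every label
        combination is topped up to target_ratio * size of the largest one.
    """
    rng = random.Random(seed)
    combos = [frozenset(sample_labels) for sample_labels in labels]
    counts = Counter(combos)
    target = int(target_ratio * max(counts.values()))

    # Group sample indices by their exact label combination.
    indices_by_combo = {}
    for idx, combo in enumerate(combos):
        indices_by_combo.setdefault(combo, []).append(idx)

    # Draw (with replacement) only for combinations below the target.
    extra = []
    for combo, idxs in indices_by_combo.items():
        deficit = target - len(idxs)
        if deficit > 0:
            extra.extend(rng.choices(idxs, k=deficit))
    return extra


# Toy label set using the chirality types named in the abstract.
labels = (
    [("none",)] * 8
    + [("centre",)] * 6
    + [("centre", "axial")] * 2
    + [("planar",)] * 1
)
extra = oversample_label_combinations(labels, target_ratio=0.5)
augmented_indices = list(range(len(labels))) + extra
```

In this toy example the original samples are all kept and only the two rarest combinations receive additional draws, which mirrors the "on top of the training set" wording; the amount drawn in the actual study is defined by the paper's formula rather than by target_ratio.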

Citation

Kok, Y. E., Woodward, S., Özcan, E., & Torres Torres, M. (2022). Identifying Chirality in Line Drawings of Molecules Using Imbalanced Dataset Sampler for a Multilabel Classification Task. Molecular Informatics, 41(12), Article 2200068. https://doi.org/10.1002/minf.202200068

Journal Article Type Article
Acceptance Date Jun 6, 2022
Online Publication Date Jun 30, 2022
Publication Date 2022-12
Deposit Date Jun 15, 2022
Publicly Available Date Jul 1, 2023
Journal Molecular Informatics
Print ISSN 1868-1743
Electronic ISSN 1868-1751
Publisher Wiley-VCH Verlag
Peer Reviewed Peer Reviewed
Volume 41
Issue 12
Article Number 2200068
DOI https://doi.org/10.1002/minf.202200068
Keywords Organic Chemistry; Computer Science Applications; Drug Discovery; Molecular Medicine; Structural Biology
Public URL https://nottingham-repository.worktribe.com/output/8497083
Publisher URL https://onlinelibrary.wiley.com/doi/abs/10.1002/minf.202200068
Additional Information This is the peer reviewed version of the following article: Kok, Y.E., Woodward, S., Özcan, E. and Torres Torres, M. (2022), Identifying Chirality in Line Drawings of Molecules Using Imbalanced Dataset Sampler for a Multilabel Classification Task. Mol. Inf. 2022, 41, 2200068, which has been published in final form at https://onlinelibrary.wiley.com/doi/10.1002/minf.202200068.
