Yong En Kok
Identifying Chirality in Line Drawings of Molecules Using Imbalanced Dataset Sampler for a Multilabel Classification Task
Kok, Yong En; Woodward, Simon; Özcan, Ender; Torres Torres, Mercedes
Authors
SIMON WOODWARD simon.woodward@nottingham.ac.uk
Professor of Synthetic Organic Chemistry
ENDER OZCAN ender.ozcan@nottingham.ac.uk
Professor of Computer Science and Operational Research
Mercedes Torres Torres
Abstract
Chirality, the ability of some molecules to exist as two non-superimposable mirror images, profoundly influences both chemistry and biology. Advances in deep learning enable the automatic recognition of chemical structure diagrams; however, studies on detecting molecular chirality are scarce, and machine-readable molecular representations are not always sufficient to fully encode this important property. Here, we pretrained networks on a ChEMBL+ dataset (79641 molecules) and fine-tuned them for binary classification of chirality (achiral/chiral) or multilabel classification of chirality type (none/centre/axial/planar). To address the imbalance among label combinations in the multilabel task, we propose a Formulated Imbalanced Dataset Sampler (FIDS), which samples a formulated number of minority label combinations on top of the training set. In a 10-fold cross-validation experiment on our CHIRAL dataset (1142 manually curated molecules), our models achieved an accuracy of up to 90 % in the binary task. In the multilabel task with FIDS, overall performance increased from 87 % to 89 %, and the accuracy per label combination increased by up to 50 %. Through the study of heatmaps, our work also illustrates the potential of deep neural networks to make predictions based on the actual location of chirality elements.
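The abstract does not state the formula FIDS uses, so the following is only a minimal Python sketch of the general idea of adding extra samples of minority label combinations on top of a training set. The function names, the `target_fraction` parameter, and the rule "raise every combination to a fraction of the majority count" are assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import Counter

def combination_counts(samples):
    """Count how often each label combination occurs."""
    return Counter(frozenset(labels) for _, labels in samples)

def oversample_label_combinations(samples, target_fraction=0.5, seed=0):
    """Return the samples plus random duplicates of minority label combinations.

    samples: list of (item, labels) pairs, where labels is an iterable such as
    {"centre", "axial"}. Each combination is raised to at least
    target_fraction * (size of the largest combination). This rule is a
    hypothetical stand-in, not the FIDS formula.
    """
    normalized = [(item, frozenset(labels)) for item, labels in samples]
    if not normalized:
        return []
    rng = random.Random(seed)
    # Group the samples by their exact label combination.
    groups = {}
    for pair in normalized:
        groups.setdefault(pair[1], []).append(pair)
    majority = max(len(group) for group in groups.values())
    target = max(1, round(target_fraction * majority))
    extra = []
    for group in groups.values():
        deficit = target - len(group)
        # Draw with replacement from the under-represented combination.
        extra.extend(rng.choice(group) for _ in range(max(0, deficit)))
    return normalized + extra
```

Treating each distinct label set as its own group, rather than balancing labels one at a time, is one way to keep rare co-occurrences (for example centre plus axial) from being swamped by the common single-label cases.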
Citation
Kok, Y. E., Woodward, S., Özcan, E., & Torres Torres, M. (2022). Identifying Chirality in Line Drawings of Molecules Using Imbalanced Dataset Sampler for a Multilabel Classification Task. Molecular Informatics, 41(12), Article 2200068. https://doi.org/10.1002/minf.202200068
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Jun 6, 2022 |
Online Publication Date | Jun 30, 2022 |
Publication Date | 2022-12 |
Deposit Date | Jun 15, 2022 |
Publicly Available Date | Jul 1, 2023 |
Journal | Molecular Informatics |
Print ISSN | 1868-1743 |
Electronic ISSN | 1868-1751 |
Publisher | Wiley-VCH Verlag |
Peer Reviewed | Peer Reviewed |
Volume | 41 |
Issue | 12 |
Article Number | 2200068 |
DOI | https://doi.org/10.1002/minf.202200068 |
Keywords | Organic Chemistry; Computer Science Applications; Drug Discovery; Molecular Medicine; Structural Biology |
Public URL | https://nottingham-repository.worktribe.com/output/8497083 |
Publisher URL | https://onlinelibrary.wiley.com/doi/abs/10.1002/minf.202200068 |
Additional Information | This is the peer reviewed version of the following article: Kok, Y.E., Woodward, S., Özcan, E. and Torres Torres, M. (2022), Identifying Chirality in Line Drawings of Molecules Using Imbalanced Dataset Sampler for a Multilabel Classification Task. Mol. Inf. 2022, 41, 2200068, which has been published in final form at https://onlinelibrary.wiley.com/doi/10.1002/minf.202200068. |
Files
Identifying Chirality in Line Drawings of Molecules Using Imbalanced Dataset Sampler for a Multilabel Classification Task (PDF, 2 Mb)