Saul Calderon-Ramirez
Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images
Calderon-Ramirez, Saul; Yang, Shengxiang; Moemeni, Armaghan; Elizondo, David; Colreavy-Donnelly, Simon; Chavarría-Estrada, Luis Fernando; Molina-Cabello, Miguel A.
Authors
Shengxiang Yang
Armaghan Moemeni armaghan.moemeni@nottingham.ac.uk
Assistant Professor
David Elizondo
Simon Colreavy-Donnelly
Luis Fernando Chavarría-Estrada
Miguel A. Molina-Cabello
Abstract
A key factor in the fight against viral diseases such as the coronavirus (COVID-19) is identifying virus carriers as early and as quickly as possible, in a cheap and efficient manner. The application of deep learning to the classification of chest X-ray images of COVID-19 patients could become a useful pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of the virus outbreak, where dealing with small labelled datasets is a challenge. Moreover, in such a context, the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch with a very limited number of labelled observations and highly imbalanced labelled datasets. We demonstrate the critical impact of data imbalance on the model's accuracy. Therefore, we propose a simple approach for correcting data imbalance by re-weighting each observation in the loss function, giving a higher weight to the observations corresponding to the under-represented class. For unlabelled observations, we use the pseudo and augmented labels calculated by MixMatch to choose the appropriate weight. The proposed method improved classification accuracy by up to 18% with respect to the non-balanced MixMatch algorithm. We tested our proposed approach with several available datasets using 10, 15 and 20 labelled observations for binary classification (COVID-19 positive and normal cases). For multi-class classification (COVID-19 positive, pneumonia and normal cases), we tested 30, 50, 70 and 90 labelled observations. Additionally, a new dataset composed of chest X-ray images of Costa Rican adult patients is included among the tested datasets.
Citation
Calderon-Ramirez, S., Yang, S., Moemeni, A., Elizondo, D., Colreavy-Donnelly, S., Chavarría-Estrada, L. F., & Molina-Cabello, M. A. (2021). Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images. Applied Soft Computing, 111, Article 107692. https://doi.org/10.1016/j.asoc.2021.107692
Journal Article Type | Article |
---|---|
Acceptance Date | Jul 7, 2021 |
Online Publication Date | Jul 13, 2021 |
Publication Date | 2021-11 |
Deposit Date | Jul 22, 2021 |
Journal | Applied Soft Computing |
Print ISSN | 1568-4946 |
Electronic ISSN | 1872-9681 |
Publisher | Elsevier |
Peer Reviewed | Peer Reviewed |
Volume | 111 |
Article Number | 107692 |
DOI | https://doi.org/10.1016/j.asoc.2021.107692 |
Keywords | Coronavirus; COVID-19; Computer aided diagnosis; Data imbalance; Semi-supervised learning |
Public URL | https://nottingham-repository.worktribe.com/output/5815185 |
Publisher URL | https://www.sciencedirect.com/science/article/pii/S156849462100613X |