Saul Calderon-Ramirez
A real use case of semi-supervised learning for mammogram classification in a local clinic of Costa Rica
Calderon-Ramirez, Saul; Murillo-Hernandez, Diego; Rojas-Salazar, Kevin; Elizondo, David; Yang, Shengxiang; Moemeni, Armaghan; Molina-Cabello, Miguel
Authors
Diego Murillo-Hernandez
Kevin Rojas-Salazar
David Elizondo
Shengxiang Yang
Armaghan Moemeni (armaghan.moemeni@nottingham.ac.uk)
Assistant Professor
Miguel Molina-Cabello
Abstract
Deep learning-based computer-aided diagnosis systems for classifying mammogram images can improve the accuracy and reliability of diagnosis and reduce its cost. However, training a deep learning model requires a considerable number of labelled images, which are expensive to obtain because labelling demands time and effort from clinical practitioners. To address this, a number of publicly available datasets have been built with data from different hospitals and clinics, and these can be used to pre-train the model. However, models pre-trained on these datasets and then fine-tuned through transfer learning on images sampled from a different hospital or clinic might perform worse. This is due to the distribution mismatch between the datasets, which cover different patient populations and image acquisition protocols. In this work, a real-world scenario is evaluated in which a novel target dataset, sampled from a private Costa Rican clinic, has few labels and heavily imbalanced data. The use of two popular, publicly available datasets (INbreast and CBIS-DDSM) as source data for training models that are then tested on the novel target dataset is evaluated. A common approach to further improve model performance when the labelled target dataset is this small is data augmentation. However, cheaper unlabelled data is often available from the target clinic. Semi-supervised deep learning, which leverages both labelled and unlabelled data, can therefore be used in such conditions. In this work, we evaluate the semi-supervised deep learning approach known as MixMatch, which takes advantage of unlabelled data from the target dataset, for whole mammogram image classification.
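As a rough illustration of how MixMatch uses unlabelled images, the sketch below shows its two core operations as described in the original MixMatch publication: label guessing (averaging the model's predictions over several augmentations of an unlabelled image and sharpening the result) and MixUp with the mixing weight biased towards the first input. The function names are hypothetical, and the temperature of 0.5 and alpha of 0.75 are commonly cited defaults, not necessarily the values used in this study.

```python
import random


def sharpen(probs, temperature=0.5):
    """Lower the entropy of a class distribution: p_i^(1/T) / sum_j p_j^(1/T)."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def guess_label(predictions, temperature=0.5):
    """Average class probabilities over K augmentations, then sharpen.

    `predictions` is a list of K probability vectors, one per augmented
    view of the same unlabelled image (assumed to come from the model).
    """
    k = len(predictions)
    n_classes = len(predictions[0])
    mean = [sum(pred[c] for pred in predictions) / k for c in range(n_classes)]
    return sharpen(mean, temperature)


def mixup(x1, y1, x2, y2, alpha=0.75, rng=None):
    """MixUp as modified in MixMatch: lambda' = max(lambda, 1 - lambda),
    so the mixed sample stays closer to (x1, y1)."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


# Two augmented views of one unlabelled image, binary classification.
pseudo_label = guess_label([[0.7, 0.3], [0.5, 0.5]])
mixed_x, mixed_y = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], pseudo_label,
                         rng=random.Random(0))
```

Sharpening pushes the averaged prediction towards a confident pseudo-label, which is what lets the unlabelled target-clinic images contribute a training signal.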
We compare semi-supervised learning on its own, semi-supervised learning combined with transfer learning (from a source mammogram dataset) and data augmentation, and regular supervised learning with transfer learning and data augmentation from source datasets. The results show that semi-supervised deep learning combined with transfer learning and data augmentation can provide a meaningful advantage when labelled observations are scarce. We also found a strong influence of the source dataset, which suggests that a more data-centric approach is needed to tackle the challenge of scarcely labelled data. Because mammogram datasets are often very imbalanced, we assessed the performance gain of semi-supervised learning with several metrics suited to very imbalanced test datasets, such as the G-mean and the F2-score.
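The two imbalance-aware metrics named in the abstract can be sketched as follows. This assumes the common binary definitions (G-mean as the geometric mean of sensitivity and specificity, and F-beta with beta = 2 so that recall is weighted more heavily than precision); the abstract does not spell out the exact formulas used, and the function name and toy labels are hypothetical.

```python
import math


def imbalance_metrics(y_true, y_pred, beta=2.0):
    """Return (G-mean, F-beta) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on positives
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on negatives
    precision = tp / (tp + fp) if tp + fp else 0.0

    g_mean = math.sqrt(sensitivity * specificity)
    b2 = beta ** 2
    denom = b2 * precision + sensitivity
    f_beta = (1 + b2) * precision * sensitivity / denom if denom else 0.0
    return g_mean, f_beta


# Toy imbalanced test set: 2 positives, 8 negatives (made-up labels).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
g_mean, f2 = imbalance_metrics(y_true, y_pred)
```

On this toy example the plain accuracy is 0.8 even though half of the positive cases are missed, whereas the G-mean (about 0.66) and the F2-score (0.5) reflect that shortfall.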
Citation
Calderon-Ramirez, S., Murillo-Hernandez, D., Rojas-Salazar, K., Elizondo, D., Yang, S., Moemeni, A., & Molina-Cabello, M. (2022). A real use case of semi-supervised learning for mammogram classification in a local clinic of Costa Rica. Medical and Biological Engineering and Computing, 60(4), 1159-1175. https://doi.org/10.1007/s11517-021-02497-6
Journal Article Type | Article |
---|---|
Acceptance Date | Dec 17, 2021 |
Online Publication Date | Mar 3, 2022 |
Publication Date | Apr 1, 2022 |
Deposit Date | Mar 3, 2022 |
Publicly Available Date | Mar 4, 2023 |
Journal | Medical and Biological Engineering and Computing |
Print ISSN | 0140-0118 |
Electronic ISSN | 1741-0444 |
Publisher | Springer Verlag |
Peer Reviewed | Peer Reviewed |
Volume | 60 |
Issue | 4 |
Pages | 1159-1175 |
DOI | https://doi.org/10.1007/s11517-021-02497-6 |
Keywords | Computer Science Applications; Biomedical Engineering |
Public URL | https://nottingham-repository.worktribe.com/output/7535396 |
Publisher URL | https://link.springer.com/article/10.1007/s11517-021-02497-6 |
Additional Information | This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s11517-021-02497-6 |