Saul Calderon-Ramirez
Dataset Similarity to Assess Semisupervised Learning Under Distribution Mismatch Between the Labeled and Unlabeled Datasets
Calderon-Ramirez, Saul; Oala, Luis; Torrentes-Barrena, Jordina; Yang, Shengxiang; Elizondo, David; Moemeni, Armaghan; Colreavy-Donnelly, Simon; Samek, Wojciech; Molina-Cabello, Miguel A.; Lopez-Rubio, Ezequiel
Authors
Luis Oala
Jordina Torrentes-Barrena
Shengxiang Yang
David Elizondo
Armaghan Moemeni (Assistant Professor, armaghan.moemeni@nottingham.ac.uk)
Simon Colreavy-Donnelly
Wojciech Samek
Miguel A. Molina-Cabello
Ezequiel Lopez-Rubio
Abstract
Semi-supervised deep learning (SSDL) is a popular strategy to leverage unlabelled data for machine learning when labelled data is not readily available. In real-world scenarios, different unlabelled data sources are usually available, with varying degrees of distribution mismatch relative to the labelled dataset. This raises the question of which unlabelled dataset to choose for good SSDL outcomes. Oftentimes, semantic heuristics are used to match unlabelled data with labelled data. However, a quantitative and systematic approach to this selection problem would be preferable. In this work, we first test the SSDL MixMatch algorithm under various distribution mismatch configurations to study the impact on SSDL accuracy. Then, we propose a quantitative unlabelled dataset selection heuristic based on dataset dissimilarity measures. These are designed to systematically assess how distribution mismatch between the labelled and unlabelled datasets affects MixMatch performance. We refer to our proposed method as deep dataset dissimilarity measures (DeDiMs), designed to compare labelled and unlabelled datasets. They use the feature space of a generic Wide-ResNet, can be applied prior to learning, are quick to evaluate, and are model agnostic. The strong correlation in our tests between MixMatch accuracy and the proposed DeDiMs suggests that this approach can be a good fit for quantitatively ranking different unlabelled datasets prior to SSDL training.
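The ranking idea in the abstract can be illustrated with a minimal sketch. It is not the DeDiM definition from the paper. It assumes that feature vectors have already been extracted by some pretrained network, and it uses synthetic arrays as stand-ins. The function names (`feature_dissimilarity`, `rank_unlabelled`), the sample size, and the choice of a mean nearest-neighbour Minkowski distance are all illustrative assumptions.

```python
import numpy as np


def feature_dissimilarity(feats_a, feats_b, n_samples=64, p=2, seed=0):
    """Mean nearest-neighbour Minkowski distance from a sample of A to a sample of B.

    Hypothetical stand-in for a dataset dissimilarity measure; the measures
    defined in the paper may differ.
    """
    rng = np.random.default_rng(seed)
    # Sub-sample both feature sets so the pairwise computation stays cheap.
    idx_a = rng.choice(len(feats_a), size=min(n_samples, len(feats_a)), replace=False)
    idx_b = rng.choice(len(feats_b), size=min(n_samples, len(feats_b)), replace=False)
    a, b = feats_a[idx_a], feats_b[idx_b]
    # Pairwise Minkowski distances, shape (len(a), len(b)).
    diffs = np.abs(a[:, None, :] - b[None, :, :])
    dists = (diffs ** p).sum(axis=-1) ** (1.0 / p)
    # For each sampled point of A, keep the distance to its closest point in B.
    return float(dists.min(axis=1).mean())


def rank_unlabelled(labelled_feats, candidates, **kwargs):
    """Sort candidate unlabelled datasets from least to most dissimilar."""
    scores = {
        name: feature_dissimilarity(labelled_feats, feats, **kwargs)
        for name, feats in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Synthetic 16-dimensional "features" (assumption: real ones would come
# from a pretrained feature extractor applied to each dataset).
rng = np.random.default_rng(42)
labelled = rng.normal(0.0, 1.0, size=(200, 16))
candidates = {
    "near": rng.normal(0.1, 1.0, size=(200, 16)),  # small distribution shift
    "far": rng.normal(3.0, 1.0, size=(200, 16)),   # large distribution shift
}
ranking = rank_unlabelled(labelled, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this sketch a lower score means the candidate's features lie closer to the labelled features, so the first entry of `ranking` would be the preferred unlabelled source. Whether such a score tracks SSDL accuracy on real data has to be checked empirically, as the paper does for its own measures.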
Citation
Calderon-Ramirez, S., Oala, L., Torrentes-Barrena, J., Yang, S., Elizondo, D., Moemeni, A., Colreavy-Donnelly, S., Samek, W., Molina-Cabello, M. A., & Lopez-Rubio, E. (2023). Dataset Similarity to Assess Semisupervised Learning Under Distribution Mismatch Between the Labeled and Unlabeled Datasets. IEEE Transactions on Artificial Intelligence, 4(2), 282-291. https://doi.org/10.1109/tai.2022.3168804
Journal Article Type | Article |
---|---|
Acceptance Date | Apr 22, 2022 |
Online Publication Date | Apr 22, 2022 |
Publication Date | Apr 1, 2023 |
Deposit Date | May 3, 2022 |
Publicly Available Date | May 5, 2022 |
Journal | IEEE Transactions on Artificial Intelligence |
Electronic ISSN | 2691-4581 |
Publisher | Institute of Electrical and Electronics Engineers |
Peer Reviewed | Peer Reviewed |
Volume | 4 |
Issue | 2 |
Pages | 282-291 |
DOI | https://doi.org/10.1109/tai.2022.3168804 |
Keywords | Training, Deep learning, Artificial intelligence, Feature extraction, Semisupervised learning, Data models, Semantics, Dataset similarity, distribution mismatch, MixMatch, out of distribution data, semisupervised deep learning |
Public URL | https://nottingham-repository.worktribe.com/output/7950447 |
Publisher URL | https://ieeexplore.ieee.org/document/9762063 |
Files
IEEE TAI Journal More Than Meets The Eye Final 2 (PDF, 596 Kb)