Hoang Lam Le
EUSC: A clustering-based surrogate model to accelerate evolutionary undersampling in imbalanced classification
Le, Hoang Lam; Landa-Silva, Dario; Galar, Mikel; Garcia, Salvador; Triguero, Isaac
Authors
Dario Landa-Silva DARIO.LANDASILVA@NOTTINGHAM.AC.UK
Professor of Computational Optimisation
Mikel Galar
Salvador Garcia
Isaac Triguero Velazquez I.TrigueroVelazquez@nottingham.ac.uk
Associate Professor
Abstract
Learning from imbalanced datasets is in high demand in real-world applications and is a challenge for standard classifiers, which tend to be biased towards the classes with the majority of the examples. Undersampling approaches reduce the size of the majority class to balance the class distributions. Evolutionary-based approaches are prominent, treating undersampling as a binary optimisation problem that determines which examples are removed. However, their use is limited to small datasets because of the cost of fitness evaluation. This work proposes a two-stage clustering-based surrogate model that enables evolutionary undersampling to compute fitness values faster. The main novelty lies in the development of a surrogate model for binary optimisation that is based on the meaning of solutions (phenotype) rather than their binary representation (genotype). We conduct an evaluation on 44 imbalanced datasets, showing that, in comparison with the original evolutionary undersampling, we can save up to 83% of the runtime without significantly deteriorating the classification performance.
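To make the binary-optimisation framing in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation and does not reproduce the EUSC surrogate. It assumes a 1-nearest-neighbour classifier and a geometric-mean (g-mean) fitness scored on a held-out validation set, neither of which is stated in the abstract; the function names (`decode`, `nearest_label`, `fitness`) and the toy data are hypothetical. Each bit of a chromosome says whether one majority-class example is kept, and every fitness call runs the classifier again, which is the cost a surrogate model aims to avoid.

```python
import math

def decode(chromosome, majority, minority):
    """Phenotype of a chromosome: the reduced training set it selects.
    Bit i == 1 keeps majority example i; all minority examples are kept."""
    kept = [x for bit, x in zip(chromosome, majority) if bit == 1]
    return [(x, 0) for x in kept] + [(x, 1) for x in minority]

def nearest_label(point, training_set):
    """Label of the closest training example (1-NN, squared Euclidean)."""
    closest = min(
        training_set,
        key=lambda ex: sum((a - b) ** 2 for a, b in zip(point, ex[0])),
    )
    return closest[1]

def fitness(chromosome, majority, minority, validation):
    """G-mean of 1-NN trained on the decoded subset (assumed fitness).
    Every call classifies the whole validation set, hence the cost."""
    reduced = decode(chromosome, majority, minority)
    correct = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for point, label in validation:
        total[label] += 1
        if nearest_label(point, reduced) == label:
            correct[label] += 1
    tnr = correct[0] / total[0]  # accuracy on the majority class
    tpr = correct[1] / total[1]  # accuracy on the minority class
    return math.sqrt(tnr * tpr)

# Hypothetical toy data: one majority example overlaps the minority region.
majority = [(0.0, 0.0), (0.1, 0.2), (0.9, 0.9), (0.2, 0.1)]
minority = [(1.0, 1.0), (0.8, 1.0)]
validation = [((0.1, 0.1), 0), ((0.9, 0.95), 1), ((0.85, 0.85), 1)]

keep_all = [1, 1, 1, 1]
drop_overlap = [1, 1, 0, 1]  # removes the majority example at (0.9, 0.9)

print(fitness(keep_all, majority, minority, validation))
print(fitness(drop_overlap, majority, minority, validation))
```

On this toy data, dropping the overlapping majority example raises the score, because the minority validation points are no longer captured by it. An evolutionary search would evaluate many such chromosomes, so the per-evaluation classifier run dominates the runtime on larger datasets.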
Citation
Le, H. L., Landa-Silva, D., Galar, M., Garcia, S., & Triguero, I. (2021). EUSC: A clustering-based surrogate model to accelerate evolutionary undersampling in imbalanced classification. Applied Soft Computing, 101, Article 107033. https://doi.org/10.1016/j.asoc.2020.107033
Journal Article Type | Article |
---|---|
Acceptance Date | Dec 12, 2020 |
Online Publication Date | Dec 19, 2020 |
Publication Date | Mar 1, 2021 |
Deposit Date | Jan 5, 2021 |
Publicly Available Date | Dec 20, 2021 |
Journal | Applied Soft Computing |
Print ISSN | 1568-4946 |
Electronic ISSN | 1872-9681 |
Publisher | Elsevier |
Peer Reviewed | Peer Reviewed |
Volume | 101 |
Article Number | 107033 |
DOI | https://doi.org/10.1016/j.asoc.2020.107033 |
Keywords | Software |
Public URL | https://nottingham-repository.worktribe.com/output/5201456 |
Publisher URL | https://www.sciencedirect.com/science/article/pii/S1568494620309728 |
Additional Information | This article is maintained by: Elsevier; Article Title: EUSC: A clustering-based surrogate model to accelerate evolutionary undersampling in imbalanced classification; Journal Title: Applied Soft Computing; CrossRef DOI link to publisher maintained version: https://doi.org/10.1016/j.asoc.2020.107033; Content Type: article; Copyright: Crown Copyright © 2020 Published by Elsevier B.V. All rights reserved. |
Files
EUSC_Revising (1)
(690 Kb)
PDF