Multi-Objective Feature Selection With Missing Data in Classification

Xue, Yu; Tang, Yihang; Xu, Xin; Liang, Jiayu; Neri, Ferrante

Authors

Yu Xue

Yihang Tang

Xin Xu

Jiayu Liang

Ferrante Neri



Abstract

Feature selection (FS) is an important research topic in machine learning. Usually, FS is modelled as a bi-objective optimization problem whose objectives are: 1) classification accuracy; 2) number of features. One of the main issues in real-world applications is missing data. Databases with missing data are likely to be unreliable. Thus, FS performed on a data set missing some data is also unreliable. In order to directly control this issue plaguing the field, we propose in this study a novel modelling of FS: we include reliability as the third objective of the problem. In order to address the modified problem, we propose the application of the non-dominated sorting genetic algorithm-III (NSGA-III). We selected six incomplete data sets from the University of California Irvine (UCI) machine learning repository. We used the mean imputation method to deal with the missing data. In the experiments, k-nearest neighbors (K-NN) is used as the classifier to evaluate the feature subsets. Experimental results show that the proposed three-objective model coupled with NSGA-III efficiently addresses the FS problem for the six data sets included in this study.
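
As a rough illustration of the three-objective formulation described in the abstract, the sketch below evaluates a single candidate feature subset after mean imputation, using a k-NN classifier. It is a minimal sketch under stated assumptions and not the authors' implementation: the abstract does not say how reliability is measured, so the third objective here is assumed to be the average missing rate of the selected features, and accuracy is estimated by leave-one-out error. All function names (`mean_impute`, `missing_rates`, `knn_loo_error`, `objectives`) are hypothetical. The NSGA-III search itself is not shown.

```python
from collections import Counter
from math import dist, isnan


def mean_impute(rows):
    """Replace each NaN with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not isnan(r[j])]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if isnan(v) else v for j, v in enumerate(r)] for r in rows]


def missing_rates(rows):
    """Fraction of missing (NaN) entries in each column of the raw data."""
    n = len(rows)
    return [sum(isnan(r[j]) for r in rows) / n for j in range(len(rows[0]))]


def knn_loo_error(rows, labels, mask, k=3):
    """Leave-one-out k-NN error rate using only the features selected by mask."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 1.0  # an empty subset cannot classify anything
    pts = [[r[j] for j in cols] for r in rows]
    wrong = 0
    for i, p in enumerate(pts):
        # Distances to every other sample, nearest first.
        neighbours = sorted(
            (dist(p, q), labels[m]) for m, q in enumerate(pts) if m != i
        )[:k]
        vote = Counter(lbl for _, lbl in neighbours).most_common(1)[0][0]
        wrong += vote != labels[i]
    return wrong / len(pts)


def objectives(raw_rows, labels, mask, k=3):
    """Return (error, subset size, unreliability); all three are minimised.

    ASSUMPTION: unreliability is the mean missing rate of the selected
    features. The source does not define the reliability objective.
    """
    imputed = mean_impute(raw_rows)
    rates = missing_rates(raw_rows)
    selected = [j for j, keep in enumerate(mask) if keep]
    error = knn_loo_error(imputed, labels, mask, k)
    size = len(selected)
    unreliability = sum(rates[j] for j in selected) / size if selected else 1.0
    return error, size, unreliability
```

A multi-objective optimizer such as NSGA-III would call a function like `objectives` on each binary mask in its population and keep the non-dominated masks; the choice of k, the validation scheme, and the reliability measure would all need to follow the paper rather than this sketch.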

Citation

Xue, Y., Tang, Y., Xu, X., Liang, J., & Neri, F. (2022). Multi-Objective Feature Selection With Missing Data in Classification. IEEE Transactions on Emerging Topics in Computational Intelligence, 6(2), 355-364. https://doi.org/10.1109/TETCI.2021.3074147

Journal Article Type: Article
Acceptance Date: Apr 8, 2021
Online Publication Date: May 3, 2021
Publication Date: 2022-04
Deposit Date: Apr 11, 2021
Publicly Available Date: May 3, 2021
Journal: IEEE Transactions on Emerging Topics in Computational Intelligence
Electronic ISSN: 2471-285X
Publisher: Institute of Electrical and Electronics Engineers
Peer Reviewed: Peer Reviewed
Volume: 6
Issue: 2
Pages: 355-364
DOI: https://doi.org/10.1109/TETCI.2021.3074147
Keywords: Feature selection; Multi-objective; Optimization; NSGA-III; Missing data
Public URL: https://nottingham-repository.worktribe.com/output/5460815
Publisher URL: https://ieeexplore.ieee.org/document/9420459
Additional Information: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
