A Preliminary Study of the Feasibility of Global Evolutionary Feature Selection for Big Datasets under Apache Spark
Galar, M.; Triguero, I.; Bustince, H.; Herrera, F.
Authors
M. Galar
Dr Isaac Triguero Velazquez (I.TrigueroVelazquez@nottingham.ac.uk), Associate Professor
H. Bustince
F. Herrera
Abstract
Designing efficient learning models capable of dealing with tons of data has become a reality in the era of big data. However, the amount of available data is too much for traditional data mining techniques to be applicable. This issue is even more serious when evolutionary algorithms are a key part of the learning algorithm. In this scenario, one typical approach is to follow a divide-and-conquer strategy, where data is divided into different chunks that are individually and independently addressed. Afterwards, the partial knowledge obtained from each chunk of data is combined in order to give a solution to the problem. Nevertheless, these kinds of local approaches do not look at data as a whole, missing a global view of the problem, which may result in less accurate models that also depend on how data is split. In this work, we focus on evolutionary feature selection algorithms. A divide-and-conquer approach to handle evolutionary feature selection in big data was already developed. We aim at designing its global counterpart, which looks at the feature selection problem from a global perspective, making use of the data as a whole to select the most appropriate features. In order to do so, we consider Apache Spark as a big data technology where our algorithm is implemented. We design a genetic algorithm capable of dealing with big datasets by selecting the proper parameters for our base algorithm (the well-known CHC) and adapting the evaluation procedure to take all the distributed data into account. Several preliminary results are discussed to study the feasibility of global evolutionary feature selection methods for big datasets.
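The contrast the abstract draws between local and global evaluation can be illustrated with a small sketch. It is not the authors' implementation: it uses plain Python lists to stand in for distributed partitions (no Spark calls), and it assumes a nearest-centroid classifier as the fitness function purely for illustration. The names `partial_stats`, `merge_stats`, `count_correct`, and `global_fitness` are hypothetical. The point shown is only that each partition contributes partial statistics, and the fitness of a candidate feature mask is computed from all of them combined.

```python
from functools import reduce

def select(features, mask):
    """Keep only the features switched on in the binary mask."""
    return [x for x, keep in zip(features, mask) if keep]

def partial_stats(partition, mask):
    """Map step: per-class feature sums and counts for one partition."""
    stats = {}
    for features, label in partition:
        selected = select(features, mask)
        sums, count = stats.get(label, ([0.0] * len(selected), 0))
        stats[label] = ([s + x for s, x in zip(sums, selected)], count + 1)
    return stats

def merge_stats(a, b):
    """Reduce step: combine the statistics of two partitions."""
    merged = dict(a)
    for label, (sums, count) in b.items():
        if label in merged:
            old_sums, old_count = merged[label]
            sums = [s + t for s, t in zip(old_sums, sums)]
            count += old_count
        merged[label] = (sums, count)
    return merged

def count_correct(partition, mask, centroids):
    """Second map step: correct predictions in one partition (assumed classifier)."""
    correct = 0
    for features, label in partition:
        selected = select(features, mask)
        predicted = min(
            centroids,
            key=lambda c: sum((x - m) ** 2 for x, m in zip(selected, centroids[c])),
        )
        correct += predicted == label
    return correct

def global_fitness(partitions, mask):
    """Accuracy of the mask measured over every partition, not one chunk."""
    if not any(mask):
        return 0.0  # an empty feature subset cannot classify anything
    stats = reduce(merge_stats, (partial_stats(p, mask) for p in partitions))
    centroids = {label: [s / n for s in sums] for label, (sums, n) in stats.items()}
    correct = sum(count_correct(p, mask, centroids) for p in partitions)
    return correct / sum(len(p) for p in partitions)

# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
# Each partition holds a single class, so no chunk alone sees the whole problem.
partitions = [
    [((0.0, 3.0), 0), ((1.0, -3.0), 0)],
    [((10.0, 4.0), 1), ((11.0, -2.0), 1)],
]
for mask in [(1, 0), (0, 1), (1, 1)]:
    print(mask, global_fitness(partitions, mask))
```

In this toy setup each partition contains only one class, so a purely local evaluation could not distinguish a useful feature from a useless one. In an actual Spark job the two map steps would presumably run per partition on the cluster, with only the small per-class statistics and counts sent back to the driver.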
Citation
Galar, M., Triguero, I., Bustince, H., & Herrera, F. (2018, July). A Preliminary Study of the Feasibility of Global Evolutionary Feature Selection for Big Datasets under Apache Spark. Presented at 2018 IEEE Congress on Evolutionary Computation (CEC), Rio de Janeiro, Brazil
Presentation Conference Type | Edited Proceedings |
---|---|
Conference Name | 2018 IEEE Congress on Evolutionary Computation (CEC) |
Start Date | Jul 8, 2018 |
End Date | Jul 13, 2018 |
Acceptance Date | Mar 15, 2018 |
Online Publication Date | Oct 4, 2018 |
Publication Date | Jul 12, 2018 |
Deposit Date | Oct 18, 2018 |
Publicly Available Date | Oct 18, 2018 |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 1-8 |
Book Title | 2018 IEEE Congress on Evolutionary Computation (CEC) - Proceedings |
Chapter Number | N/a |
ISBN | 978-1-5090-6018-4 |
DOI | https://doi.org/10.1109/CEC.2018.8477878 |
Public URL | https://nottingham-repository.worktribe.com/output/1174978 |
Publisher URL | https://ieeexplore.ieee.org/document/8477878 |
Additional Information | © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Contract Date | Oct 18, 2018 |
Files
Preliminary Study of the Feasibility of Global Evolutionary Feature Selection (PDF, 477 Kb)