Jesus Maillo
Fast and Scalable Approaches to Accelerate the Fuzzy k Nearest Neighbors Classifier for Big Data
Maillo, Jesus; García, Salvador; Luengo, Julián; Herrera, Francisco; Triguero, Isaac
Authors
Salvador García
Julián Luengo
Francisco Herrera
Dr Isaac Triguero Velazquez, Associate Professor (I.TrigueroVelazquez@nottingham.ac.uk)
Abstract
One of the best-known and most effective methods in supervised classification is the k nearest neighbors algorithm (kNN). Several approaches have been proposed to improve its accuracy, and fuzzy approaches have proved to be among the most successful, notably the classical Fuzzy k nearest neighbors (FkNN). However, these traditional algorithms cannot cope with the large amounts of data available today. There are multiple alternatives that enable kNN classification on big datasets, notably the approximate version of kNN called Hybrid Spill Tree. Nevertheless, the existing FkNN proposals for big data problems are not fully scalable, because a high computational load is required to obtain the same behavior as the original FkNN algorithm. This work proposes Global Approximate Hybrid Spill Tree FkNN and Local Hybrid Spill Tree FkNN, two approximate approaches that speed up runtime without losing quality in the classification process. The experiments compare various FkNN approaches for big data on datasets of up to 11 million instances. The results show an improvement in runtime and accuracy over algorithms from the literature.
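For readers unfamiliar with FkNN, the following is a minimal single-machine sketch of the classical algorithm in its commonly described two-stage form: class memberships are first computed for each training instance, then a query is classified by a distance-weighted vote over the memberships of its k neighbors. It is not the Global Approximate Hybrid Spill Tree FkNN or Local Hybrid Spill Tree FkNN proposed in the paper, and it uses exact brute-force neighbor search rather than a Hybrid Spill Tree. The function names, the 0.51/0.49 membership constants, and the fuzzifier `m` are assumptions taken from the standard formulation, not from this record.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_nearest(X, point, k, skip=None):
    """Indices of the k training rows closest to `point` (brute force)."""
    idx = [i for i in range(len(X)) if i != skip]
    idx.sort(key=lambda i: sq_dist(X[i], point))
    return idx[:k]


def class_memberships(X, y, k, n_classes):
    """Stage 1: fuzzy class memberships for every training instance.

    Assumed standard rule: 0.49 * (share of neighbors in class c),
    plus 0.51 for the instance's own class.
    """
    U = []
    for i, xi in enumerate(X):
        counts = Counter(y[j] for j in k_nearest(X, xi, k, skip=i))
        row = [0.49 * counts.get(c, 0) / k for c in range(n_classes)]
        row[y[i]] += 0.51
        U.append(row)
    return U


def fknn_predict(X, U, query, k, m=2.0, eps=1e-12):
    """Stage 2: distance-weighted vote over the neighbors' memberships.

    Weight of neighbor j is 1 / d_j^(2/(m-1)); eps avoids division by zero.
    """
    nn = k_nearest(X, query, k)
    weights = [
        1.0 / (sq_dist(X[j], query) ** (1.0 / (m - 1.0)) + eps) for j in nn
    ]
    total = sum(weights)
    n_classes = len(U[0])
    memb = [
        sum(w * U[j][c] for w, j in zip(weights, nn)) / total
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=memb.__getitem__)
    return label, memb


# Toy data (hypothetical): two well-separated clusters.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
U = class_memberships(X, y, k=2, n_classes=2)
label, memb = fknn_predict(X, U, [0.2, 0.2], k=2)
```

Both stages perform a full neighbor search, which is the cost that becomes prohibitive at scale and that approximate search structures aim to reduce.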
Citation
Maillo, J., García, S., Luengo, J., Herrera, F., & Triguero, I. (2020). Fast and Scalable Approaches to Accelerate the Fuzzy k Nearest Neighbors Classifier for Big Data. IEEE Transactions on Fuzzy Systems, 28(5), 874-886. https://doi.org/10.1109/TFUZZ.2019.2936356
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Aug 9, 2019 |
Online Publication Date | Aug 20, 2019 |
Publication Date | 2020-05 |
Deposit Date | Aug 23, 2019 |
Publicly Available Date | Aug 23, 2019 |
Journal | IEEE Transactions on Fuzzy Systems |
Print ISSN | 1063-6706 |
Electronic ISSN | 1941-0034 |
Publisher | Institute of Electrical and Electronics Engineers |
Peer Reviewed | Peer Reviewed |
Volume | 28 |
Issue | 5 |
Pages | 874-886 |
DOI | https://doi.org/10.1109/TFUZZ.2019.2936356 |
Keywords | Fuzzy sets; k nearest neighbors; Classification; MapReduce; Apache Spark; Big Data |
Public URL | https://nottingham-repository.worktribe.com/output/2484097 |
Publisher URL | https://ieeexplore.ieee.org/document/8807277 |
Additional Information | © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Contract Date | Aug 23, 2019 |
Files
TFS-Accepted (PDF, 4.7 Mb)