Jesus Maillo
Exact fuzzy k-Nearest neighbor classification for big datasets
Maillo, Jesus; Luengo, Julian; García, Salvador; Herrera, Francisco; Triguero, Isaac
Authors
Julian Luengo
Salvador García
Francisco Herrera
Dr ISAAC TRIGUERO VELAZQUEZ I.TrigueroVelazquez@nottingham.ac.uk
ASSOCIATE PROFESSOR
Abstract
The k-Nearest Neighbors (kNN) classifier is one of the most effective methods for supervised learning problems. It classifies unseen cases by comparing their similarity with the training data. However, it gives every labeled sample the same importance during classification. Several approaches have been proposed to enhance its precision, with the Fuzzy k-Nearest Neighbors (Fuzzy-kNN) classifier being among the most successful. Fuzzy-kNN computes a fuzzy degree of membership of each instance to the classes of the problem. As a result, it generates smoother borders between classes. Although a kNN approach for handling big datasets already exists, there is no fuzzy variant able to manage that volume of data. Moreover, calculating the class memberships adds extra computational cost, which makes the method even less scalable to large datasets because of its memory needs and high runtime. In this work, we present an exact and distributed approach, based on Spark, to run the Fuzzy-kNN classifier on big datasets; it provides the same precision as the original algorithm. It consists of two separate stages. The first stage transforms the training set by adding the class membership degrees. The second stage classifies the test set with the kNN algorithm, using the class memberships computed previously. In our experiments, we study the scaling-up capabilities of the proposed approach with datasets of up to 11 million instances, showing promising results.
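The two stages described in the abstract can be illustrated with a small single-machine sketch. It is not the authors' Spark implementation: the abstract gives no formulas, so the sketch assumes the membership rule commonly attributed to Keller et al. (0.51 plus 0.49 times the neighbor fraction for the instance's own class, 0.49 times the neighbor fraction otherwise) and inverse-distance weighting with a fuzzifier `m`. All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np

def knn_indices(queries, train, k, exclude_self=False):
    """Brute-force Euclidean kNN: neighbor indices and distances per query."""
    # Pairwise squared distances, shape (n_queries, n_train).
    d2 = ((queries[:, None, :] - train[None, :, :]) ** 2).sum(axis=2)
    if exclude_self:
        # Only valid when queries and train are the same array.
        np.fill_diagonal(d2, np.inf)
    idx = np.argsort(d2, axis=1)[:, :k]
    dist = np.sqrt(np.take_along_axis(d2, idx, axis=1))
    return idx, dist

def compute_memberships(X, y, n_classes, k):
    """Stage 1 (assumed rule): attach class membership degrees to training data."""
    idx, _ = knn_indices(X, X, k, exclude_self=True)
    neigh_labels = y[idx]  # labels of the k neighbors, shape (n, k)
    counts = np.stack(
        [(neigh_labels == c).sum(axis=1) for c in range(n_classes)], axis=1
    )
    u = 0.49 * counts / k
    u[np.arange(len(y)), y] += 0.51  # extra weight for the instance's own class
    return u

def classify(X_test, X_train, u_train, k, m=2.0, eps=1e-12):
    """Stage 2: kNN over the training set, aggregating stored memberships."""
    idx, dist = knn_indices(X_test, X_train, k)
    # Inverse-distance weights; eps avoids division by zero on exact matches.
    w = 1.0 / (dist ** (2.0 / (m - 1.0)) + eps)
    u_test = (w[:, :, None] * u_train[idx]).sum(axis=1) / w.sum(axis=1, keepdims=True)
    return u_test.argmax(axis=1), u_test

# Toy data: two well-separated clusters (illustrative only).
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
u_train = compute_memberships(X_train, y_train, n_classes=2, k=2)
labels, u_test = classify(np.array([[0.2, 0.2], [5.5, 5.5]]), X_train, u_train, k=2)
```

Because stage 1 depends only on the training set, its output can be computed once and reused for any number of test sets. The brute-force distance matrix used here does not scale to large datasets; it only makes the data flow between the two stages visible.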
Citation
Maillo, J., Luengo, J., García, S., Herrera, F., & Triguero, I. Exact fuzzy k-Nearest neighbor classification for big datasets. Presented at IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2017)
Conference Name | IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2017) |
---|---|
End Date | Jul 12, 2017 |
Acceptance Date | Mar 14, 2017 |
Publication Date | Jul 10, 2017 |
Deposit Date | Aug 16, 2017 |
Publicly Available Date | Aug 16, 2017 |
Peer Reviewed | Peer Reviewed |
Public URL | https://nottingham-repository.worktribe.com/output/872258 |
Additional Information | 1544-5615. © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Contract Date | Aug 16, 2017 |
Files
exact-fuzzy-k.pdf
(992 Kb)
PDF