A preliminary study on Hybrid Spill-Tree Fuzzy k-Nearest Neighbors for big data classification
Maillo, Jesus; Luengo, Julián; Garcia, Salvador; Herrera, Francisco; Triguero, Isaac
Authors
Jesus Maillo
Julián Luengo
Salvador Garcia
Francisco Herrera
Isaac Triguero Velazquez (Associate Professor), I.TrigueroVelazquez@nottingham.ac.uk
Abstract
The Fuzzy k Nearest Neighbor (Fuzzy kNN) classifier is well known for its effectiveness in supervised learning problems. kNN classifies each new incoming example by comparing it, through a similarity function, with the samples of the training set. The fuzzy version of kNN accounts for the underlying uncertainty in the class labels, and it is composed of two stages. The first stage calculates the fuzzy membership degree of each sample of the problem in order to obtain smoother boundaries between classes. The second stage classifies similarly to the standard kNN algorithm but uses the previously calculated class membership degrees. To deal with very large datasets, distributed versions of the Fuzzy kNN algorithm have been proposed. However, existing approaches are not fully scalable, as they aim to replicate the exact behavior of the Fuzzy kNN. In this work, we present an approximate and distributed Fuzzy kNN approach based on the Hybrid Spill-Tree, implemented under Apache Spark. The aim of this model is to alleviate the scalability problems and to deal with big datasets while maintaining high accuracy. In our experiments, we compare its precision and runtime against the existing Fuzzy kNN approach for big data problems from the literature, running on datasets of up to 11 million instances. The results show an improvement in runtime and accuracy with respect to the previous exact model.
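The two stages described in the abstract can be illustrated with a small, single-machine sketch. This is not the Hybrid Spill-Tree or Spark implementation from the paper: it is an exact, brute-force Fuzzy kNN, and it assumes the commonly cited formulation attributed to Keller et al. (membership constants 0.51 and 0.49, inverse-distance weights with fuzziness exponent `m`), none of which is stated in the abstract. All function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(point, data, k, skip=None):
    """Indices of the k samples in `data` closest to `point` (brute force)."""
    order = sorted(
        (i for i in range(len(data)) if i != skip),
        key=lambda i: euclidean(point, data[i]),
    )
    return order[:k]


def fuzzy_memberships(X, y, classes, k):
    """Stage 1: a class membership degree for every training sample.

    Assumed formulation: 0.49 * (share of neighbors in class c),
    plus 0.51 for the sample's own class, so each row sums to 1.
    """
    memberships = []
    for i, x in enumerate(X):
        counts = Counter(y[j] for j in k_nearest(x, X, k, skip=i))
        row = {}
        for c in classes:
            share = 0.49 * counts[c] / k
            row[c] = share + 0.51 if c == y[i] else share
        memberships.append(row)
    return memberships


def fuzzy_knn_predict(x, X, memberships, classes, k, m=2.0):
    """Stage 2: classify x from the memberships of its k nearest neighbors.

    Neighbors are weighted by inverse distance raised to 2 / (m - 1);
    m is an assumed fuzziness parameter and must be greater than 1.
    """
    scores = {c: 0.0 for c in classes}
    total = 0.0
    for j in k_nearest(x, X, k):
        d = max(euclidean(x, X[j]), 1e-12)  # guard against zero distance
        w = 1.0 / d ** (2.0 / (m - 1.0))
        total += w
        for c in classes:
            scores[c] += w * memberships[j][c]
    scores = {c: s / total for c, s in scores.items()}
    return max(scores, key=scores.get), scores


# Toy data: two well-separated clusters.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
classes = ["a", "b"]

u = fuzzy_memberships(X, y, classes, k=2)
label, scores = fuzzy_knn_predict((0.5, 0.5), X, u, classes, k=2)
print(label, scores)
```

Both stages here scan the full training set for every query, which is the cost that motivates an approximate, tree-based neighbor search on large datasets.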
Citation
Maillo, J., Luengo, J., Garcia, S., Herrera, F., & Triguero, I. (2018, July). A preliminary study on Hybrid Spill-Tree Fuzzy k-Nearest Neighbors for big data classification. Presented at 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Rio de Janeiro, Brazil
Presentation Conference Type | Edited Proceedings |
---|---|
Conference Name | 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) |
Start Date | Jul 8, 2018 |
End Date | Jul 13, 2018 |
Acceptance Date | Mar 15, 2018 |
Online Publication Date | Oct 15, 2018 |
Publication Date | Oct 12, 2018 |
Deposit Date | Oct 18, 2018 |
Publicly Available Date | Oct 18, 2018 |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 1-8 |
Book Title | 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) |
Chapter Number | N/A |
ISBN | 978-1-5090-6021-4 |
DOI | https://doi.org/10.1109/FUZZ-IEEE.2018.8491595 |
Public URL | https://nottingham-repository.worktribe.com/output/1175103 |
Publisher URL | https://ieeexplore.ieee.org/document/8491595 |
Additional Information | © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Contract Date | Oct 18, 2018 |
Files
Preliminary study on Hybrid Spill-Tree Fuzzy (PDF, 1.5 Mb)