ISAAC TRIGUERO VELAZQUEZ I.TrigueroVelazquez@nottingham.ac.uk
Associate Professor
MRPR: A MapReduce solution for prototype reduction in big data classification
Triguero, Isaac; Peralta, Daniel; Bacardit, Jaume; García, Salvador; Herrera, Francisco
Authors
Daniel Peralta
Jaume Bacardit
Salvador García
Francisco Herrera
Abstract
In the era of big data, analyzing and extracting knowledge from large-scale data sets is an interesting and challenging task. Applying standard data mining tools to such data sets is not straightforward. Hence, a new class of scalable mining methods that embraces the huge storage and processing capacity of cloud platforms is required. In this work, we propose a novel distributed partitioning methodology for prototype reduction techniques in nearest neighbor classification. These methods aim to represent the original training data set with a reduced number of instances. Their main purposes are to speed up the classification process and to reduce the storage requirements and the sensitivity to noise of the nearest neighbor rule. However, standard prototype reduction methods cannot cope with very large data sets. To overcome this limitation, we develop a MapReduce-based framework that distributes the execution of these algorithms across a cluster of computing elements, and we propose several algorithmic strategies to integrate multiple partial solutions (reduced sets of prototypes) into a single one. The proposed model enables prototype reduction algorithms to be applied to big data classification problems without significant accuracy loss. We test the speed-up capabilities of our model with data sets of up to 5.7 million instances. The results show that this model is a suitable tool to enhance the performance of the nearest neighbor classifier with big data.
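The partition, reduce-locally, then integrate flow described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation and does not use Hadoop. It assumes Hart's condensed nearest neighbor rule as a stand-in prototype reduction method and plain concatenation as the way of integrating partial solutions. All function names (`split`, `map_phase`, `reduce_join`, `mrpr_sketch`) are hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

# An instance is (feature vector, class label). This layout is an assumption.
Instance = Tuple[Tuple[float, ...], int]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_label(prototypes: List[Instance], x: Sequence[float]) -> int:
    """1-NN rule: label of the prototype closest to x."""
    return min(prototypes, key=lambda p: sq_dist(p[0], x))[1]


def condensed_nn(data: List[Instance]) -> List[Instance]:
    """Stand-in reduction method (Hart's condensed nearest neighbor).

    Starts from one instance and keeps adding every instance that the
    current prototypes misclassify, until a full pass adds nothing.
    """
    if not data:
        return []
    prototypes = [data[0]]
    remaining = list(data[1:])
    changed = True
    while changed:
        changed = False
        still_out = []
        for inst in remaining:
            if nearest_label(prototypes, inst[0]) != inst[1]:
                prototypes.append(inst)
                changed = True
            else:
                still_out.append(inst)
        remaining = still_out
    return prototypes


def split(data: List[Instance], n_partitions: int, seed: int = 0) -> List[List[Instance]]:
    """Shuffle the training set and deal it into disjoint partitions."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_partitions] for i in range(n_partitions)]


def map_phase(partitions: List[List[Instance]],
              method: Callable[[List[Instance]], List[Instance]]) -> List[List[Instance]]:
    """Apply the reduction method to each partition independently."""
    return [method(part) for part in partitions]


def reduce_join(partials: List[List[Instance]]) -> List[Instance]:
    """Integrate partial solutions by simple concatenation (an assumption)."""
    return [inst for part in partials for inst in part]


def mrpr_sketch(data: List[Instance], n_partitions: int,
                method: Callable[[List[Instance]], List[Instance]] = condensed_nn) -> List[Instance]:
    """Partition, reduce each partition, then merge the reduced sets."""
    return reduce_join(map_phase(split(data, n_partitions), method))
```

Calling `mrpr_sketch(train, n_partitions=4)` on a list of `(features, label)` tuples returns a reduced prototype list that `nearest_label` can then classify against. In a real cluster setting each call inside `map_phase` would run on a separate node, which is where the speed-up comes from. More elaborate integration strategies could replace `reduce_join` without changing the rest of the sketch.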
Citation
Triguero, I., Peralta, D., Bacardit, J., García, S., & Herrera, F. (2015). MRPR: A MapReduce solution for prototype reduction in big data classification. Neurocomputing, 150(Part A), 331-345. https://doi.org/10.1016/j.neucom.2014.04.078
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Apr 22, 2014 |
Online Publication Date | Oct 15, 2014 |
Publication Date | Feb 20, 2015 |
Deposit Date | Sep 4, 2017 |
Publicly Available Date | Sep 4, 2017 |
Journal | Neurocomputing |
Print ISSN | 0925-2312 |
Electronic ISSN | 1872-8286 |
Publisher | Elsevier |
Peer Reviewed | Peer Reviewed |
Volume | 150 |
Issue | Part A |
Pages | 331-345 |
DOI | https://doi.org/10.1016/j.neucom.2014.04.078 |
Keywords | Big data, Mahout, Hadoop, Prototype reduction, Prototype generation, Nearest neighbor classification |
Public URL | https://nottingham-repository.worktribe.com/output/744703 |
Publisher URL | http://www.sciencedirect.com/science/article/pii/S0925231214013009?via%3Dihub |
Contract Date | Sep 4, 2017 |
Files
triguero-peralta-bacardit-garcia-herrera-HadoopPG.pdf
(443 Kb)
PDF
Copyright Statement
Copyright information regarding this work can be found at the following address: http://creativecommons.org/licenses/by-nc-nd/4.0