MRPR: A MapReduce solution for prototype reduction in big data classification

Triguero, Isaac; Peralta, Daniel; Bacardit, Jaume; García, Salvador; Herrera, Francisco

doi:10.1016/j.neucom.2014.04.078

Authors

Dr ISAAC TRIGUERO VELAZQUEZ I.TrigueroVelazquez@nottingham.ac.uk
ASSOCIATE PROFESSOR

Daniel Peralta

Jaume Bacardit

Salvador García

Francisco Herrera

Abstract

In the era of big data, analyzing and extracting knowledge from large-scale data sets is an interesting and challenging task. Applying standard data mining tools to such data sets is not straightforward. Hence, a new class of scalable mining methods that embraces the huge storage and processing capacity of cloud platforms is required. In this work, we propose a novel distributed partitioning methodology for prototype reduction techniques in nearest neighbor classification. These methods aim to represent original training data sets with a reduced number of instances. Their main purposes are to speed up the classification process and to reduce the storage requirements and noise sensitivity of the nearest neighbor rule. However, standard prototype reduction methods cannot cope with very large data sets. To overcome this limitation, we develop a MapReduce-based framework that distributes the execution of these algorithms across a cluster of computing elements, and we propose several algorithmic strategies to integrate multiple partial solutions (reduced sets of prototypes) into a single one. The proposed model enables prototype reduction algorithms to be applied to big data classification problems without significant accuracy loss. We test the speed-up capabilities of our model with data sets of up to 5.7 million instances. The results show that this model is a suitable tool for enhancing the performance of the nearest neighbor classifier on big data.
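
The partition, reduce, and integrate idea described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: it runs in plain Python rather than on Hadoop, it swaps in a per-class centroid as a stand-in for a real prototype reduction algorithm, and it assumes the simplest possible integration strategy (concatenating the partial reduced sets). All function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def split_into_partitions(data, n_partitions, seed=0):
    """Shuffle the training set and split it into disjoint partitions."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_partitions] for i in range(n_partitions)]


def reduce_partition(partition):
    """Map step (stand-in): keep one centroid per class in this partition.

    A real prototype reduction method would go here; the centroid is only
    an assumption made to keep the sketch short.
    """
    by_class = defaultdict(list)
    for features, label in partition:
        by_class[label].append(features)
    prototypes = []
    for label, rows in by_class.items():
        centroid = tuple(sum(col) / len(rows) for col in zip(*rows))
        prototypes.append((centroid, label))
    return prototypes


def join_prototypes(partial_sets):
    """Reduce step (assumed): concatenate the partial reduced sets."""
    return [proto for partial in partial_sets for proto in partial]


def classify_1nn(prototypes, query):
    """Nearest neighbor rule over the reduced set (squared Euclidean)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda p: sqdist(p[0], query))[1]


# Toy two-class training set: class 0 near (0, 0), class 1 near (10, 10).
train = []
for i in range(100):
    dx, dy = (i % 5) * 0.1, (i % 7) * 0.1
    train.append(((dx, dy), 0))
    train.append(((10 + dx, 10 + dy), 1))

partitions = split_into_partitions(train, n_partitions=4)
partial_sets = [reduce_partition(p) for p in partitions]  # independent per partition
reduced = join_prototypes(partial_sets)
```

Each call to `reduce_partition` depends only on its own partition, which is what makes the map phase easy to distribute; `classify_1nn(reduced, query)` then searches only the small joined set instead of the full training data.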

Citation

Triguero, I., Peralta, D., Bacardit, J., García, S., & Herrera, F. (2015). MRPR: A MapReduce solution for prototype reduction in big data classification. Neurocomputing, 150(Part A), 331-345. https://doi.org/10.1016/j.neucom.2014.04.078

Journal Article Type	Article
Acceptance Date	Apr 22, 2014
Online Publication Date	Oct 15, 2014
Publication Date	Feb 20, 2015
Deposit Date	Sep 4, 2017
Publicly Available Date	Sep 4, 2017
Journal	Neurocomputing
Print ISSN	0925-2312
Electronic ISSN	1872-8286
Publisher	Elsevier
Peer Reviewed	Peer Reviewed
Volume	150
Issue	Part A
Pages	331-345
DOI	https://doi.org/10.1016/j.neucom.2014.04.078
Keywords	Big data, Mahout, Hadoop, Prototype reduction, Prototype generation, Nearest neighbor classification
Public URL	https://nottingham-repository.worktribe.com/output/744703
Publisher URL	http://www.sciencedirect.com/science/article/pii/S0925231214013009?via%3Dihub
Contract Date	Sep 4, 2017

Files

triguero-peralta-bacardit-garcia-herrera-HadoopPG.pdf (443 Kb)

Copyright Statement
Copyright information regarding this work can be found at the following address: http://creativecommons.org/licenses/by-nc-nd/4.0