ISAAC TRIGUERO VELAZQUEZ I.TrigueroVelazquez@nottingham.ac.uk
Associate Professor
Transforming big data into smart data: An insight on the use of the k-nearest neighbors algorithm to obtain quality data
Triguero, Isaac; Garcia-Gil, Diego; Maillo, Jesus; Luengo, Julian; Garcia, Salvador; Herrera, Francisco
Authors
Diego Garcia-Gil
Jesus Maillo
Julian Luengo
Salvador Garcia
Francisco Herrera
Abstract
The k-nearest neighbours algorithm is characterised as a simple yet effective data mining technique. Its main drawback appears when massive amounts of data (likely to contain noise and imperfections) are involved, turning the algorithm into an imprecise and especially inefficient technique. These disadvantages have been the subject of research for many years and, among other approaches, data preprocessing techniques such as instance reduction or missing values imputation have targeted these weaknesses. As a result, these weaknesses have been turned into strengths, and the k-nearest neighbours rule has become a core algorithm to identify and correct imperfect data, removing noisy and redundant samples or imputing missing values, thereby transforming Big Data into Smart Data, that is, data of sufficient quality to expect a good outcome from any data mining algorithm. We investigate the role of this smart data gleaning algorithm in a supervised learning context. This includes a brief overview of Smart Data, current and future trends for the k-nearest neighbour algorithm in the Big Data context, and the existing data preprocessing techniques based on this algorithm. We present the emerging big data-ready versions of these algorithms and develop some new methods to cope with Big Data. We carry out a thorough experimental analysis on a series of big datasets, which provides guidelines on how to use the k-nearest neighbour algorithm to obtain Smart/Quality Data for a high-quality data mining process. Moreover, multiple Spark Packages have been developed, including all the Smart Data algorithms analysed.
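To make the two preprocessing roles named in the abstract concrete, the following is a minimal sketch, not the authors' Spark implementation. It assumes small in-memory lists, Euclidean distance, and numeric features. The function names `enn_filter` and `knn_impute` are hypothetical. The noise filter follows the classic Edited Nearest Neighbour idea (drop a sample whose label disagrees with the majority of its k nearest neighbours), and the imputation fills a missing value with the mean over the k nearest complete rows.

```python
import math
from collections import Counter


def euclidean(a, b, features=None):
    """Euclidean distance, optionally restricted to a subset of feature indices."""
    idx = range(len(a)) if features is None else features
    return math.sqrt(sum((a[j] - b[j]) ** 2 for j in idx))


def enn_filter(X, y, k=3):
    """Illustrative noise filter (Edited Nearest Neighbour style).

    Returns the indices of samples whose label matches the majority
    label of their k nearest neighbours (the sample itself excluded).
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        others = [j for j in range(len(X)) if j != i]
        others.sort(key=lambda j: euclidean(xi, X[j]))
        majority = Counter(y[j] for j in others[:k]).most_common(1)[0][0]
        if majority == yi:
            keep.append(i)
    return keep


def knn_impute(X, k=3):
    """Illustrative kNN imputation.

    Each None entry is replaced by the mean of that feature over the
    k nearest complete rows; distance uses only the observed features.
    """
    complete = [row for row in X if None not in row]
    result = []
    for row in X:
        if None not in row:
            result.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        nearest = sorted(complete, key=lambda r: euclidean(row, r, observed))[:k]
        result.append([
            v if v is not None else sum(n[j] for n in nearest) / len(nearest)
            for j, v in enumerate(row)
        ])
    return result


if __name__ == "__main__":
    # Two tight clusters; the last point sits in cluster "a" but is labelled "b".
    X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
         (0.05, 0.05)]
    y = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]
    print("kept indices:", enn_filter(X, y, k=3))

    rows = [(1.0, 2.0), (1.2, 2.2), (0.8, 1.8), (9.0, 9.0), (1.0, None)]
    print("imputed last row:", knn_impute(rows, k=3)[-1])
```

Both functions compare every sample against every other one, so the cost grows quadratically with the number of samples. That is the inefficiency the abstract refers to, and it is why distributed designs are needed at Big Data scale.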
Citation
Triguero, I., Garcia-Gil, D., Maillo, J., Luengo, J., Garcia, S., & Herrera, F. (2019). Transforming big data into smart data: An insight on the use of the k-nearest neighbors algorithm to obtain quality data. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9(2), Article e1289. https://doi.org/10.1002/widm.1289
Journal Article Type | Article |
---|---|
Acceptance Date | Sep 26, 2018 |
Online Publication Date | Nov 28, 2018 |
Publication Date | 2019-03 |
Deposit Date | Oct 19, 2018 |
Publicly Available Date | Nov 29, 2019 |
Journal | Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery |
Electronic ISSN | 1942-4795 |
Publisher | Wiley |
Peer Reviewed | Peer Reviewed |
Volume | 9 |
Issue | 2 |
Article Number | e1289 |
DOI | https://doi.org/10.1002/widm.1289 |
Public URL | https://nottingham-repository.worktribe.com/output/1176205 |
Publisher URL | https://onlinelibrary.wiley.com/doi/full/10.1002/widm.1289 |
Contract Date | Oct 19, 2018 |
Files
Transforming big data into smart data (PDF, 3.4 Mb)