Dr ISAAC TRIGUERO VELAZQUEZ I.TrigueroVelazquez@nottingham.ac.uk
ASSOCIATE PROFESSOR
From Big data to Smart Data with the K-Nearest Neighbours algorithm
Triguero, Isaac; Maillo, Jesus; Luengo, Julian; García, Salvador; Herrera, Francisco
Authors
Jesus Maillo
Julian Luengo
Salvador García
Francisco Herrera
Abstract
The k-nearest neighbours algorithm is one of the most widely used data mining models because of its simplicity and accurate results. However, when it comes to dealing with big datasets that may contain noisy and missing information, this technique becomes ineffective and inefficient. Because of its difficulties in tackling large amounts of imperfect data, plenty of research has aimed at improving this algorithm by means of data preprocessing techniques. These weaknesses have turned into strengths: the k-nearest neighbours rule has become a core model for detecting and correcting imperfect data, eliminating noisy and redundant data as well as imputing missing values.
In this work, we delve into the role of the k-nearest neighbours algorithm in obtaining smart data from big datasets. We analyse how this model is affected by the big data problem and, at the same time, how it can be used to transform raw data into useful data. Specifically, we discuss the benefits of recent big data technologies (Hadoop and Spark) in enabling this model to address large amounts of data, as well as the usefulness of prototype reduction and missing values imputation techniques based on it. As a result, guidelines on the use of the k-nearest neighbours algorithm to obtain smart data are provided and new potential research trends are outlined.
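To make the two preprocessing roles named in the abstract concrete, the following is a minimal, single-machine sketch and not the authors' implementation. It assumes Euclidean distance, small in-memory lists, and `None` as the missing-value marker. The function names `knn_impute` and `edited_nearest_neighbours` are hypothetical, and the noise filter shown is the classic edited nearest neighbour rule, used here only as one example of kNN-based data cleaning.

```python
import math


def distance(a, b):
    """Euclidean distance over the features observed in both rows (None = missing)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_impute(rows, k=2):
    """Fill each missing value with the mean of that feature over the
    k nearest rows in which the feature is observed."""
    completed = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [r for idx, r in enumerate(rows)
                      if idx != i and r[j] is not None]
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            if nearest:
                new_row[j] = sum(r[j] for r in nearest) / len(nearest)
        completed.append(new_row)
    return completed


def edited_nearest_neighbours(rows, labels, k=3):
    """Return the indices of instances whose label agrees with the majority
    label of their k nearest neighbours (use an odd k to avoid ties)."""
    kept = []
    for i, row in enumerate(rows):
        others = [(distance(row, r), lab)
                  for idx, (r, lab) in enumerate(zip(rows, labels)) if idx != i]
        others.sort(key=lambda pair: pair[0])
        votes = [lab for _, lab in others[:k]]
        majority = max(set(votes), key=votes.count)
        if majority == labels[i]:
            kept.append(i)
    return kept


# Illustrative toy data (made up for this sketch).
incomplete = [[1.0, 2.0], [1.2, None], [0.9, 2.2], [8.0, 9.0]]
imputed = knn_impute(incomplete, k=2)

points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [0.5, 0.5]]
classes = ["a", "a", "a", "b", "b", "b", "b"]  # the last label is deliberately noisy
clean_indices = edited_nearest_neighbours(points, classes, k=3)
```

Both functions compare every instance against every other one, so their cost grows quadratically with the number of instances. That cost is the scalability problem the abstract refers to, and it is why distributed platforms such as Hadoop and Spark come into play for large datasets.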
Citation
Triguero, I., Maillo, J., Luengo, J., García, S., & Herrera, F. (2016). From Big data to Smart Data with the K-Nearest Neighbours algorithm. Presented at IEEE International Conference on Smart Data (Smart Data 2016)
Field | Value |
---|---|
Conference Name | IEEE International Conference on Smart Data (Smart Data 2016) |
End Date | Dec 19, 2016 |
Acceptance Date | Nov 4, 2016 |
Publication Date | Dec 16, 2016 |
Deposit Date | May 3, 2017 |
Publicly Available Date | May 3, 2017 |
Peer Reviewed | Peer Reviewed |
Keywords | k-Nearest Neighbours, Prototype Reduction, Data Preprocessing, Smart Data, Big Data |
Public URL | https://nottingham-repository.worktribe.com/output/833159 |
Additional Information | © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Contract Date | May 3, 2017 |
Files
Triguero-et-al-KNNsmartData.pdf
(571 Kb)
PDF