Research Repository

All Outputs (6)

From Big data to Smart Data with the K-Nearest Neighbours algorithm (2016)
Presentation / Conference Contribution
Triguero, I., Maillo, J., Luengo, J., García, S., & Herrera, F. From Big data to Smart Data with the K-Nearest Neighbours algorithm. Presented at IEEE International Conference on Smart Data (Smart Data 2016)

The k-nearest neighbours algorithm is one of the most widely used data mining models because of its simplicity and accurate results. However, when it comes to dealing with big datasets, with potentially noisy and missing information, this technique beco...

EPRENNID: An evolutionary prototype reduction based ensemble for nearest neighbor classification of imbalanced data (2016)
Journal Article
Vluymans, S., Triguero, I., Cornelis, C., & Saeys, Y. (2016). EPRENNID: An evolutionary prototype reduction based ensemble for nearest neighbor classification of imbalanced data. Neurocomputing, 216, https://doi.org/10.1016/j.neucom.2016.08.026

Classification problems with an imbalanced class distribution have received an increased amount of attention within the machine learning community over the last decade. They are encountered in a growing number of real-world situations and pose a chal...

Evolutionary undersampling for extremely imbalanced big data classification under apache spark (2016)
Presentation / Conference Contribution
Triguero, I., Galar, M., Merino, D., Maillo, J., Bustince, H., & Herrera, F. Evolutionary undersampling for extremely imbalanced big data classification under apache spark. Presented at 2016 IEEE Congress on Evolutionary Computation (CEC)

The classification of datasets with a skewed class distribution is an important problem in data mining. Evolutionary undersampling of the majority class has proved to be a successful approach to tackle this issue. Such a challenging task may become e...

kNN-IS: an iterative spark-based design of the k-nearest neighbors classifier for big data (2016)
Journal Article
Maillo, J., Ramirez, S., Triguero, I., & Herrera, F. (2017). kNN-IS: an iterative spark-based design of the k-nearest neighbors classifier for big data. Knowledge-Based Systems, 117, 3-15. https://doi.org/10.1016/j.knosys.2016.06.012

The k-Nearest Neighbors classifier is a simple yet effective widely renowned method in data mining. The actual application of this model in the big data domain is not feasible due to time and memory restrictions. Several distributed alternatives base...

DPD-DFF: a dual phase distributed scheme with double fingerprint fusion for fast and accurate identification in large databases (2016)
Journal Article
Peralta, D., Triguero, I., García, S., Herrera, F., & Benitez, J. M. (2016). DPD-DFF: a dual phase distributed scheme with double fingerprint fusion for fast and accurate identification in large databases. Information Fusion, 32(Part A), https://doi.org/10.1016/j.inffus.2016.03.002

Nowadays, many companies and institutions need fast and reliable identification systems that are able to deal with very large databases. Fingerprints are among the most used biometric traits for identification. In the current literature there are fin...

Labelling strategies for hierarchical multi-label classification techniques (2016)
Journal Article
Triguero, I., & Vens, C. (2016). Labelling strategies for hierarchical multi-label classification techniques. Pattern Recognition, 56, 170-183. https://doi.org/10.1016/j.patcog.2016.02.017

Many hierarchical multi-label classification systems predict a real valued score for every (instance, class) couple, with a higher score reflecting more confidence that the instance belongs to that class. These classifiers leave t...