Research Repository

KEEL 3.0: an open source software for multi-stage analysis in data mining (2017)
Journal Article
Triguero, I., González, S., Moyano, J. M., García, S., Alcalá-Fdez, J., Luengo, J., Fernández, A., del Jesús, M. J., Sánchez, L., & Herrera, F. (2017). KEEL 3.0: an open source software for multi-stage analysis in data mining. International Journal of Computational Intelligence Systems, 10(1),

This paper introduces the 3rd major release of the KEEL Software. KEEL is an open source Java framework (GPLv3 license) that provides a number of modules to perform a wide variety of data mining tasks. It includes tools to perform data management, des...

Vehicle incident hot spots identification: An approach for big data (2017)
Presentation / Conference Contribution
Triguero, I., Figueredo, G. P., Mesgarpour, M., Garibaldi, J. M., & John, R. (2017, August). Vehicle incident hot spots identification: An approach for big data. Presented at 2017 IEEE Trustcom/BigDataSE/ICESS, Sydney, NSW, Australia

In this work we introduce a fast big data approach for road incident hot spot identification using Apache Spark. We implement an existing immuno-inspired mechanism, namely SeleSup, as a series of MapReduce-like operations. SeleSup is composed of a nu...

An Immune-Inspired Technique to Identify Heavy Goods Vehicles Incident Hot Spots (2017)
Journal Article
Figueredo, G. P., Triguero, I., Mesgarpour, M., Maciel Guerra, A., Garibaldi, J. M., & John, R. (2017). An Immune-Inspired Technique to Identify Heavy Goods Vehicles Incident Hot Spots. IEEE Transactions on Emerging Topics in Computational Intelligence, 1(4), 248-258.

We report on the adaptation of an immune-inspired instance selection technique to solve a real-world big data problem of determining vehicle incident hot spots. The technique, which is inspired by the Immune System self-regulation mechanism, was orig...

Self-labeling techniques for semi-supervised time series classification: an empirical study (2017)
Journal Article
González, M., Bergmeir, C., Triguero, I., Rodríguez, Y., & Benítez, J. M. (in press). Self-labeling techniques for semi-supervised time series classification: an empirical study. Knowledge and Information Systems,

The increasing amount of unlabeled time series data available renders the semi-supervised paradigm a suitable approach to tackle classification problems with a reduced quantity of labeled data. Self-labeled techniques stand out from semi-supervised cla...

Exact fuzzy k-Nearest neighbor classification for big datasets (2017)
Presentation / Conference Contribution
Maillo, J., Luengo, J., García, S., Herrera, F., & Triguero, I. Exact fuzzy k-Nearest neighbor classification for big datasets. Presented at IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2017)

The k-Nearest Neighbors (kNN) classifier is one of the most effective methods in supervised learning problems. It classifies unseen cases by comparing their similarity with the training data. Nevertheless, it gives each labeled sample the same import...

A first attempt on global evolutionary undersampling for imbalanced big data (2017)
Presentation / Conference Contribution
Triguero, I., Galar, M., Bustince, H., & Herrera, F. A first attempt on global evolutionary undersampling for imbalanced big data. Presented at IEEE Congress on Evolutionary Computation (CEC 2017)

The design of efficient big data learning models has become a common need in a great number of applications. The massive amounts of available data may hinder the use of traditional data mining techniques, especially when evolutionary algorithms are i...

Distributed incremental fingerprint identification with reduced database penetration rate using a hierarchical classification based on feature fusion and selection (2017)
Journal Article
Peralta, D., Triguero, I., García, S., Saeys, Y., Benitez, J. M., & Herrera, F. (2017). Distributed incremental fingerprint identification with reduced database penetration rate using a hierarchical classification based on feature fusion and selection. Knowledge-Based Systems, 126,

Fingerprint recognition has been a hot research topic over the last few decades, with many applications and ever-growing populations to identify. The need for flexible, fast identification systems is therefore patent in such situations. In this conte...

From Big data to Smart Data with the K-Nearest Neighbours algorithm (2016)
Presentation / Conference Contribution
Triguero, I., Maillo, J., Luengo, J., García, S., & Herrera, F. From Big data to Smart Data with the K-Nearest Neighbours algorithm. Presented at IEEE International Conference on Smart Data (Smart Data 2016)

The k-nearest neighbours algorithm is one of the most widely used data mining models because of its simplicity and accurate results. However, when it comes to dealing with big datasets, with potentially noisy and missing information, this technique beco...

EPRENNID: An evolutionary prototype reduction based ensemble for nearest neighbor classification of imbalanced data (2016)
Journal Article
Vluymans, S., Triguero, I., Cornelis, C., & Saeys, Y. (2016). EPRENNID: An evolutionary prototype reduction based ensemble for nearest neighbor classification of imbalanced data. Neurocomputing, 216,

Classification problems with an imbalanced class distribution have received an increased amount of attention within the machine learning community over the last decade. They are encountered in a growing number of real-world situations and pose a chal...

Evolutionary undersampling for extremely imbalanced big data classification under apache spark (2016)
Presentation / Conference Contribution
Triguero, I., Galar, M., Merino, D., Maillo, J., Bustince, H., & Herrera, F. Evolutionary undersampling for extremely imbalanced big data classification under apache spark. Presented at 2016 IEEE Congress on Evolutionary Computation (CEC)

The classification of datasets with a skewed class distribution is an important problem in data mining. Evolutionary undersampling of the majority class has proved to be a successful approach to tackle this issue. Such a challenging task may become e...

kNN-IS: an iterative spark-based design of the k-nearest neighbors classifier for big data (2016)
Journal Article
Maillo, J., Ramirez, S., Triguero, I., & Herrera, F. (2017). kNN-IS: an iterative spark-based design of the k-nearest neighbors classifier for big data. Knowledge-Based Systems, 117, 3-15.

The k-Nearest Neighbors classifier is a simple yet effective and widely renowned method in data mining. The actual application of this model in the big data domain is not feasible due to time and memory restrictions. Several distributed alternatives base...

DPD-DFF: a dual phase distributed scheme with double fingerprint fusion for fast and accurate identification in large databases (2016)
Journal Article
Peralta, D., Triguero, I., García, S., Herrera, F., & Benitez, J. M. (2016). DPD-DFF: a dual phase distributed scheme with double fingerprint fusion for fast and accurate identification in large databases. Information Fusion, 32(Part A),

Nowadays, many companies and institutions need fast and reliable identification systems that are able to deal with very large databases. Fingerprints are among the most used biometric traits for identification. In the current literature there are fin...

Labelling strategies for hierarchical multi-label classification techniques (2016)
Journal Article
Triguero, I., & Vens, C. (2016). Labelling strategies for hierarchical multi-label classification techniques. Pattern Recognition, 56, 170-183.

Many hierarchical multi-label classification systems predict a real valued score for every (instance, class) couple, with a higher score reflecting more confidence that the instance belongs to that class. These classifiers leave t...

ROSEFW-RF: the winner algorithm for the ECBDL’14 big data competition: an extremely imbalanced big data bioinformatics problem (2015)
Journal Article
Triguero, I., del Río, S., López, V., Bacardit, J., Benítez, J. M., & Herrera, F. (2015). ROSEFW-RF: the winner algorithm for the ECBDL’14 big data competition: an extremely imbalanced big data bioinformatics problem. Knowledge-Based Systems, 87,

The application of data mining and machine learning techniques to biological and biomedicine data continues to be a ubiquitous research theme in current bioinformatics. The rapid advances in biotechnology are allowing us to obtain and store large qu...

MRPR: A MapReduce solution for prototype reduction in big data classification (2014)
Journal Article
Triguero, I., Peralta, D., Bacardit, J., García, S., & Herrera, F. (2015). MRPR: A MapReduce solution for prototype reduction in big data classification. Neurocomputing, 150(Part A), 331-345.

In the era of big data, analyzing and extracting knowledge from large-scale data sets is a very interesting and challenging task. The application of standard data mining tools to such data sets is not straightforward. Hence, a new class of scalable m...

SEG-SSC: a framework based on synthetic examples generation for self-labeled semi-supervised classification (2014)
Journal Article
Triguero, I., Garcia, S., & Herrera, F. (2015). SEG-SSC: a framework based on synthetic examples generation for self-labeled semi-supervised classification. IEEE Transactions on Cybernetics, 45(4),

Self-labeled techniques are semi-supervised classification methods that address the shortage of labeled examples via a self-learning process based on supervised models. They progressively classify unlabeled data and use them to modify the hypothesis...

Minutiae filtering to improve both efficacy and efficiency of fingerprint matching algorithms (2014)
Journal Article
Peralta, D., Galar, M., Triguero, I., Miguel-Hurtado, O., Benitez, J. M., & Herrera, F. (2014). Minutiae filtering to improve both efficacy and efficiency of fingerprint matching algorithms. Engineering Applications of Artificial Intelligence, 32, 37-53.

Fingerprint minutiae extraction is a critical issue in fingerprint recognition. Both missing and spurious minutiae hinder the subsequent matching process. Spurious minutiae are more frequent than missing ones, but they can be removed by post-processin...