Research Repository

Outputs (47)

Local-global methods for generalised solar irradiance forecasting (2024)
Journal Article
Cargan, T. R., Landa-Silva, D., & Triguero, I. (2024). Local-global methods for generalised solar irradiance forecasting. Applied Intelligence, 54(2), 2225-2247. https://doi.org/10.1007/s10489-024-05273-9

For efficient operation, solar power operators often require generation forecasts for multiple sites with varying data availability. Many proposed methods for forecasting solar irradiance / solar power production formulate the problem as a time-serie...

General Purpose Artificial Intelligence Systems (GPAIS): Properties, definition, taxonomy, societal implications and responsible governance (2023)
Journal Article
Triguero, I., Molina, D., Poyatos, J., Del Ser, J., & Herrera, F. (2024). General Purpose Artificial Intelligence Systems (GPAIS): Properties, definition, taxonomy, societal implications and responsible governance. Information Fusion, 103, Article 102135. https://doi.org/10.1016/j.inffus.2023.102135

Most applications of Artificial Intelligence (AI) are designed for a confined and specific task. However, there are many scenarios that call for a more general AI, capable of solving a wide array of tasks without being specifically designed for them....

Hyper-Stacked: Scalable and Distributed Approach to AutoML for Big Data (2023)
Presentation / Conference Contribution
Dave, R., Angarita-Zapata, J. S., & Triguero, I. (2023). Hyper-Stacked: Scalable and Distributed Approach to AutoML for Big Data. In Machine Learning and Knowledge Extraction (82-102). https://doi.org/10.1007/978-3-031-40837-3_6

The emergence of Machine Learning (ML) has altered how researchers and business professionals value data. Applicable to almost every industry, considerable amounts of time are wasted creating bespoke applications and repetitively hand-tuning models t...

Explaining time series classifiers through meaningful perturbation and optimisation (2023)
Journal Article
Meng, H., Wagner, C., & Triguero, I. (2023). Explaining time series classifiers through meaningful perturbation and optimisation. Information Sciences, 645, Article 119334. https://doi.org/10.1016/j.ins.2023.119334

Machine learning approaches have enabled increasingly powerful time series classifiers. While performance has improved drastically, the resulting classifiers generally suffer from poor explainability, limiting their applicability in critical areas. S...

Identifying bird species by their calls in Soundscapes (2023)
Journal Article
Maclean, K., & Triguero, I. (2023). Identifying bird species by their calls in Soundscapes. Applied Intelligence, 53, 21485-21499. https://doi.org/10.1007/s10489-023-04486-8

In many real data science problems, it is common to encounter a domain mismatch between the training and testing datasets, which means that solutions designed for one may not transfer well to the other due to their differences. An example of such was...

Forced vital capacity trajectories in patients with idiopathic pulmonary fibrosis: a secondary analysis of a multicentre, prospective, observational cohort (2022)
Journal Article
Fainberg, H. P., Oldham, J. M., Molyneaux, P. L., Allen, R. J., Kraven, L. M., Fahy, W. A., …Jenkins, R. G. (2022). Forced vital capacity trajectories in patients with idiopathic pulmonary fibrosis: a secondary analysis of a multicentre, prospective, observational cohort. The Lancet Digital Health, 4(12), e862-e872. https://doi.org/10.1016/S2589-7500(22)00173-X

Background: Idiopathic Pulmonary Fibrosis (IPF) is a progressive fibrotic lung disease with a variable clinical trajectory. Decline in Forced Vital Capacity (FVC) is the main indicator of progression; however, missingness prevents long-term analysis o...

Feature Importance Identification for Time Series Classifiers (2022)
Presentation / Conference Contribution
Meng, H., Wagner, C., & Triguero, I. (2022). Feature Importance Identification for Time Series Classifiers. In 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC) (3293-3298). https://doi.org/10.1109/smc53654.2022.9945205

Time series classification is a challenging research area where machine learning techniques such as deep learning perform well, yet lack interpretability. Identifying the most important features for such classifiers provides a pathway to improving th...

A fusion spatial attention approach for few-shot learning (2021)
Journal Article
Song, H., Deng, B., Pound, M., Özcan, E., & Triguero, I. (2022). A fusion spatial attention approach for few-shot learning. Information Fusion, 81, 187-202. https://doi.org/10.1016/j.inffus.2021.11.019

Few-shot learning is a challenging problem in computer vision that aims to learn a new visual concept from very limited data. A core issue is that there is a large amount of uncertainty introduced by the small training set. For example, the few image...

Beyond global and local multi-target learning (2021)
Journal Article
Basgalupp, M., Cerri, R., Schietgat, L., Triguero, I., & Vens, C. (2021). Beyond global and local multi-target learning. Information Sciences, 579, 508-524. https://doi.org/10.1016/j.ins.2021.08.022

In multi-target prediction, an instance has to be classified along multiple target variables at the same time, where each target represents a category or numerical value. There are several strategies to tackle multi-target prediction problems: the lo...

EUSC: A clustering-based surrogate model to accelerate evolutionary undersampling in imbalanced classification (2020)
Journal Article
Le, H. L., Landa-Silva, D., Galar, M., Garcia, S., & Triguero, I. (2021). EUSC: A clustering-based surrogate model to accelerate evolutionary undersampling in imbalanced classification. Applied Soft Computing, 101, Article 107033. https://doi.org/10.1016/j.asoc.2020.107033

Learning from imbalanced datasets is highly demanded in real-world applications and a challenge for standard classifiers that tend to be biased towards the classes with the majority of the examples. Undersampling approaches reduce the size of...

Decomposition-Fusion for Label Distribution Learning (2020)
Journal Article
González, M., González-Almagro, G., Triguero, I., Cano, J.-R., & García, S. (2021). Decomposition-Fusion for Label Distribution Learning. Information Fusion, 66, 64-75. https://doi.org/10.1016/j.inffus.2020.08.024

Label Distribution Learning (LDL) is a general learning framework that assigns an instance to a distribution over a set of labels rather than to a single label or multiple labels. Current LDL methods have proven their effectiveness in many real-life...

Redundancy and Complexity Metrics for Big Data Classification: Towards Smart Data (2020)
Journal Article
Maillo, J., Triguero, I., & Herrera, F. (2020). Redundancy and Complexity Metrics for Big Data Classification: Towards Smart Data. IEEE Access, 1-1. https://doi.org/10.1109/access.2020.2991800

The importance of knowing the descriptive properties of a dataset when tackling a data science problem is recognized. Having information about the redundancy, complexity and density of a problem allows us to make decisions as to which data preproc...

Multigranulation Super-Trust Model for Attribute Reduction (2020)
Journal Article
Ding, W., Pedrycz, W., Triguero, I., Cao, Z., & Lin, C.-T. (2020). Multigranulation Super-Trust Model for Attribute Reduction. IEEE Transactions on Fuzzy Systems, 29(6), 1395-1408. https://doi.org/10.1109/tfuzz.2020.2975152

As big data often contains a significant amount of uncertain, unstructured, and imprecise data that are structurally complex and incomplete, traditional attribute reduction methods are less effective when applied to large-scale incomplete information...

Evaluating Automated Machine Learning on Supervised Regression Traffic Forecasting Problems (2020)
Book Chapter
Angarita-Zapata, J. S., Masegosa, A. D., & Triguero, I. (2020). Evaluating Automated Machine Learning on Supervised Regression Traffic Forecasting Problems. In O. Llanes Santiago, C. Cruz Corona, A. J. Silva Neto, & J. L. Verdegay (Eds.), Computational intelligence in emerging technologies for engineering applications (187-204). Springer. https://doi.org/10.1007/978-3-030-34409-2_11

Traffic forecasting is a well-known strategy that supports road users and decision-makers to plan their movements on the roads and to improve the management of traffic, respectively. Current data availability an...

Fast and Scalable Approaches to Accelerate the Fuzzy k Nearest Neighbors Classifier for Big Data (2019)
Journal Article
Maillo, J., García, S., Luengo, J., Herrera, F., & Triguero, I. (2020). Fast and Scalable Approaches to Accelerate the Fuzzy k Nearest Neighbors Classifier for Big Data. IEEE Transactions on Fuzzy Systems, 28(5), 874-886. https://doi.org/10.1109/TFUZZ.2019.2936356

One of the best-known and most effective methods in supervised classification is the k nearest neighbors algorithm (kNN). Several approaches have been proposed to improve its accuracy, where fuzzy approaches prove to be among the most successful, hig...

Multi-head CNN–RNN for multi-time series anomaly detection: An industrial case study (2019)
Journal Article
Canizo, M., Triguero, I., Conde, A., & Onieva, E. (2019). Multi-head CNN–RNN for multi-time series anomaly detection: An industrial case study. Neurocomputing, 363, 246-260. https://doi.org/10.1016/j.neucom.2019.07.034

Detecting anomalies in time series data is becoming mainstream in a wide variety of industrial applications in which sensors monitor expensive machinery. The complexity of this task increases when multiple heterogeneous sensors provide information of...

Fuzzy Hot Spot Identification for Big Data: An Initial Approach (2019)
Presentation / Conference Contribution
Triguero, I., Tickle, R., Figueredo, G. P., Mesgarpour, M., Ozcan, E., & John, R. I. (2019). Fuzzy Hot Spot Identification for Big Data: An Initial Approach. In 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). https://doi.org/10.1109/FUZZ-IEEE.2019.8858979

Hot spot identification problems are present across a wide range of areas, such as transportation, health care and energy. Hot spots are locations where a certain type of event occurs with high frequency. A recent big data approach is capable of iden...

A Taxonomy of Traffic Forecasting Regression Problems From a Supervised Learning Perspective (2019)
Journal Article
Angarita-Zapata, J. S., Masegosa, A. D., & Triguero, I. (2019). A Taxonomy of Traffic Forecasting Regression Problems From a Supervised Learning Perspective. IEEE Access, 7, 68185-68205. https://doi.org/10.1109/ACCESS.2019.2917228

One contemporary policy to deal with traffic congestion is the design and implementation of forecasting methods that allow users to plan ahead of time and decision makers to improve traffic management. Current data availability and growing computatio...

Virtual porous materials to predict the air void topology and hydraulic conductivity of asphalt roads (2019)
Journal Article
Aboufoul, M., Chiarelli, A., Triguero, I., & Garcia, A. (2019). Virtual porous materials to predict the air void topology and hydraulic conductivity of asphalt roads. Powder Technology, 352, 294-304. https://doi.org/10.1016/j.powtec.2019.04.072

This paper investigates the effects of air void topology on hydraulic conductivity in asphalt mixtures with porosity in the range 14%–31%. Virtual asphalt pore networks were generated using the Intersected Stacked Air voids (ISA) method, with its par...

A review on the self and dual interactions between machine learning and optimisation (2019)
Journal Article
Song, H., Triguero, I., & Özcan, E. (2019). A review on the self and dual interactions between machine learning and optimisation. Progress in Artificial Intelligence, 8(2), 143–165. https://doi.org/10.1007/s13748-019-00185-z

Machine learning and optimisation are two growing fields of artificial intelligence with an enormous number of computer science applications. The techniques in the former area aim to learn knowledge from data or experience, while the techniques from...

PAS3-HSID: a Dynamic Bio-Inspired Approach for Real-Time Hot Spot Identification in Data Streams (2019)
Journal Article
Tickle, R., Triguero, I., Figueredo, G. P., Mesgarpour, M., & John, R. I. (2019). PAS3-HSID: a Dynamic Bio-Inspired Approach for Real-Time Hot Spot Identification in Data Streams. Cognitive Computation, 11(3), 434–458. https://doi.org/10.1007/s12559-019-09638-y

Hot spot identification is a very relevant problem in a wide variety of areas such as health care, energy or transportation. A hot spot is defined as a region of high likelihood o...

Handling uncertainty in citizen science data: towards an improved amateur-based large-scale classification (2018)
Journal Article
Jiménez, M., Triguero, I., & John, R. (2019). Handling uncertainty in citizen science data: towards an improved amateur-based large-scale classification. Information Sciences, 479, 301-320. https://doi.org/10.1016/j.ins.2018.12.011

Citizen Science, traditionally known as the engagement of amateur participants in research, is showing great potential for large-scale processing of data. In areas such as astronomy, biology, or geo-sciences, where emerging technologies genera...

Transforming big data into smart data: An insight on the use of the k-nearest neighbors algorithm to obtain quality data (2018)
Journal Article
Triguero, I., Garcia-Gil, D., Maillo, J., Luengo, J., Garcia, S., & Herrera, F. (2019). Transforming big data into smart data: An insight on the use of the k-nearest neighbors algorithm to obtain quality data. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9(2), Article e1289. https://doi.org/10.1002/widm.1289

The k-nearest neighbours algorithm is characterised as a simple yet effective data mining technique. The main drawback of this technique appears when massive amounts of data (likely to contain noise and imperfections) are involved, turning this algo...

A preliminary study on Hybrid Spill-Tree Fuzzy k-Nearest Neighbors for big data classification (2018)
Presentation / Conference Contribution
Maillo, J., Luengo, J., Garcia, S., Herrera, F., & Triguero, I. (2018). A preliminary study on Hybrid Spill-Tree Fuzzy k-Nearest Neighbors for big data classification. In 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (1-8). https://doi.org/10.1109/FUZZ-IEEE.2018.8491595

The Fuzzy k Nearest Neighbor (Fuzzy kNN) classifier is well known for its effectiveness in supervised learning problems. kNN classifies by comparing new incoming examples with a similarity function using the samples of the training set. The fuzzy ver...

Coevolutionary fuzzy attribute order reduction with complete attribute-value space tree (2018)
Journal Article
Ding, W., Triguero, I., & Lin, C.-T. (2018). Coevolutionary fuzzy attribute order reduction with complete attribute-value space tree. IEEE Transactions on Emerging Topics in Computational Intelligence. https://doi.org/10.1109/tetci.2018.2869919

Since big data sets are structurally complex, high-dimensional, and their attributes exhibit some redundant and irrelevant information, the selection, evaluation, and combination of those large-scale attributes pose huge challenges to traditional met...

A Preliminary Study of the Feasibility of Global Evolutionary Feature Selection for Big Datasets under Apache Spark (2018)
Presentation / Conference Contribution
Galar, M., Triguero, I., Bustince, H., & Herrera, F. (2018). A Preliminary Study of the Feasibility of Global Evolutionary Feature Selection for Big Datasets under Apache Spark. In 2018 IEEE Congress on Evolutionary Computation (CEC) - Proceedings (1-8). https://doi.org/10.1109/CEC.2018.8477878

Designing efficient learning models capable of dealing with tons of data has become a reality in the era of big data. However, the amount of available data is too much for traditional data mining techniques to be applicable. This issue is even more s...

A first approach for handling uncertainty in citizen science (2018)
Presentation / Conference Contribution
Jiménez, M., Triguero, I., & John, R. (2018, July). A first approach for handling uncertainty in citizen science. Paper presented at IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2018)

Citizen Science is coming to the forefront of scientific research as a valuable method for large-scale processing of data. New technologies in fields such as astronomy or bio-sciences generate tons of data, for which a thorough expert analysis is no...

Instance reduction for one-class classification (2018)
Journal Article
Krawczyk, B., Triguero, I., García, S., Woźniak, M., & Herrera, F. (in press). Instance reduction for one-class classification. Knowledge and Information Systems. https://doi.org/10.1007/s10115-018-1220-z

Instance reduction techniques are data preprocessing methods originally developed to enhance the nearest neighbor rule for standard classification. They reduce the training data by selecting or generating representative examples of a given problem. T...

A preliminary study on automatic algorithm selection for short-term traffic forecasting (2018)
Presentation / Conference Contribution
Angarita-Zapata, J. S., Triguero, I., & Masegosa, A. D. (2018). A preliminary study on automatic algorithm selection for short-term traffic forecasting. In Intelligent Distributed Computing XII (204-214). https://doi.org/10.1007/978-3-319-99626-4_18

Despite the broad range of Machine Learning (ML) algorithms, there are no clear baselines to find the best method and its configuration given a Short-Term Traffic Forecasting (STTF) problem. In ML, this is know...

On the use of convolutional neural networks for robust classification of multiple fingerprint captures (2017)
Journal Article
Peralta, D., Triguero, I., García, S., Saeys, Y., Benitez, J. M., & Herrera, F. (in press). On the use of convolutional neural networks for robust classification of multiple fingerprint captures. International Journal of Intelligent Systems, 33(1). https://doi.org/10.1002/int.21948

Fingerprint classification is one of the most common approaches to accelerate the identification in large databases of fingerprints. Fingerprints are grouped into disjoint classes, so that an input fingerprint is compared only with those belonging to...

KEEL 3.0: an open source software for multi-stage analysis in data mining (2017)
Journal Article
Triguero, I., González, S., Moyano, J. M., García, S., Alcalá-Fdez, J., Luengo, J., …Herrera, F. (2017). KEEL 3.0: an open source software for multi-stage analysis in data mining. International Journal of Computational Intelligence Systems, 10(1). https://doi.org/10.2991/ijcis.10.1.82

This paper introduces the 3rd major release of the KEEL Software. KEEL is an open source Java framework (GPLv3 license) that provides a number of modules to perform a wide variety of data mining tasks. It includes tools to perform data management, des...

Vehicle incident hot spots identification: An approach for big data (2017)
Presentation / Conference Contribution
Triguero, I., Figueredo, G. P., Mesgarpour, M., Garibaldi, J. M., & John, R. (2017). Vehicle incident hot spots identification: An approach for big data. In Proceedings - 16th IEEE International Conference on Trust, Security and Privacy in Computing and Communications; 11th IEEE International Conference on Big Data Science and Engineering (IEEE BigDataSE); and 14th IEEE International Conference on Embedded Software and Systems (901-908). https://doi.org/10.1109/Trustcom/BigDataSE/ICESS.2017.329

In this work we introduce a fast big data approach for road incident hot spot identification using Apache Spark. We implement an existing immuno-inspired mechanism, namely SeleSup, as a series of MapReduce-like operations. SeleSup is composed of a nu...

An Immune-Inspired Technique to Identify Heavy Goods Vehicles Incident Hot Spots (2017)
Journal Article
Figueredo, G. P., Triguero, I., Mesgarpour, M., Maciel Guerra, A., Garibaldi, J. M., & John, R. (2017). An Immune-Inspired Technique to Identify Heavy Goods Vehicles Incident Hot Spots. IEEE Transactions on Emerging Topics in Computational Intelligence, 1(4), 248-258. https://doi.org/10.1109/TETCI.2017.2721960

We report on the adaptation of an immune-inspired instance selection technique to solve a real-world big data problem of determining vehicle incident hot spots. The technique, which is inspired by the Immune System self-regulation mechanism, was orig...

Self-labeling techniques for semi-supervised time series classification: an empirical study (2017)
Journal Article
González, M., Bergmeir, C., Triguero, I., Rodríguez, Y., & Benítez, J. M. (in press). Self-labeling techniques for semi-supervised time series classification: an empirical study. Knowledge and Information Systems. https://doi.org/10.1007/s10115-017-1090-9

The increasing amount of unlabeled time series data available renders the semi-supervised paradigm a suitable approach to tackle classification problems with a reduced quantity of labeled data. Self-labeled techniques stand out from semi-supervised cla...

Exact fuzzy k-Nearest neighbor classification for big datasets (2017)
Presentation / Conference Contribution
Maillo, J., Luengo, J., García, S., Herrera, F., & Triguero, I. (2017). Exact fuzzy k-Nearest neighbor classification for big datasets.

The k-Nearest Neighbors (kNN) classifier is one of the most effective methods in supervised learning problems. It classifies unseen cases by comparing their similarity with the training data. Nevertheless, it gives each labeled sample the same import...

A first attempt on global evolutionary undersampling for imbalanced big data (2017)
Presentation / Conference Contribution
Triguero, I., Galar, M., Bustince, H., & Herrera, F. (2017). A first attempt on global evolutionary undersampling for imbalanced big data.

The design of efficient big data learning models has become a common need in a great number of applications. The massive amounts of available data may hinder the use of traditional data mining techniques, especially when evolutionary algorithms are i...

Distributed incremental fingerprint identification with reduced database penetration rate using a hierarchical classification based on feature fusion and selection (2017)
Journal Article
Peralta, D., Triguero, I., García, S., Saeys, Y., Benitez, J. M., & Herrera, F. (2017). Distributed incremental fingerprint identification with reduced database penetration rate using a hierarchical classification based on feature fusion and selection. Knowledge-Based Systems, 126. https://doi.org/10.1016/j.knosys.2017.03.014

Fingerprint recognition has been a hot research topic over the last few decades, with many applications and ever growing populations to identify. The need for flexible, fast identification systems is therefore evident in such situations. In this conte...

From Big data to Smart Data with the K-Nearest Neighbours algorithm (2016)
Presentation / Conference Contribution
Triguero, I., Maillo, J., Luengo, J., García, S., & Herrera, F. (2016). From Big data to Smart Data with the K-Nearest Neighbours algorithm.

The k-nearest neighbours algorithm is one of the most widely used data mining models because of its simplicity and accurate results. However, when it comes to dealing with big datasets, with potentially noisy and missing information, this technique beco...

EPRENNID: An evolutionary prototype reduction based ensemble for nearest neighbor classification of imbalanced data (2016)
Journal Article
Vluymans, S., Triguero, I., Cornelis, C., & Saeys, Y. (2016). EPRENNID: An evolutionary prototype reduction based ensemble for nearest neighbor classification of imbalanced data. Neurocomputing, 216. https://doi.org/10.1016/j.neucom.2016.08.026

Classification problems with an imbalanced class distribution have received an increased amount of attention within the machine learning community over the last decade. They are encountered in a growing number of real-world situations and pose a chal...

Evolutionary undersampling for extremely imbalanced big data classification under apache spark (2016)
Presentation / Conference Contribution
Triguero, I., Galar, M., Merino, D., Maillo, J., Bustince, H., & Herrera, F. (2016). Evolutionary undersampling for extremely imbalanced big data classification under apache spark.

The classification of datasets with a skewed class distribution is an important problem in data mining. Evolutionary undersampling of the majority class has proved to be a successful approach to tackle this issue. Such a challenging task may become e...

kNN-IS: an iterative spark-based design of the k-nearest neighbors classifier for big data (2016)
Journal Article
Maillo, J., Ramirez, S., Triguero, I., & Herrera, F. (2017). kNN-IS: an iterative spark-based design of the k-nearest neighbors classifier for big data. Knowledge-Based Systems, 117, 3-15. https://doi.org/10.1016/j.knosys.2016.06.012

The k-Nearest Neighbors classifier is a simple yet effective, widely renowned method in data mining. The actual application of this model in the big data domain is not feasible due to time and memory restrictions. Several distributed alternatives base...

DPD-DFF: a dual phase distributed scheme with double fingerprint fusion for fast and accurate identification in large databases (2016)
Journal Article
Peralta, D., Triguero, I., García, S., Herrera, F., & Benitez, J. M. (2016). DPD-DFF: a dual phase distributed scheme with double fingerprint fusion for fast and accurate identification in large databases. Information Fusion, 32(Part A). https://doi.org/10.1016/j.inffus.2016.03.002

Nowadays, many companies and institutions need fast and reliable identification systems that are able to deal with very large databases. Fingerprints are among the most used biometric traits for identification. In the current literature there are fin...

Labelling strategies for hierarchical multi-label classification techniques (2016)
Journal Article
Triguero, I., & Vens, C. (2016). Labelling strategies for hierarchical multi-label classification techniques. Pattern Recognition, 56, 170-183. https://doi.org/10.1016/j.patcog.2016.02.017

Many hierarchical multi-label classification systems predict a real valued score for every (instance, class) couple, with a higher score reflecting more confidence that the instance belongs to that class. These classifiers leave t...

ROSEFW-RF: the winner algorithm for the ECBDL’14 big data competition: an extremely imbalanced big data bioinformatics problem (2015)
Journal Article
Triguero, I., del Río, S., López, V., Bacardit, J., Benítez, J. M., & Herrera, F. (2015). ROSEFW-RF: the winner algorithm for the ECBDL’14 big data competition: an extremely imbalanced big data bioinformatics problem. Knowledge-Based Systems, 87. https://doi.org/10.1016/j.knosys.2015.05.027

The application of data mining and machine learning techniques to biological and biomedical data continues to be a ubiquitous research theme in current bioinformatics. The rapid advances in biotechnology are allowing us to obtain and store large qu...

MRPR: A MapReduce solution for prototype reduction in big data classification (2014)
Journal Article
Triguero, I., Peralta, D., Bacardit, J., García, S., & Herrera, F. (2015). MRPR: A MapReduce solution for prototype reduction in big data classification. Neurocomputing, 150(Part A), 331-345. https://doi.org/10.1016/j.neucom.2014.04.078

In the era of big data, analyzing and extracting knowledge from large-scale data sets is a very interesting and challenging task. The application of standard data mining tools in such data sets is not straightforward. Hence, a new class of scalable m...

SEG-SSC: a framework based on synthetic examples generation for self-labeled semi-supervised classification (2014)
Journal Article
Triguero, I., Garcia, S., & Herrera, F. (2015). SEG-SSC: a framework based on synthetic examples generation for self-labeled semi-supervised classification. IEEE Transactions on Cybernetics, 45(4). https://doi.org/10.1109/TCYB.2014.2332003

Self-labeled techniques are semi-supervised classification methods that address the shortage of labeled examples via a self-learning process based on supervised models. They progressively classify unlabeled data and use them to modify the hypothesis...

Minutiae filtering to improve both efficacy and efficiency of fingerprint matching algorithms (2014)
Journal Article
Peralta, D., Galar, M., Triguero, I., Miguel-Hurtado, O., Benitez, J. M., & Herrera, F. (2014). Minutiae filtering to improve both efficacy and efficiency of fingerprint matching algorithms. Engineering Applications of Artificial Intelligence, 32, 37-53. https://doi.org/10.1016/j.engappai.2014.02.016

Fingerprint minutiae extraction is a critical issue in fingerprint recognition. Both missing and spurious minutiae hinder the posterior matching process. Spurious minutiae are more frequent than missing ones, but they can be removed by post-processin...