Evolutionary undersampling for extremely imbalanced big data classification under Apache Spark

Triguero, Isaac; Galar, M.; Merino, D.; Maillo, Jesus; Bustince, H.; Herrera, Francisco

PAS3-HSID: a Dynamic Bio-Inspired Approach for Real-Time Hot Spot Identification in Data Streams (2019)
Journal Article
Tickle, R., Triguero, I., Figueredo, G. P., Mesgarpour, M., & John, R. I. (2019). PAS3-HSID: a Dynamic Bio-Inspired Approach for Real-Time Hot Spot Identification in Data Streams. Cognitive Computation, 11(3), 434–458. https://doi.org/10.1007/s12559-019-09638-y

© 2019, Springer Science+Business Media, LLC, part of Springer Nature. Hot spot identification is a very relevant problem in a wide variety of areas such as health care, energy or transportation. A hot spot is defined as a region of high likelihood o...

Handling uncertainty in citizen science data: towards an improved amateur-based large-scale classification (2018)
Journal Article
Jiménez, M., Triguero, I., & John, R. (2019). Handling uncertainty in citizen science data: towards an improved amateur-based large-scale classification. Information Sciences, 479, 301-320. https://doi.org/10.1016/j.ins.2018.12.011

© 2018. Citizen Science, traditionally known as the engagement of amateur participants in research, is showing great potential for large-scale processing of data. In areas such as astronomy, biology, or geo-sciences, where emerging technologies genera...

Transforming big data into smart data: An insight on the use of the k-nearest neighbors algorithm to obtain quality data (2018)
Journal Article
Triguero, I., Garcia-Gil, D., Maillo, J., Luengo, J., Garcia, S., & Herrera, F. (2019). Transforming big data into smart data: An insight on the use of the k-nearest neighbors algorithm to obtain quality data. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9(2), Article e1289. https://doi.org/10.1002/widm.1289

The k-nearest neighbours algorithm is characterised as a simple yet effective data mining technique. The main drawback of this technique appears when massive amounts of data, likely to contain noise and imperfections, are involved, turning this algo...

A preliminary study on Hybrid Spill-Tree Fuzzy k-Nearest Neighbors for big data classification (2018)
Presentation / Conference Contribution
Maillo, J., Luengo, J., Garcia, S., Herrera, F., & Triguero, I. (2018). A preliminary study on Hybrid Spill-Tree Fuzzy k-Nearest Neighbors for big data classification. In 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (1-8). https://doi.org/10.1109/FUZZ-IEEE.2018.8491595

The Fuzzy k Nearest Neighbor (Fuzzy kNN) classifier is well known for its effectiveness in supervised learning problems. kNN classifies by comparing new incoming examples with a similarity function using the samples of the training set. The fuzzy ver...

Coevolutionary fuzzy attribute order reduction with complete attribute-value space tree (2018)
Journal Article
Ding, W., Triguero, I., & Lin, C.-T. (2018). Coevolutionary fuzzy attribute order reduction with complete attribute-value space tree. IEEE Transactions on Emerging Topics in Computational Intelligence, https://doi.org/10.1109/tetci.2018.2869919

Since big data sets are structurally complex, high-dimensional, and their attributes exhibit some redundant and irrelevant information, the selection, evaluation, and combination of those large-scale attributes pose huge challenges to traditional met...

A Preliminary Study of the Feasibility of Global Evolutionary Feature Selection for Big Datasets under Apache Spark (2018)
Presentation / Conference Contribution
Galar, M., Triguero, I., Bustince, H., & Herrera, F. (2018). A Preliminary Study of the Feasibility of Global Evolutionary Feature Selection for Big Datasets under Apache Spark. In 2018 IEEE Congress on Evolutionary Computation (CEC) - Proceedings (1-8). https://doi.org/10.1109/CEC.2018.8477878

Designing efficient learning models capable of dealing with tons of data has become a reality in the era of big data. However, the amount of available data is too much for traditional data mining techniques to be applicable. This issue is even more s...

A first approach for handling uncertainty in citizen science (2018)
Presentation / Conference Contribution
Jiménez, M., Triguero, I., & John, R. (2018, July). A first approach for handling uncertainty in citizen science. Paper presented at IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2018)

Citizen Science is coming to the forefront of scientific research as a valuable method for large-scale processing of data. New technologies in fields such as astronomy or bio-sciences generate tons of data, for which a thorough expert analysis is no...

Instance reduction for one-class classification (2018)
Journal Article
Krawczyk, B., Triguero, I., García, S., Woźniak, M., & Herrera, F. (in press). Instance reduction for one-class classification. Knowledge and Information Systems, https://doi.org/10.1007/s10115-018-1220-z

Instance reduction techniques are data preprocessing methods originally developed to enhance the nearest neighbor rule for standard classification. They reduce the training data by selecting or generating representative examples of a given problem. T...

A preliminary study on automatic algorithm selection for short-term traffic forecasting (2018)
Presentation / Conference Contribution
Angarita-Zapata, J. S., Triguero, I., & Masegosa, A. D. (2018, October). A preliminary study on automatic algorithm selection for short-term traffic forecasting. Presented at 12th International Symposium on Intelligent Distributed Computing (IDC 2018), Bilbao, Spain

© 2018, Springer Nature Switzerland AG. Despite the broad range of Machine Learning (ML) algorithms, there are no clear baselines to find the best method and its configuration given a Short-Term Traffic Forecasting (STTF) problem. In ML, this is know...

On the use of convolutional neural networks for robust classification of multiple fingerprint captures (2017)
Journal Article
Peralta, D., Triguero, I., García, S., Saeys, Y., Benitez, J. M., & Herrera, F. (in press). On the use of convolutional neural networks for robust classification of multiple fingerprint captures. International Journal of Intelligent Systems, 33(1), https://doi.org/10.1002/int.21948

Fingerprint classification is one of the most common approaches to accelerate the identification in large databases of fingerprints. Fingerprints are grouped into disjoint classes, so that an input fingerprint is compared only with those belonging to...

KEEL 3.0: an open source software for multi-stage analysis in data mining (2017)
Journal Article
Triguero, I., González, S., Moyano, J. M., García, S., Alcalá-Fdez, J., Luengo, J., …Herrera, F. (2017). KEEL 3.0: an open source software for multi-stage analysis in data mining. International Journal of Computational Intelligence Systems, 10(1), https://doi.org/10.2991/ijcis.10.1.82

This paper introduces the 3rd major release of the KEEL Software. KEEL is an open source Java framework (GPLv3 license) that provides a number of modules to perform a wide variety of data mining tasks. It includes tools to perform data management, des...

Vehicle incident hot spots identification: An approach for big data (2017)
Presentation / Conference Contribution
Triguero, I., Figueredo, G. P., Mesgarpour, M., Garibaldi, J. M., & John, R. (2017). Vehicle incident hot spots identification: An approach for big data. In Proceedings - 16th IEEE International Conference on Trust, Security and Privacy in Computing and Communications; 11th IEEE International Conference on Big Data Science and Engineering (IEEE BigDataSE); and 14th IEEE International Conference on Embedded Software and Systems, (901-908). https://doi.org/10.1109/Trustcom/BigDataSE/ICESS.2017.329

In this work we introduce a fast big data approach for road incident hot spot identification using Apache Spark. We implement an existing immuno-inspired mechanism, namely SeleSup, as a series of MapReduce-like operations. SeleSup is composed of a nu...

An Immune-Inspired Technique to Identify Heavy Goods Vehicles Incident Hot Spots (2017)
Journal Article
Figueredo, G. P., Triguero, I., Mesgarpour, M., Maciel Guerra, A., Garibaldi, J. M., & John, R. (2017). An Immune-Inspired Technique to Identify Heavy Goods Vehicles Incident Hot Spots. IEEE Transactions on Emerging Topics in Computational Intelligence, 1(4), 248-258. https://doi.org/10.1109/TETCI.2017.2721960

We report on the adaptation of an immune-inspired instance selection technique to solve a real-world big data problem of determining vehicle incident hot spots. The technique, which is inspired by the Immune System self-regulation mechanism, was orig...

Self-labeling techniques for semi-supervised time series classification: an empirical study (2017)
Journal Article
González, M., Bergmeir, C., Triguero, I., Rodríguez, Y., & Benítez, J. M. (in press). Self-labeling techniques for semi-supervised time series classification: an empirical study. Knowledge and Information Systems, https://doi.org/10.1007/s10115-017-1090-9

An increasing amount of unlabeled time series data available render the semi-supervised paradigm a suitable approach to tackle classification problems with a reduced quantity of labeled data. Self-labeled techniques stand out from semi-supervised cla...

Exact fuzzy k-Nearest neighbor classification for big datasets (2017)
Presentation / Conference Contribution
Maillo, J., Luengo, J., García, S., Herrera, F., & Triguero, I. (2017). Exact fuzzy k-Nearest neighbor classification for big datasets.

The k-Nearest Neighbors (kNN) classifier is one of the most effective methods in supervised learning problems. It classifies unseen cases comparing their similarity with the training data. Nevertheless, it gives to each labeled sample the same import...

A first attempt on global evolutionary undersampling for imbalanced big data (2017)
Presentation / Conference Contribution
Triguero, I., Galar, M., Bustince, H., & Herrera, F. (2017). A first attempt on global evolutionary undersampling for imbalanced big data.

The design of efficient big data learning models has become a common need in a great number of applications. The massive amounts of available data may hinder the use of traditional data mining techniques, especially when evolutionary algorithms are i...

Distributed incremental fingerprint identification with reduced database penetration rate using a hierarchical classification based on feature fusion and selection (2017)
Journal Article
Peralta, D., Triguero, I., García, S., Saeys, Y., Benitez, J. M., & Herrera, F. (2017). Distributed incremental fingerprint identification with reduced database penetration rate using a hierarchical classification based on feature fusion and selection. Knowledge-Based Systems, 126, https://doi.org/10.1016/j.knosys.2017.03.014

Fingerprint recognition has been a hot research topic along the last few decades, with many applications and ever growing populations to identify. The need of flexible, fast identification systems is therefore patent in such situations. In this conte...

From Big data to Smart Data with the K-Nearest Neighbours algorithm (2016)
Presentation / Conference Contribution
Triguero, I., Maillo, J., Luengo, J., García, S., & Herrera, F. (2016). From Big data to Smart Data with the K-Nearest Neighbours algorithm.

The k-nearest neighbours algorithm is one of the most widely used data mining models because of its simplicity and accurate results. However, when it comes to deal with big datasets, with potentially noisy and missing information, this technique beco...

EPRENNID: An evolutionary prototype reduction based ensemble for nearest neighbor classification of imbalanced data (2016)
Journal Article
Vluymans, S., Triguero, I., Cornelis, C., & Saeys, Y. (2016). EPRENNID: An evolutionary prototype reduction based ensemble for nearest neighbor classification of imbalanced data. Neurocomputing, 216, https://doi.org/10.1016/j.neucom.2016.08.026

Classification problems with an imbalanced class distribution have received an increased amount of attention within the machine learning community over the last decade. They are encountered in a growing number of real-world situations and pose a chal...

Evolutionary undersampling for extremely imbalanced big data classification under Apache Spark (2016)
Presentation / Conference Contribution
Triguero, I., Galar, M., Merino, D., Maillo, J., Bustince, H., & Herrera, F. (2016). Evolutionary undersampling for extremely imbalanced big data classification under Apache Spark.

The classification of datasets with a skewed class distribution is an important problem in data mining. Evolutionary undersampling of the majority class has proved to be a successful approach to tackle this issue. Such a challenging task may become e...
