Dr ISAAC TRIGUERO VELAZQUEZ I.TrigueroVelazquez@nottingham.ac.uk
ASSOCIATE PROFESSOR
ROSEFW-RF: the winner algorithm for the ECBDL’14 big data competition: an extremely imbalanced big data bioinformatics problem
Triguero, Isaac; del Río, Sara; López, Victoria; Bacardit, Jaume; Benítez, José M.; Herrera, Francisco
Authors
Sara del Río
Victoria López
Jaume Bacardit
José M. Benítez
Francisco Herrera
Abstract
The application of data mining and machine learning techniques to biological and biomedical data continues to be a ubiquitous research theme in current bioinformatics. Rapid advances in biotechnology allow us to obtain and store large quantities of data about cells, proteins, genes, etc., that must be processed. Moreover, in many of these problems, such as contact map prediction, the problem tackled in this paper, it is difficult to collect representative positive examples. Learning under these circumstances, known as imbalanced big data classification, may not be straightforward for most standard machine learning methods.
In this work we describe the methodology that won the ECBDL’14 big data challenge for a bioinformatics big data problem. This algorithm, named ROSEFW-RF, is based on several MapReduce approaches to (1) balance the class distribution through random oversampling, (2) detect the most relevant features via an evolutionary feature weighting process and a threshold to choose them, (3) build an appropriate Random Forest model from the pre-processed data and finally (4) classify the test data. Throughout the paper, we detail and analyze the decisions made during the competition and present an extensive experimental study that characterizes how our methodology works. From this analysis we conclude that this approach is well suited to large-scale bioinformatics classification problems.
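As a rough illustration of steps (1) and (2) of the abstract, the following is a minimal single-machine sketch in Python, not the authors' MapReduce implementation. It assumes that oversampling replicates minority rows until all classes have equal counts, and that the feature weights are already available (the evolutionary search that produces them is not reproduced here). The function names `random_oversample`, `select_features`, and `project`, the toy data, and the threshold value are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Replicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (assumed balancing target)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def select_features(weights, threshold):
    """Return the indices of features whose weight is at least `threshold`."""
    return [i for i, w in enumerate(weights) if w >= threshold]


def project(rows, keep):
    """Keep only the selected feature columns in every row."""
    return [[row[i] for i in keep] for row in rows]


# Toy data: three negative examples and one positive example.
rows = [[0.1, 5.0, 1.0], [0.2, 4.0, 0.0], [0.3, 6.0, 1.0], [0.9, 5.5, 0.0]]
labels = [0, 0, 0, 1]

bal_rows, bal_labels = random_oversample(rows, labels)

# Hypothetical weights standing in for the output of a feature weighting step.
keep = select_features([0.8, 0.1, 0.5], threshold=0.4)
reduced = project(bal_rows, keep)
```

The balanced, reduced data would then be passed to a Random Forest learner for steps (3) and (4); that part is omitted here because it depends on the library chosen.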
Citation
Triguero, I., del Río, S., López, V., Bacardit, J., Benítez, J. M., & Herrera, F. (2015). ROSEFW-RF: the winner algorithm for the ECBDL’14 big data competition: an extremely imbalanced big data bioinformatics problem. Knowledge-Based Systems, 87, https://doi.org/10.1016/j.knosys.2015.05.027
Journal Article Type | Article |
---|---|
Acceptance Date | May 28, 2015 |
Online Publication Date | Jun 1, 2015 |
Publication Date | Oct 1, 2015 |
Deposit Date | Sep 4, 2017 |
Publicly Available Date | Sep 4, 2017 |
Journal | Knowledge-Based Systems |
Print ISSN | 0950-7051 |
Electronic ISSN | 1872-7409 |
Publisher | Elsevier |
Peer Reviewed | Peer Reviewed |
Volume | 87 |
DOI | https://doi.org/10.1016/j.knosys.2015.05.027 |
Keywords | Bioinformatics; Big data; Hadoop; MapReduce; Imbalance classification; Evolutionary feature selection |
Public URL | https://nottingham-repository.worktribe.com/output/981985 |
Publisher URL | http://www.sciencedirect.com/science/article/pii/S0950705115002130 |
Contract Date | Sep 4, 2017 |
Files
Triguero-et-al-cleanVersion.pdf
(2.7 Mb)
PDF
Copyright Statement
Copyright information regarding this work can be found at the following address: http://creativecommons.org/licenses/by-nc-nd/4.0