Anna L. Swan
A machine learning heuristic to identify biologically relevant and minimal biomarker panels from omics data
Swan, Anna L.; Stekel, Dov J.; Hodgman, Charlie; Allaway, David; Alqahtani, Mohammed H.; Mobasheri, Ali; Bacardit, Jaume
Authors
Dov J. Stekel (dov.stekel@nottingham.ac.uk)
Professor of Computational Biology
Charlie Hodgman
David Allaway
Mohammed H. Alqahtani
Ali Mobasheri
Jaume Bacardit
Abstract
Background: Investigations into novel biomarkers using omics techniques generate large amounts of data. Because of their size and number of attributes, these data are well suited to analysis with machine learning methods. A key component of typical machine learning pipelines for omics data is feature selection, which reduces the raw high-dimensional data to a tractable number of features. Feature selection must balance two objectives: using as few features as possible and maintaining high predictive power. This balance is crucial when the goal of the analysis is to identify small but highly accurate panels of biomarkers with potential clinical utility. In this paper we propose a heuristic for selecting very small feature subsets through an iterative feature elimination process guided by rule-based machine learning, called RGIFE (Rule-guided Iterative Feature Elimination). We use this heuristic to identify putative biomarkers of osteoarthritis (OA), articular cartilage degradation and synovial inflammation, using both proteomic and transcriptomic datasets.
Results and discussion: Our RGIFE heuristic increased classification accuracy on all datasets relative to using no feature selection, and performed well in a comparison with other feature selection methods. Using this method, the datasets were reduced to a smaller number of genes or proteins, including some known to be relevant to OA, cartilage degradation and joint inflammation. The results show that the RGIFE feature reduction method is suitable for analysing both proteomic and transcriptomic data. Methods that generate large ‘omics’ datasets are increasingly being used in the area of rheumatology.
Conclusions: Feature reduction methods are advantageous for the analysis of omics data in the field of rheumatology, because applying such techniques is likely to lead to improvements in diagnosis, treatment and drug discovery.
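The iterative elimination idea described in the Background can be illustrated with a minimal sketch. This is not the authors' RGIFE implementation: the abstract does not say how the rule-based learner ranks or removes features, so the sketch substitutes a simple nearest-centroid classifier scored by leave-one-out accuracy, and drops one feature at a time whenever doing so does not reduce that accuracy. The function names `loo_accuracy` and `iterative_elimination`, and the toy data, are hypothetical and chosen only for illustration.

```python
from statistics import mean


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier
    that only looks at the columns listed in `features`.
    (Stand-in for the rule-based learner; an assumption, not RGIFE.)"""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in sorted(set(y)):
            # Training rows of this class, excluding the held-out sample i.
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [mean([r[f] for r in rows]) for f in features]
        point = [X[i][f] for f in features]
        predicted = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
        )
        if predicted == y[i]:
            correct += 1
    return correct / len(X)


def iterative_elimination(X, y, score=loo_accuracy):
    """Greedily remove features while the score does not decrease.
    Returns the surviving feature indices and their score."""
    selected = list(range(len(X[0])))
    best = score(X, y, selected)
    improved = True
    while improved and len(selected) > 1:
        improved = False
        for f in list(selected):
            trial = [g for g in selected if g != f]
            s = score(X, y, trial)
            if s >= best:  # keep the smaller panel if accuracy is preserved
                selected, best, improved = trial, s, True
                break
    return selected, best


# Toy data (hypothetical): column 0 separates the classes, columns 1-2 are noise.
X = [
    [0.0, 1, 3],
    [0.1, 2, 1],
    [0.2, 3, 2],
    [5.0, 1, 2],
    [5.1, 2, 3],
    [5.2, 3, 1],
]
y = [0, 0, 0, 1, 1, 1]

selected, best = iterative_elimination(X, y)
print(selected, best)  # [0] 1.0
```

On this toy input the loop discards both noise columns and keeps only the informative one, which mirrors the stated goal of a small panel that preserves predictive power. A real analysis would replace the scoring function with the learner and validation scheme actually used in the study.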
Citation
Swan, A. L., Stekel, D. J., Hodgman, C., Allaway, D., Alqahtani, M. H., Mobasheri, A., & Bacardit, J. (2015). A machine learning heuristic to identify biologically relevant and minimal biomarker panels from omics data. BMC Genomics, 16(Suppl 1), Article S2. https://doi.org/10.1186/1471-2164-16-s1-s2
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Sep 24, 2014 |
Online Publication Date | Jan 15, 2015 |
Publication Date | Jan 15, 2015 |
Deposit Date | Dec 3, 2018 |
Publicly Available Date | Jan 25, 2019 |
Journal | BMC Genomics |
Publisher | Springer Verlag |
Peer Reviewed | Peer Reviewed |
Volume | 16 |
Issue | Suppl 1 |
Article Number | S2 |
DOI | https://doi.org/10.1186/1471-2164-16-s1-s2 |
Public URL | https://nottingham-repository.worktribe.com/output/1361802 |
Publisher URL | https://bmcgenomics.biomedcentral.com/track/pdf/10.1186/1471-2164-16-S1-S2 |
Contract Date | Dec 3, 2018 |
Publisher Licence URL
https://creativecommons.org/licenses/by/4.0/