A "non-parametric" version of the naive Bayes classifier
Soria, Daniele; Garibaldi, Jonathan M.; Ambrogi, Federico; Biganzoli, Elia M.; Ellis, Ian O.
Authors
Daniele Soria
Jonathan M. Garibaldi
Federico Ambrogi
Elia M. Biganzoli
Ian O. Ellis
Abstract
Many algorithms have been proposed for the machine learning task of classification. One of the simplest methods, the naive Bayes classifier, has often been found to give good performance despite the fact that its underlying assumptions (of independence and a Normal distribution of the variables) may be violated. In previous work, we applied naive Bayes and other standard algorithms to a breast cancer database from Nottingham City Hospital, in which the variables are highly non-Normal, and found that the algorithm performed well when predicting a class that had been derived from the same data. However, when we then applied naive Bayes to predict an alternative clinical variable, it performed much worse than other techniques. This motivated us to propose an alternative method, based on naive Bayes, which removes the requirement for the variables to be Normally distributed but retains the essential structure and other underlying assumptions of the method. We tested our novel algorithm on our breast cancer data and on three UCI datasets which also exhibited strong violations of Normality. We found that our algorithm outperformed naive Bayes in all four cases and outperformed multinomial logistic regression (MLR) in two cases. We conclude that our method offers a competitive alternative to MLR and naive Bayes when dealing with datasets in which non-Normal distributions are observed.
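The abstract's central idea is that naive Bayes can keep its structure (a class prior multiplied by independent per-variable likelihoods) while dropping the assumption that each variable is Normal within a class. The Python sketch below illustrates that contrast on a toy one-variable dataset. It is not the authors' algorithm: the class names (`NaiveBayesSketch`, `GaussianNaiveBayes`, `WindowNaiveBayes`), the window-counting likelihood, its half-width `h`, and the add-one smoothing are all hypothetical choices made only for illustration.

```python
import math


class NaiveBayesSketch:
    """Shared naive Bayes structure: log prior plus a sum of per-feature log-likelihoods.

    Subclasses only decide how one feature's class-conditional likelihood is estimated.
    """

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n = len(y)
        self.priors = {c: sum(1 for t in y if t == c) / n for c in self.classes}
        # For each class, keep the training values of each feature separately.
        n_features = len(X[0])
        self.values = {
            c: [[row[j] for row, t in zip(X, y) if t == c] for j in range(n_features)]
            for c in self.classes
        }
        return self

    def log_likelihood(self, c, j, x):
        raise NotImplementedError

    def predict(self, X):
        def log_posterior(row, c):
            # Independence assumption: per-feature log-likelihoods simply add up.
            return math.log(self.priors[c]) + sum(
                self.log_likelihood(c, j, x) for j, x in enumerate(row)
            )

        return [max(self.classes, key=lambda c: log_posterior(row, c)) for row in X]


class GaussianNaiveBayes(NaiveBayesSketch):
    """Standard naive Bayes: each feature is modelled as Normal within each class."""

    def log_likelihood(self, c, j, x):
        v = self.values[c][j]
        mu = sum(v) / len(v)
        var = sum((a - mu) ** 2 for a in v) / len(v) + 1e-9  # avoid zero variance
        return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


class WindowNaiveBayes(NaiveBayesSketch):
    """Distribution-free variant (illustrative assumption, NOT the paper's estimator).

    Replaces the Normal density with the smoothed fraction of class-c training
    values lying within +/- h of x, so no distributional shape is assumed.
    """

    def __init__(self, h=1.0):
        self.h = h  # hypothetical window half-width

    def log_likelihood(self, c, j, x):
        v = self.values[c][j]
        inside = sum(1 for a in v if x - self.h <= a <= x + self.h)
        # Add-one (Laplace) smoothing keeps empty windows from giving log(0).
        return math.log((inside + 1) / (len(v) + 2))


if __name__ == "__main__":
    # Toy data: class "A" is bimodal (no values near 0); class "B" has values near 0.
    X = [[-5], [-5], [-5], [5], [5], [5], [-12], [-1], [0], [1], [12]]
    y = ["A"] * 6 + ["B"] * 5
    print(GaussianNaiveBayes().fit(X, y).predict([[0], [5]]))  # ['A', 'A']
    print(WindowNaiveBayes(h=1.0).fit(X, y).predict([[0], [5]]))  # ['B', 'A']
```

In this toy data the Normal fit for class A places its mean at 0, exactly where class A has no observations, so the Gaussian version assigns x = 0 to A; the window-based likelihood, which assumes nothing about the shape of the distribution, assigns it to B. Both versions agree at x = 5. The width `h` plays a role similar to a histogram bin width.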
Citation
Soria, D., Garibaldi, J. M., Ambrogi, F., Biganzoli, E. M., & Ellis, I. O. (2011). A "non-parametric" version of the naive Bayes classifier. Knowledge-Based Systems, 24(6), https://doi.org/10.1016/j.knosys.2011.02.014
Field | Value |
---|---|
Journal Article Type | Article |
Publication Date | Aug 1, 2011 |
Deposit Date | Jan 30, 2015 |
Publicly Available Date | Jan 30, 2015 |
Journal | Knowledge-Based Systems |
Print ISSN | 0950-7051 |
Electronic ISSN | 1872-7409 |
Publisher | Elsevier |
Peer Reviewed | Yes |
Volume | 24 |
Issue | 6 |
DOI | https://doi.org/10.1016/j.knosys.2011.02.014 |
Public URL | https://nottingham-repository.worktribe.com/output/1009811 |
Publisher URL | http://www.sciencedirect.com/science/article/pii/S0950705111000414 |
Additional Information | This is the author’s version of a work that was accepted for publication in Knowledge-Based Systems. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Knowledge-Based Systems, 24(6), 2011. doi:10.1016/j.knosys.2011.02.014. |
Files
soria2011a.pdf (PDF, 1 Mb)
You might also like
A pattern-based algorithm with fuzzy logic bin selector for online bin packing problem
(2024)
Journal Article
Lessons learned from the COVID-19 pandemic about sample access for research in the UK
(2022)
Journal Article
Machine learning can predict disease manifestations and outcomes in lymphangioleiomyomatosis
(2020)
Journal Article