Prediction of premature all-cause mortality: a prospective general population cohort study comparing machine-learning and standard epidemiological approaches

Weng, Stephen F; Vaz, Luis; Qureshi, Nadeem; Kai, Joe

doi:10.1371/journal.pone.0214365

Authors

Stephen F Weng

Luis Vaz

Professor Nadeem Qureshi nadeem.qureshi@nottingham.ac.uk
Clinical Professor

Professor Joe Kai joe.kai@nottingham.ac.uk
Professor of Primary Care

Abstract

Background: Prognostic modelling using standard methods is well-established, particularly for predicting risk of single diseases. Machine-learning may offer potential to explore outcomes of even greater complexity, such as premature death. This study aimed to develop novel prediction algorithms using machine-learning, in addition to standard survival modelling, to predict premature all-cause mortality.

Methods: A prospective population cohort of 502,628 participants aged 40-69 years was recruited to the UK Biobank from 2006 to 2010 and followed up until 2016. Participants were assessed on a range of demographic, biometric, clinical and lifestyle factors. Mortality data coded by ICD-10 were obtained through linkage to the Office for National Statistics. Models were developed using deep learning, random forest and Cox regression. Calibration was assessed by comparing observed with predicted risks, and discrimination by the area under the receiver operating characteristic curve (AUC).
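The two evaluation measures named in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names `auc` and `calibration_by_group`, the toy outcomes and the toy risks are all hypothetical, and the grouping of participants by predicted risk is an assumed, simplified way of comparing observed with predicted risk.

```python
def auc(labels, scores):
    """Discrimination: probability that a randomly chosen participant who died
    was given a higher predicted risk than a randomly chosen survivor.
    Ties count as half. Labels are 1 (died) or 0 (survived)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def calibration_by_group(labels, scores, n_groups=2):
    """Calibration: rank participants by predicted risk, split them into
    n_groups groups, and return (mean predicted risk, observed death rate)
    for each group. A well-calibrated model gives similar values in each pair."""
    ranked = sorted(zip(scores, labels))
    size = len(ranked) // n_groups
    groups = []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any remainder.
        end = len(ranked) if g == n_groups - 1 else start + size
        chunk = ranked[start:end]
        mean_predicted = sum(s for s, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        groups.append((mean_predicted, observed))
    return groups


# Hypothetical toy data, NOT taken from UK Biobank.
died = [0, 0, 0, 1, 0, 1, 1, 1]
predicted_risk = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]

print(auc(died, predicted_risk))                    # 0.9375
print(calibration_by_group(died, predicted_risk))
```

In the toy data, 15 of the 16 death-survivor pairs are ranked correctly, giving an AUC of 0.9375. A model that over-predicts risk would show mean predicted risks consistently above the observed rates in the second function's output.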

Findings: 14,418 deaths (2.9%) occurred over a total follow-up time of 3,508,454 person-years. A simple age and gender Cox model was the least predictive (AUC 0.689, 95% CI 0.681 – 0.699). A multivariate Cox regression model significantly improved discrimination, raising the AUC by an absolute 6.2% (AUC 0.751, 95% CI 0.748 – 0.767). Applying machine-learning algorithms raised the AUC by a further absolute 3.2% using random forest (AUC 0.783, 95% CI 0.776 – 0.791) and 3.9% using deep learning (AUC 0.790, 95% CI 0.783 – 0.797). Compared with the simple age and gender Cox regression model, these machine-learning algorithms improved the AUC by an absolute 9.4% and 10.1% respectively. Random forest and deep learning achieved similar levels of discrimination, with no significant difference between them. Machine-learning algorithms were well-calibrated, while Cox regression models consistently over-predicted risk.
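The improvements quoted in the Findings appear to be absolute differences in AUC, expressed in percentage points, rather than relative changes. The short check below reproduces them from the figures in the abstract. The dictionary keys and the helper `gain_points` are hypothetical names used for illustration, and the crude death rate is derived here and is not reported in the abstract.

```python
# AUC values as reported in the abstract.
reported_auc = {
    "age_gender_cox": 0.689,
    "multivariate_cox": 0.751,
    "random_forest": 0.783,
    "deep_learning": 0.790,
}


def gain_points(new, baseline):
    """Absolute AUC difference between two models, in percentage points."""
    return round((reported_auc[new] - reported_auc[baseline]) * 100, 1)


print(gain_points("multivariate_cox", "age_gender_cox"))  # 6.2
print(gain_points("random_forest", "multivariate_cox"))   # 3.2
print(gain_points("deep_learning", "multivariate_cox"))   # 3.9
print(gain_points("random_forest", "age_gender_cox"))     # 9.4
print(gain_points("deep_learning", "age_gender_cox"))     # 10.1

# Cohort figures as reported in the abstract.
deaths = 14_418
participants = 502_628
person_years = 3_508_454

percent_died = round(deaths / participants * 100, 1)
print(percent_died)  # 2.9

# Derived quantity, not stated in the abstract: crude deaths per 1,000 person-years.
crude_rate_per_1000 = round(deaths / person_years * 1000, 1)
print(crude_rate_per_1000)  # 4.1
```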

Conclusions: Machine-learning significantly improved accuracy of prediction of premature all-cause mortality in this middle-aged population, compared to standard methods. This study illustrates the value of machine-learning for risk prediction within a traditional epidemiological study design, and how this approach might be reported to assist scientific verification.

Citation

Weng, S. F., Vaz, L., Qureshi, N., & Kai, J. (2019). Prediction of premature all-cause mortality: a prospective general population cohort study comparing machine-learning and standard epidemiological approaches. PLoS ONE, 14(3), 1-22. https://doi.org/10.1371/journal.pone.0214365

Journal Article Type	Article
Acceptance Date	Mar 12, 2019
Online Publication Date	Mar 27, 2019
Publication Date	Mar 27, 2019
Deposit Date	Mar 26, 2019
Publicly Available Date	Mar 29, 2019
Journal	PLOS ONE
Electronic ISSN	1932-6203
Publisher	Public Library of Science
Peer Reviewed	Peer Reviewed
Volume	14
Issue	3
Article Number	e0214365
Pages	1-22
DOI	https://doi.org/10.1371/journal.pone.0214365
Keywords	premature all-cause mortality; machine-learning; risk prediction
Public URL	https://nottingham-repository.worktribe.com/output/1669986
Publisher URL	https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0214365
Contract Date	Mar 29, 2019