Using a machine learning model to risk stratify for the presence of significant liver disease in a primary care population

Bennett, Lucy; Mostafa, Mohamed; Hammersley, Richard; Purssell, Huw; Patel, Manish; Street, Oliver; Athwal, Varinder S.; Hanley, Karen Piper; The ID-LIVER Consortium; Hanley, Neil A.; Morling, Joanne R.; Guha, Indra Neil

Authors

Lucy Bennett

Mohamed Mostafa

Richard Hammersley

Huw Purssell

Manish Patel

Oliver Street

Varinder S. Athwal

Karen Piper Hanley

The ID-LIVER Consortium

Neil A. Hanley

Joanne Morling joanne.morling@nottingham.ac.uk
Professor of Public Health and Epidemiology

Neil Guha neil.guha@nottingham.ac.uk
Professor of Hepatology



Abstract

Background: Current strategies for detecting significant chronic liver disease (CLD) in the community are based on the extrapolation of diagnostic tests used in secondary care settings. Whilst this approach provides clinical utility, it has limitations: diagnostic accuracy depends on disease prevalence and is subject to spectrum bias, both of which differ in the community. Machine learning (ML) techniques provide a novel way of identifying significant variables without preconceived bias. As a proof-of-concept study, we wanted to examine the performance of nine different ML models based on both risk factors and abnormal liver enzyme tests in a large community cohort.

Methods: Routine demographic and laboratory data were collected on 1,453 patients with risk factors for CLD, including high alcohol consumption, diabetes and obesity, in a community setting in Nottingham (UK) as part of the Scarred Liver project. A total of 87 variables were extracted. Transient elastography (TE) was used to define clinically significant liver fibrosis. The data were split into a training set and a hold-out set. The median age of the cohort was 59, mean body mass index (BMI) was 29.7 kg/m², median TE was 5.5 kPa, 49.2% had type 2 diabetes and 20.3% had a TE >8 kPa.

Results: The nine different ML models, which included a Random Forest classifier, Support Vector classification and a Gradient Boosting classifier, had area under the curve (AUC) statistics ranging from 0.5 to 0.75. The Ensemble Stacker model showed the best performance, and this was replicated in the testing dataset (AUC 0.72). Recursive feature elimination found that eight variables had a significant impact on model output. The model had superior sensitivity (74%) compared to specificity (60%).

Conclusions: ML shows encouraging performance and highlights variables that may have bespoke value for diagnosing community liver disease. Optimising how ML algorithms are integrated into clinical pathways of care and exploring new biomarkers will further enhance diagnostic utility.
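
The abstract reports AUC, sensitivity and specificity, and notes that diagnostic accuracy depends on disease prevalence. The following is a minimal, dependency-free sketch of how these quantities are conventionally computed; it is not the study's code. The function names, the toy labels and scores, and the 0.5 decision threshold are all hypothetical. The final calculation simply plugs the abstract's reported sensitivity (74%), specificity (60%) and cohort prevalence of TE >8 kPa (20.3%) into Bayes' rule as an illustration, not as a result reported by the authors.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = disease)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    ppv = (sens * prevalence) / (
        sens * prevalence + (1 - spec) * (1 - prevalence)
    )
    npv = (spec * (1 - prevalence)) / (
        spec * (1 - prevalence) + (1 - sens) * prevalence
    )
    return ppv, npv


# Hypothetical toy data: 3 patients with significant fibrosis, 4 without.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
toy_auc = auc(y_true, scores)

# Illustration only: figures taken from the abstract, combined via Bayes' rule.
ppv, npv = predictive_values(sens=0.74, spec=0.60, prevalence=0.203)
print(f"toy sensitivity={sens:.2f}, specificity={spec:.2f}, AUC={toy_auc:.2f}")
print(f"PPV={ppv:.2f}, NPV={npv:.2f} at 20.3% prevalence")
```

Calling `predictive_values` again with a lower prevalence shows the positive predictive value falling while sensitivity and specificity stay fixed, which is the prevalence dependence the Background refers to.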

Citation

Bennett, L., Mostafa, M., Hammersley, R., Purssell, H., Patel, M., Street, O., …Guha, I. N. (2023). Using a machine learning model to risk stratify for the presence of significant liver disease in a primary care population. Journal of Medical Artificial Intelligence, 6, Article 27. https://doi.org/10.21037/jmai-23-35

Journal Article Type: Article
Acceptance Date: Sep 22, 2023
Online Publication Date: Nov 21, 2023
Publication Date: Nov 30, 2023
Deposit Date: Nov 27, 2023
Publicly Available Date: Nov 27, 2023
Journal: Journal of Medical Artificial Intelligence
Electronic ISSN: 2617-2496
Publisher: AME Publishing Company
Peer Reviewed: Peer Reviewed
Volume: 6
Article Number: 27
DOI: https://doi.org/10.21037/jmai-23-35
Keywords: Liver disease; machine learning (ML); diagnosis; community
Public URL: https://nottingham-repository.worktribe.com/output/27590043
Publisher URL: https://jmai.amegroups.org/article/view/8267/