Machine learning on national shopping data reliably estimates childhood obesity prevalence and socio-economic deprivation

Long, Gavin; Nica-Avram, Georgiana; Harvey, John; Lukinova, Evgeniya; Mansilla, Roberto; Welham, Simon; Engelmann, Gregor; Dolan, Elizabeth; Makokoro, Kuzivakwashe; Thomas, Michelle; Powell, Edward; Goulding, James

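Authors

Gavin Long

Georgiana Nica-Avram

John Harvey

Evgeniya Lukinova

Roberto Mansilla

Simon Welham

Gregor Engelmann

Elizabeth Dolan

Kuzivakwashe Makokoro

Michelle Thomas

Edward Powell

James Goulding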



Abstract

Deprivation pushes people to choose cheap, calorie-dense foods instead of nutritious but expensive alternatives. Diseases, such as obesity, cardiovascular disease, and diabetes, resulting from these poor dietary choices place a significant burden on public health systems. Measuring nutritional insecurity is difficult to achieve at scale and so the ability to study the relationship between nutritional outcomes and deprivation at a national level is very challenging. This makes it difficult to understand the effect of new policies or track changes over time. To address this challenge, we develop a machine learning approach using massive anonymised transactional data (4 million members and 2.5 billion transactions) in partnership with the retailer The Cooperative Group UK. We engineer a series of variables related to obesogenic diets, including a new measure called 'Calorie-oriented purchasing'. These variables help illustrate how large-scale transactional data can discriminate between neighbourhoods most affected by deprivation and childhood obesity. Through comparative assessment of machine learning approaches, we find better performance from tree-based models (Random Forest, XGBoost) with the best-achieving accuracy of 0.88 for predicting deprivation and an accuracy of 0.79 for childhood obesity. Calorie-oriented purchasing emerges as a robust predictor of deprivation and childhood obesity at the census area level. Results show this approach can help summarise nutritional insecurity, and support its spatio-temporal monitoring. We conclude with policy implications and recommend retailers adopt new measures for measuring national nutrition insecurity.

Citation

Long, G., Nica-Avram, G., Harvey, J., Lukinova, E., Mansilla, R., Welham, S., Engelmann, G., Dolan, E., Makokoro, K., Thomas, M., Powell, E., & Goulding, J. (2025). Machine learning on national shopping data reliably estimates childhood obesity prevalence and socio-economic deprivation. Food Policy, 131, Article 102826. https://doi.org/10.1016/j.foodpol.2025.102826

Journal Article Type: Article
Acceptance Date: Feb 10, 2025
Online Publication Date: Feb 27, 2025
Publication Date: Feb 2025
Deposit Date: Feb 19, 2025
Publicly Available Date: Mar 3, 2025
Journal: Food Policy
Print ISSN: 0306-9192
Publisher: Elsevier
Peer Reviewed: Yes
Volume: 131
Article Number: 102826
DOI: https://doi.org/10.1016/j.foodpol.2025.102826
Keywords: Deprivation; Obesity; Machine learning; Dietary Monitoring; Digital Footprints; Food Security
Public URL: https://nottingham-repository.worktribe.com/output/45595723
Publisher URL: https://www.sciencedirect.com/science/article/pii/S0306919225000302?via%3Dihub
Additional Information: This article is maintained by: Elsevier; Article Title: Machine learning on national shopping data reliably estimates childhood obesity prevalence and socio-economic deprivation; Journal Title: Food Policy; CrossRef DOI link to publisher maintained version: https://doi.org/10.1016/j.foodpol.2025.102826; Content Type: article; Copyright: © 2025 The Authors. Published by Elsevier Ltd.
