A large-scale and PCR-referenced vocal audio dataset for COVID-19

Budd, Jobie; Baker, Kieran; Karoune, Emma; Coppock, Harry; Patel, Selina; Payne, Richard; Tendero Cañadas, Ana; Titcomb, Alexander; Hurley, David; Egglestone, Sabrina; Butler, Lorraine; Mellor, Jonathon; Nicholson, George; Kiskin, Ivan; Koutra, Vasiliki; Jersakova, Radka; McKendry, Rachel; Diggle, Peter; Richardson, Sylvia; Schuller, Björn; Gilmour, Steven; Pigoli, Davide; Roberts, Stephen; Packham, Josef; Thornley, Tracey; Holmes, Chris

Abstract

The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, a period in which the Alpha and Delta SARS-CoV-2 variants, and some Omicron variant sublineages, were dominant. Audio recordings of volitional coughs, exhalations, and speech were collected in the ‘Speak up and help beat coronavirus’ digital survey alongside demographic, symptom and self-reported respiratory condition data. Digital survey submissions were linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,565 of 72,999 participants and 24,105 of 25,706 positive cases. Respiratory symptoms were reported by 45.6% of participants. The dataset has additional potential uses for bioacoustics research: 11.3% of participants self-reported asthma, and 27.2% have linked influenza PCR test results.
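The PCR linkage counts quoted in the abstract can be read as coverage rates. The minimal Python sketch below uses only those four counts; the helper name `linkage_rate` is hypothetical and is not part of any tooling distributed with the dataset.

```python
# Counts quoted in the abstract (participants and positive cases with a linked PCR result).
LINKED_PARTICIPANTS = 70_565
TOTAL_PARTICIPANTS = 72_999
LINKED_POSITIVES = 24_105
TOTAL_POSITIVES = 25_706


def linkage_rate(linked: int, total: int) -> float:
    """Hypothetical helper: percentage of records that have a linked PCR result."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= linked <= total:
        raise ValueError("linked must be between 0 and total")
    return 100.0 * linked / total


if __name__ == "__main__":
    # Print each coverage rate rounded to one decimal place.
    print(f"Participants linked: {linkage_rate(LINKED_PARTICIPANTS, TOTAL_PARTICIPANTS):.1f}%")
    print(f"Positive cases linked: {linkage_rate(LINKED_POSITIVES, TOTAL_POSITIVES):.1f}%")
```

Run as a script, this prints about 96.7% for participants and 93.8% for positive cases. That leaves 2,434 participants and 1,601 positive cases without a linked PCR result.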

Journal Article Type: Article
Acceptance Date: Jun 10, 2024
Online Publication Date: Jun 27, 2024
Publication Date: Jun 27, 2024
Deposit Date: Apr 10, 2024
Publicly Available Date: Jul 8, 2024
Electronic ISSN: 2052-4463
Publisher: Nature Publishing Group
Peer Reviewed: Peer Reviewed
Volume: 11
Article Number: 700
DOI: https://doi.org/10.1038/s41597-024-03492-w
Public URL: https://nottingham-repository.worktribe.com/output/33561144
Publisher URL: https://www.nature.com/articles/s41597-024-03492-w