An overview of high-resource automatic speech recognition methods and their empirical evaluation in low-resource environments

Fatehi, Kavan; Torres Torres, Mercedes; Kucukyilmaz, Ayse


Authors

Kavan Fatehi

Mercedes Torres Torres

Ayse Kucukyilmaz

Abstract

Deep learning methods for Automatic Speech Recognition (ASR) often rely on large-scale training datasets, which are typically unavailable in low-resource environments (LREs). This lack of sufficient and representative training data poses a significant challenge for applying ASR systems in specific domains categorized as LREs. In this paper, we provide a comprehensive overview and empirical analysis of state-of-the-art deep learning techniques for ASR, which are primarily designed for high-resource environments (HREs). Our aim is to explore their potential effectiveness in LRE settings. We focus on identifying key factors that influence the adaptation of HRE models to LRE tasks. To this end, we survey advanced deep learning models and conduct a comparative evaluation of their performance in LRE contexts. Additionally, we propose that pre-training ASR models on HRE datasets, followed by domain-specific fine-tuning on LRE data, can significantly enhance performance in data-scarce settings. Using LibriSpeech and WSJ as our HRE datasets, we evaluate these models on two LRE datasets: UASpeech for dysarthric speech and iCUBE, our novel human–robot interaction dataset. Our systematic experiments, involving varying dataset sizes for pre-training, demonstrate the efficacy of combining pre-training and fine-tuning strategies to improve recognition accuracy in LREs.

Citation

Fatehi, K., Torres Torres, M., & Kucukyilmaz, A. (2025). An overview of high-resource automatic speech recognition methods and their empirical evaluation in low-resource environments. Speech Communication, 167, Article 103151. https://doi.org/10.1016/j.specom.2024.103151

Journal Article Type: Article
Acceptance Date: Nov 22, 2024
Online Publication Date: Dec 10, 2024
Publication Date: 2025-02
Deposit Date: Dec 17, 2024
Publicly Available Date: Dec 20, 2024
Journal: Speech Communication
Print ISSN: 0167-6393
Publisher: Elsevier
Peer Reviewed: Peer Reviewed
Volume: 167
Article Number: 103151
DOI: https://doi.org/10.1016/j.specom.2024.103151
Keywords: Automatic speech recognition, End-to-end model, Deep learning models, Low-resource environment
Public URL: https://nottingham-repository.worktribe.com/output/42835237
Publisher URL: https://www.sciencedirect.com/science/article/pii/S0167639324001225?via%3Dihub
