An overview of high-resource automatic speech recognition methods and their empirical evaluation in low-resource environments
Authors
Fatehi, Kavan; Torres Torres, Mercedes; Kucukyilmaz, Ayse
Abstract
Deep learning methods for Automatic Speech Recognition (ASR) often rely on large-scale training datasets, which are typically unavailable in low-resource environments (LREs). This lack of sufficient and representative training data poses a significant challenge for applying ASR systems in specific domains categorized as LREs. In this paper, we provide a comprehensive overview and empirical analysis of state-of-the-art deep learning techniques for ASR, which are primarily designed for high-resource environments (HREs). Our aim is to explore their potential effectiveness in LRE settings. We focus on identifying key factors that influence the adaptation of HRE models to LRE tasks. To this end, we survey advanced deep learning models and conduct a comparative evaluation of their performance in LRE contexts. Additionally, we propose that pre-training ASR models on HRE datasets, followed by domain-specific fine-tuning on LRE data, can significantly enhance performance in data-scarce settings. Using LibriSpeech and WSJ as our HRE datasets, we evaluate these models on two LRE datasets: UASpeech for dysarthric speech and iCUBE, our novel human–robot interaction dataset. Our systematic experiments, involving varying dataset sizes for pre-training, demonstrate the efficacy of combining pre-training and fine-tuning strategies to improve recognition accuracy in LREs.
Citation
Fatehi, K., Torres Torres, M., & Kucukyilmaz, A. (2025). An overview of high-resource automatic speech recognition methods and their empirical evaluation in low-resource environments. Speech Communication, 167, Article 103151. https://doi.org/10.1016/j.specom.2024.103151
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Nov 22, 2024 |
Online Publication Date | Dec 10, 2024 |
Publication Date | Feb 2025 |
Deposit Date | Dec 17, 2024 |
Publicly Available Date | Dec 20, 2024 |
Journal | Speech Communication |
Print ISSN | 0167-6393 |
Publisher | Elsevier |
Peer Reviewed | Peer Reviewed |
Volume | 167 |
Article Number | 103151 |
DOI | https://doi.org/10.1016/j.specom.2024.103151 |
Keywords | Automatic speech recognition, End-to-end model, Deep learning models, Low-resource environment |
Public URL | https://nottingham-repository.worktribe.com/output/42835237 |
Publisher URL | https://www.sciencedirect.com/science/article/pii/S0167639324001225?via%3Dihub |
Publisher Licence URL
https://creativecommons.org/licenses/by/4.0/
Copyright Statement
© 2024 The Authors. Published by Elsevier B.V.