Kavan Fatehi
ScoutWav: Two-Step Fine-Tuning on Self-Supervised Automatic Speech Recognition for Low-Resource Environments
Authors
Fatehi, Kavan; Torres, Mercedes Torres; Kucukyilmaz, Ayse
Abstract
Recent improvements in Automatic Speech Recognition (ASR) systems have produced extraordinary results. However, in some domains the training data is either limited or not representative enough; these are known as Low-Resource Environments (LRE). In this paper, we present ScoutWav, a network that integrates context-based word boundaries with a self-supervised learning model, wav2vec 2.0, to build a low-resource ASR model. First, we pre-train a model on High-Resource Environment (HRE) datasets and then fine-tune it on the LRE datasets to obtain context-based word boundaries. The resulting word boundaries are used to fine-tune a pre-trained and iteratively refined wav2vec 2.0 so that it learns appropriate representations for the downstream ASR task. Our refinement strategy for wav2vec 2.0 uses canonical correlation analysis (CCA) to detect which layers need updating. This dynamic refinement allows wav2vec 2.0 to learn more descriptive LRE-based representations. Finally, the representations learned in the two-step fine-tuned wav2vec 2.0 framework are fed back to the Scout Network for the downstream task. We carried out experiments with two different LRE datasets: I-CUBE and UASpeech. Our experiments demonstrate that, by using target-domain word boundaries after pre-training together with automatic layer analysis, ScoutWav achieves up to 12% relative WER reduction on the LRE data.
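The headline figure is a relative, not absolute, reduction in word error rate (WER). As a minimal sketch of how such a figure is usually computed, and not the authors' evaluation code, the Python below derives WER from a word-level edit distance and then the relative reduction against a baseline. The function names, the example transcripts, and the WER values 0.25 and 0.22 are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - system_wer) / baseline_wer


# Illustrative transcripts (hypothetical): one deletion and one substitution
# against a four-word reference.
example_wer = word_error_rate("turn the lights on", "turn lights off")

# Illustrative WER values (hypothetical): a drop from 0.25 to 0.22 is a
# 3-point absolute reduction but a 12% relative reduction.
example_reduction = relative_wer_reduction(0.25, 0.22)
```

Under this definition, a 12% relative reduction means the new system removes 12% of the baseline's errors, which can correspond to only a few points of absolute WER.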
Citation
Fatehi, K., Torres, M. T., & Kucukyilmaz, A. (2022, September). ScoutWav: Two-Step Fine-Tuning on Self-Supervised Automatic Speech Recognition for Low-Resource Environments. Presented at Interspeech 2022, Incheon, Korea
Presentation Conference Type | Edited Proceedings |
---|---|
Conference Name | Interspeech 2022 |
Start Date | Sep 18, 2022 |
End Date | Sep 22, 2022 |
Acceptance Date | Jun 15, 2022 |
Online Publication Date | Sep 22, 2022 |
Publication Date | Sep 22, 2022 |
Deposit Date | Jul 29, 2022 |
Publicly Available Date | Sep 22, 2022 |
Volume | 2022-September |
Pages | 3523-3527 |
Series Title | Interspeech |
Book Title | Proceedings of Interspeech 2022 |
DOI | https://doi.org/10.21437/Interspeech.2022-10270 |
Keywords | Speech Recognition, Deep Learning |
Public URL | https://nottingham-repository.worktribe.com/output/9409043 |
Publisher URL | https://www.isca-speech.org/archive/interspeech_2022/fatehi22_interspeech.html |
Related Public URLs | https://interspeech2022.org/ |