Zero-Shot Keyword Spotting for Visual Speech Recognition In-the-wild
Tzimiropoulos, Yorgos; Stafylakis, Themos
Authors
Yorgos Tzimiropoulos
Themos Stafylakis
Abstract
Visual keyword spotting (KWS) is the problem of estimating whether a text query occurs in a given recording using only video information. This paper focuses on visual KWS for words unseen during training, a practical, real-world setting that has so far received no attention from the community. To this end, we devise an end-to-end architecture comprising (a) a state-of-the-art visual feature extractor based on spatiotemporal Residual Networks, (b) a grapheme-to-phoneme model based on sequence-to-sequence neural networks, and (c) a stack of recurrent neural networks that learn how to correlate visual features with the keyword representation. Unlike prior work on KWS, which tries to learn word representations merely from sequences of graphemes (i.e. letters), we propose the use of a grapheme-to-phoneme encoder-decoder model that learns how to map words to their pronunciation. We demonstrate that our system obtains very promising visual-only KWS results on the challenging LRS2 database for keywords unseen during training. We also show that our system outperforms a baseline that addresses KWS via automatic speech recognition (ASR), and that it drastically improves over other recently proposed ASR-free KWS methods.
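The data flow between the three components named in the abstract can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the function `spot_keyword`, its parameters, the concatenation of frame features with the keyword vector, the decision threshold, and the three `toy_*` stand-ins are all assumptions introduced here so that the example runs. In the paper the corresponding parts are trained neural networks.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def spot_keyword(
    frames: Sequence[Sequence[float]],
    keyword: str,
    visual_encoder: Callable[[Sequence[Sequence[float]]], List[Vector]],
    keyword_encoder: Callable[[str], Vector],
    scorer: Callable[[List[Vector]], float],
    threshold: float = 0.5,
) -> Tuple[bool, float]:
    """Wire the three components together and return (detected, score)."""
    # (a) One feature vector per video frame.
    visual_feats = visual_encoder(frames)
    # (b) One fixed-size vector for the query word.
    keyword_vec = keyword_encoder(keyword)
    # (c) Assumption: pair every frame feature with the keyword vector by
    # concatenation, then let the scorer map the sequence to a score in [0, 1].
    joint = [feat + keyword_vec for feat in visual_feats]
    score = scorer(joint)
    return score >= threshold, score


# Hypothetical toy stand-ins so the sketch runs; they are not trained models.
def toy_visual_encoder(frames):
    # Mean of each frame's values as a one-dimensional "feature".
    return [[sum(frame) / len(frame)] for frame in frames]


def toy_keyword_encoder(word):
    # Word length scaled down as a one-dimensional "keyword representation".
    return [len(word) / 10.0]


def toy_scorer(joint):
    # Fraction of frames whose visual value exceeds the keyword value.
    hits = sum(1 for visual, key in joint if visual > key)
    return hits / len(joint)


frames = [[0.2, 0.4], [0.8, 1.0], [0.6, 0.8]]
detected, score = spot_keyword(
    frames, "about", toy_visual_encoder, toy_keyword_encoder, toy_scorer
)
print(detected, round(score, 2))
```

Passing the encoders and the scorer as arguments keeps the sketch independent of any particular network library. Because the keyword enters only through `keyword_encoder`, a word that was never seen during training can still be queried, which is the zero-shot setting the abstract describes.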
Citation
Tzimiropoulos, Y., & Stafylakis, T. (2018, September). Zero-Shot Keyword Spotting for Visual Speech Recognition In-the-wild. Presented at European Conference on Computer Vision, Munich, Germany
Conference Name | European Conference on Computer Vision |
---|---|
Start Date | Sep 8, 2018 |
End Date | Sep 14, 2018 |
Acceptance Date | Jul 3, 2018 |
Online Publication Date | Oct 6, 2018 |
Publication Date | Oct 9, 2018 |
Deposit Date | Oct 15, 2018 |
Publicly Available Date | Oct 7, 2019 |
Publisher | Springer Nature |
Volume | 11208 LNCS |
Pages | 536-552 |
Series Title | Lecture Notes in Computer Science |
Series Number | 11208 |
Series ISSN | 1611-3349 |
Book Title | Computer Vision – ECCV 2018 |
ISBN | 978-3-030-01224-3 |
DOI | https://doi.org/10.1007/978-3-030-01225-0_32 |
Keywords | Visual keyword spotting; Visual speech recognition; Zero-shot learning |
Public URL | https://nottingham-repository.worktribe.com/output/1164454 |
Publisher URL | https://link.springer.com/chapter/10.1007/978-3-030-01225-0_32 |
Contract Date | Oct 15, 2018 |
Files
Themos Stafylakis Zero-shot Keyword Search ECCV 2018 Paper
(471 Kb)
PDF