Combining residual networks with LSTMs for lipreading
Stafylakis, Themos; Tzimiropoulos, Georgios
Authors
Themos Stafylakis
Georgios Tzimiropoulos
Abstract
We propose an end-to-end deep learning architecture for word-level visual speech recognition. The system combines spatiotemporal convolutional, residual, and bidirectional Long Short-Term Memory networks. We train and evaluate it on the Lipreading In-The-Wild benchmark, a challenging database of 500 target words consisting of 1.28-second video excerpts from BBC TV broadcasts. The proposed network attains a word accuracy of 83.0%, a 6.8% absolute improvement over the current state of the art, without using information about word boundaries during training or testing.
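To make the reported figures concrete, the short sketch below works through the arithmetic implied by the abstract. Only the 83.0% accuracy, the 6.8% absolute improvement, and the 500-word vocabulary come from the abstract; the function names are hypothetical, the prior accuracy is derived rather than quoted, and the chance level assumes uniform guessing over the target words.

```python
def implied_baseline(accuracy_pct: float, absolute_gain_pct: float) -> float:
    """Accuracy of the previous best system, implied by an absolute gain."""
    return accuracy_pct - absolute_gain_pct


def relative_error_reduction(baseline_acc_pct: float, new_acc_pct: float) -> float:
    """Fraction of the baseline word error rate removed by the new system."""
    baseline_err = 100.0 - baseline_acc_pct
    new_err = 100.0 - new_acc_pct
    return (baseline_err - new_err) / baseline_err


# Figures taken from the abstract.
reported_accuracy = 83.0   # word accuracy, percent
reported_gain = 6.8        # absolute improvement, percentage points
num_target_words = 500

# Derived quantities (not stated in the abstract).
baseline = implied_baseline(reported_accuracy, reported_gain)
reduction = relative_error_reduction(baseline, reported_accuracy)
# Assumption: chance level means uniform guessing over the target words.
chance_accuracy = 100.0 / num_target_words

print(f"Implied prior accuracy: {baseline:.1f}%")
print(f"Relative error reduction: {reduction:.1%}")
print(f"Chance-level accuracy: {chance_accuracy:.1f}%")
```

Under these assumptions, the implied prior accuracy is 76.2%, so the word error rate falls from 23.8% to 17.0%, a relative reduction of roughly 28.6%.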
Citation
Stafylakis, T., & Tzimiropoulos, G. (2017, August). Combining residual networks with LSTMs for lipreading. Presented at Interspeech 2017, Stockholm, Sweden
Conference Name | Interspeech 2017 |
---|---|
Start Date | Aug 20, 2017 |
End Date | Aug 24, 2017 |
Acceptance Date | May 22, 2017 |
Deposit Date | Aug 10, 2017 |
Publicly Available Date | Dec 31, 2017 |
Peer Reviewed | Peer Reviewed |
Pages | 3652-3656 |
Book Title | Proc. Interspeech 2017 |
DOI | https://doi.org/10.21437/Interspeech.2017-85 |
Keywords | visual speech recognition, lipreading, deep learning |
Public URL | https://nottingham-repository.worktribe.com/output/861527 |
Publisher URL | http://www.isca-speech.org/archive/Interspeech_2017/abstracts/0085.html |
Related Public URLs | http://www.interspeech2017.org/ http://www.isca-speech.org/iscaweb/index.php/archive/online-archive http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0085.PDF http://www.interspeech2017.org/calls/papers/ |
Additional Information | Paper available on http://www.isca-speech.org/iscaweb/index.php/archive/online-archive. pp. 3652-3656. doi:10.21437/Interspeech.2017-85 |
Contract Date | Aug 10, 2017 |
Files
1703.04105.pdf (PDF, 1.4 MB)