Combining residual networks with LSTMs for lipreading

Stafylakis, Themos; Tzimiropoulos, Georgios

doi:10.21437/Interspeech.2017-85

Authors

Themos Stafylakis

Georgios Tzimiropoulos

Abstract

We propose an end-to-end deep learning architecture for word-level visual speech recognition. The system combines spatiotemporal convolutional, residual, and bidirectional Long Short-Term Memory networks. We train and evaluate it on the Lipreading In-The-Wild benchmark, a challenging database with 500 target words, consisting of 1.28-second video excerpts from BBC TV broadcasts. The proposed network attains a word accuracy of 83.0%, a 6.8% absolute improvement over the current state of the art, without using information about word boundaries during training or testing.
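The reported gain is an absolute difference in word accuracy, so the figures in the abstract can be unpacked with a little arithmetic. The sketch below derives the accuracy implied for the previous state of the art, along with the corresponding error rates and relative error reduction. Only the 83.0% and 6.8% figures come from the abstract; the variable names and the derived error-rate quantities are illustrative assumptions, not values stated by the authors.

```python
# Figures quoted in the abstract (percent word accuracy).
proposed_accuracy = 83.0
absolute_improvement = 6.8

# Accuracy implied for the previous state of the art.
previous_accuracy = proposed_accuracy - absolute_improvement

# Derived quantities (not stated in the abstract): word error rates
# and the relative reduction in error between the two systems.
proposed_error = 100.0 - proposed_accuracy
previous_error = 100.0 - previous_accuracy
relative_error_reduction = (
    100.0 * (previous_error - proposed_error) / previous_error
)

print(f"previous accuracy: {previous_accuracy:.1f}%")
print(f"error rate: {previous_error:.1f}% -> {proposed_error:.1f}%")
print(f"relative error reduction: {relative_error_reduction:.1f}%")
```

Under these figures, the implied previous accuracy is 76.2%, and the error rate falls from 23.8% to 17.0%, a relative reduction of roughly 28.6%.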

Citation

Stafylakis, T., & Tzimiropoulos, G. (2017, August). Combining residual networks with LSTMs for lipreading. Presented at Interspeech 2017, Stockholm, Sweden.

Conference Name	Interspeech 2017
Start Date	Aug 20, 2017
End Date	Aug 24, 2017
Acceptance Date	May 22, 2017
Deposit Date	Aug 10, 2017
Publicly Available Date	Dec 31, 2017
Peer Reviewed	Peer Reviewed
Pages	3652-3656
Book Title	Proc. Interspeech 2017
DOI	https://doi.org/10.21437/Interspeech.2017-85
Keywords	visual speech recognition, lipreading, deep learning
Public URL	https://nottingham-repository.worktribe.com/output/861527
Publisher URL	http://www.isca-speech.org/archive/Interspeech_2017/abstracts/0085.html
Related Public URLs	http://www.interspeech2017.org/ http://www.isca-speech.org/iscaweb/index.php/archive/online-archive http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0085.PDF http://www.interspeech2017.org/calls/papers/
Additional Information	Paper available on http://www.isca-speech.org/iscaweb/index.php/archive/online-archive. pp. 3652-3656. doi:10.21437/Interspeech.2017-85
Contract Date	Aug 10, 2017

Files

1703.04105.pdf (1.4 Mb)
PDF

Licence
https://creativecommons.org/licenses/by/4.0/
