Combining residual networks with LSTMs for lipreading

Stafylakis, Themos; Tzimiropoulos, Georgios


Authors

Themos Stafylakis

Georgios Tzimiropoulos



Abstract

We propose an end-to-end deep learning architecture for word-level visual speech recognition. The system combines spatiotemporal convolutional, residual, and bidirectional Long Short-Term Memory networks. We train and evaluate it on the Lipreading In-The-Wild benchmark, a challenging database with 500 target words, consisting of 1.28 sec video excerpts from BBC TV broadcasts. The proposed network attains a word accuracy of 83.0%, a 6.8% absolute improvement over the current state of the art, without using information about word boundaries during training or testing.

Citation

Stafylakis, T., & Tzimiropoulos, G. (2017, August). Combining residual networks with LSTMs for lipreading. Presented at Interspeech 2017, Stockholm, Sweden

Conference Name: Interspeech 2017
Start Date: Aug 20, 2017
End Date: Aug 24, 2017
Acceptance Date: May 22, 2017
Deposit Date: Aug 10, 2017
Publicly Available Date: Dec 31, 2017
Peer Reviewed: Yes
Pages: 3652-3656
Book Title: Proc. Interspeech 2017
DOI: https://doi.org/10.21437/Interspeech.2017-85
Keywords: visual speech recognition, lipreading, deep learning
Public URL: https://nottingham-repository.worktribe.com/output/861527
Publisher URL: http://www.isca-speech.org/archive/Interspeech_2017/abstracts/0085.html
Related Public URLs:
http://www.interspeech2017.org/
http://www.isca-speech.org/iscaweb/index.php/archive/online-archive
http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0085.PDF
http://www.interspeech2017.org/calls/papers/
Additional Information: Paper available at http://www.isca-speech.org/iscaweb/index.php/archive/online-archive, pp. 3652-3656. doi:10.21437/Interspeech.2017-85
Contract Date: Aug 10, 2017
