Research Repository

Combining residual networks with LSTMs for lipreading

Stafylakis, Themos; Tzimiropoulos, Georgios


Abstract

We propose an end-to-end deep learning architecture for word-level visual speech recognition. The system combines spatiotemporal convolutional, residual, and bidirectional Long Short-Term Memory networks. We trained and evaluated it on the Lipreading In-The-Wild benchmark, a challenging database with a 500-word vocabulary consisting of video excerpts from BBC TV broadcasts. The proposed network attains a word accuracy of 83.0%, a 6.8% absolute improvement over the current state of the art.
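The abstract describes a three-stage pipeline: a spatiotemporal (3D) convolutional front-end, a per-frame residual network, and a bidirectional LSTM back-end classifying over the 500-word vocabulary. The shape-flow sketch below illustrates how such a pipeline could preserve temporal resolution while downsampling spatially; all kernel sizes, strides, and feature dimensions are illustrative assumptions, not values taken from the paper.

```python
def conv_out(size, kernel, stride, pad):
    """Standard convolution output-size formula for one dimension."""
    return (size + 2 * pad - kernel) // stride + 1

# Assumed input clip: 29 frames of 112x112 mouth-region crops.
T, H, W = 29, 112, 112

# Assumed 3D front-end: kernel 5x7x7, stride 1x2x2, padding 2x3x3.
# Temporal stride 1 keeps all 29 time steps for the recurrent back-end.
t = conv_out(T, 5, 1, 2)   # temporal dimension, unchanged
h = conv_out(H, 7, 2, 3)   # spatial height, halved
w = conv_out(W, 7, 2, 3)   # spatial width, halved
print((t, h, w))           # → (29, 56, 56)

# A 2D ResNet (assumed) collapses each frame to a feature vector,
# e.g. 256-d, giving a length-t sequence of frame embeddings.
frame_dim = 256

# Bidirectional LSTM (assumed 256 hidden units per direction):
# forward and backward states are concatenated at each time step.
blstm_dim = frame_dim * 2

# Final linear layer + softmax over the 500-word vocabulary,
# yielding one score vector per time step before aggregation.
vocab = 500
logits_shape = (t, vocab)
print(blstm_dim, logits_shape)  # → 512 (29, 500)
```

The key design point the sketch highlights is that only the spatial dimensions are downsampled early, so the recurrent back-end still sees the full frame-rate sequence.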

Citation

Stafylakis, T., & Tzimiropoulos, G. (2017). Combining residual networks with LSTMs for lipreading. Interspeech 2017, 3652-3656. https://doi.org/10.21437/Interspeech.2017-85

Conference Name Interspeech 2017
End Date Aug 24, 2017
Acceptance Date May 22, 2017
Deposit Date Aug 10, 2017
Publicly Available Date Dec 31, 2017
Peer Reviewed Peer Reviewed
DOI https://doi.org/10.21437/Interspeech.2017-85
Keywords visual speech recognition, lipreading, deep learning
Public URL https://nottingham-repository.worktribe.com/output/861527
Publisher URL http://www.isca-speech.org/archive/Interspeech_2017/abstracts/0085.html
Related Public URLs http://www.interspeech2017.org/
http://www.isca-speech.org/iscaweb/index.php/archive/online-archive
http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0085.PDF
http://www.interspeech2017.org/calls/papers/
Additional Information Paper available on http://www.isca-speech.org/iscaweb/index.php/archive/online-archive. pp. 3652-3656. doi:10.21437/Interspeech.2017-85
