Combining residual networks with LSTMs for lipreading

Stafylakis, Themos; Tzimiropoulos, Georgios

Authors

Themos Stafylakis

Georgios Tzimiropoulos



Abstract

We propose an end-to-end deep learning architecture for word-level visual speech recognition. The system is a combination of spatiotemporal convolutional, residual, and bidirectional Long Short-Term Memory networks. We trained and evaluated it on the Lipreading In-The-Wild benchmark, a challenging database with a 500-word vocabulary consisting of video excerpts from BBC TV broadcasts. The proposed network attains a word accuracy of 83.0%, a 6.8% absolute improvement over the current state of the art.
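
As a rough illustration of the pipeline described above, the sketch below stacks a 3D spatiotemporal convolutional front-end, a 2D residual network applied to every frame, and a bidirectional LSTM back-end that classifies the whole word. This is not the authors' released code: the layer sizes, the ResNet-18 backbone, the temporal averaging before the classifier, and the use of PyTorch/torchvision are all illustrative assumptions.

import torch
import torch.nn as nn
import torchvision.models as models


class LipreadingNet(nn.Module):
    """Spatiotemporal conv -> per-frame ResNet -> bidirectional LSTM (sketch)."""

    def __init__(self, num_classes=500, hidden_size=256):
        super().__init__()
        # 3D spatiotemporal convolutional front-end over grayscale mouth crops.
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
        )
        # 2D residual network applied independently to each time step.
        resnet = models.resnet18(weights=None)
        resnet.conv1 = nn.Conv2d(64, 64, kernel_size=7, stride=2, padding=3, bias=False)
        resnet.fc = nn.Identity()  # keep the 512-d pooled feature per frame
        self.resnet = resnet
        # Bidirectional LSTM back-end over the sequence of per-frame features.
        self.lstm = nn.LSTM(512, hidden_size, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, x):                       # x: (batch, 1, time, height, width)
        x = self.frontend(x)                    # (batch, 64, time, h', w')
        b, c, t, h, w = x.shape
        x = x.transpose(1, 2).reshape(b * t, c, h, w)
        x = self.resnet(x)                      # (batch * time, 512)
        x = x.reshape(b, t, -1)
        x, _ = self.lstm(x)                     # (batch, time, 2 * hidden_size)
        return self.classifier(x.mean(dim=1))   # average over time, one label per clip


if __name__ == "__main__":
    model = LipreadingNet()
    clip = torch.randn(2, 1, 29, 112, 112)      # two hypothetical 29-frame mouth clips
    print(model(clip).shape)                    # torch.Size([2, 500])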

Peer Reviewed: Peer Reviewed
APA6 Citation: Stafylakis, T., & Tzimiropoulos, G. (in press). Combining residual networks with LSTMs for lipreading. https://doi.org/10.21437/Interspeech.2017-85
DOI: https://doi.org/10.21437/Interspeech.2017-85
Keywords: visual speech recognition, lipreading, deep learning
Publisher URL: http://www.isca-speech.org/archive/Interspeech_2017/abstracts/0085.html
Related Public URLs: http://www.interspeech2017.org/
http://www.isca-speech..../archive/online-archive
http://www.isca-speech....eech_2017/pdfs/0085.PDF
http://www.interspeech2017.org/calls/papers/
Copyright Statement: Copyright information regarding this work can be found at the following address: http://eprints.nottingham.ac.uk/end_user_agreement.pdf
Additional Information: Paper available on http://www.isca-speech....archive/online-archive. pp. 3652-3656. doi:10.21437/Interspeech.2017-85

Files

1703.04105.pdf (1.4 MB)
PDF
