The interrelationship between the face and vocal tract configuration during audiovisual speech

Authors
Chris Scholes (Chris.Scholes@nottingham.ac.uk), Assistant Professor in Psychology
Jeremy I. Skipper
Alan Johnston (Alan.Johnston@nottingham.ac.uk), Professor of Psychology
Abstract
It is well established that speech perception is improved when we are able to see the speaker talking along with hearing their voice, especially when the speech is noisy. While we have a good understanding of where speech integration occurs in the brain, it is unclear how visual and auditory cues are combined to improve speech perception. One suggestion is that integration can occur as both visual and auditory cues arise from a common generator: the vocal tract. Here, we investigate whether facial and vocal tract movements are linked during speech production by comparing videos of the face and fast magnetic resonance (MR) image sequences of the vocal tract. The joint variation in the face and vocal tract was extracted using an application of principal components analysis (PCA), and we demonstrate that MR image sequences can be reconstructed with high fidelity using only the facial video and PCA. Reconstruction fidelity was significantly higher when images from the two sequences corresponded in time, and including implicit temporal information by combining contiguous frames also led to a significant increase in fidelity. A "Bubbles" technique was used to identify which areas of the face were important for recovering information about the vocal tract, and vice versa, on a frame-by-frame basis. Our data reveal that there is sufficient information in the face to recover vocal tract shape during speech. In addition, the facial and vocal tract regions that are important for reconstruction are those that are used to generate the acoustic speech signal.
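The abstract describes reconstructing vocal tract images from facial video via principal components analysis of their joint variation. A minimal sketch of that general idea (not the authors' code) is shown below on toy data: face and vocal-tract "frames" driven by a common generator are concatenated per frame, a joint PCA is fit, and the vocal-tract portion is then recovered from the face alone. All array sizes and variable names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 synchronised frames; face = 50 pixels, tract = 30 pixels,
# both driven by the same 3 latent sources (a shared generator).
n_frames, n_face, n_tract, n_latent = 100, 50, 30, 3
latent = rng.standard_normal((n_frames, n_latent))
face = latent @ rng.standard_normal((n_latent, n_face))
tract = latent @ rng.standard_normal((n_latent, n_tract))

# Joint PCA: concatenate face and tract per frame, centre, take the SVD.
joint = np.hstack([face, tract])
mean = joint.mean(axis=0)
U, S, Vt = np.linalg.svd(joint - mean, full_matrices=False)
components = Vt[:n_latent]          # leading principal components

# Reconstruct the tract from the face alone: estimate each frame's PCA
# scores using only the face part of the loadings, then apply the tract part.
face_part = components[:, :n_face]    # component loadings on face pixels
tract_part = components[:, n_face:]   # component loadings on tract pixels
scores, *_ = np.linalg.lstsq(face_part.T, (face - mean[:n_face]).T, rcond=None)
tract_hat = scores.T @ tract_part + mean[n_face:]

err = np.abs(tract_hat - tract).max()
print(f"max reconstruction error: {err:.2e}")
```

Because the toy face and tract share the same low-dimensional generator, the leading components capture their joint variation exactly and the face-only reconstruction of the tract is near-perfect; with real video and MR images the fit would of course be partial, which is what the paper quantifies as reconstruction fidelity.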
Citation
Scholes, C., Skipper, J. I., & Johnston, A. (2020). The interrelationship between the face and vocal tract configuration during audiovisual speech. Proceedings of the National Academy of Sciences, 117(51), 32791-32798. https://doi.org/10.1073/pnas.2006192117
| Journal Article Type | Article |
|---|---|
| Acceptance Date | Nov 6, 2020 |
| Online Publication Date | Dec 8, 2020 |
| Publication Date | Dec 22, 2020 |
| Deposit Date | Dec 9, 2020 |
| Publicly Available Date | Dec 9, 2020 |
| Journal | Proceedings of the National Academy of Sciences of the United States of America |
| Print ISSN | 0027-8424 |
| Electronic ISSN | 1091-6490 |
| Publisher | National Academy of Sciences |
| Peer Reviewed | Peer Reviewed |
| Volume | 117 |
| Issue | 51 |
| Pages | 32791-32798 |
| DOI | https://doi.org/10.1073/pnas.2006192117 |
| Public URL | https://nottingham-repository.worktribe.com/output/5128909 |
| Publisher URL | https://www.pnas.org/content/117/51/32791 |
Files
interrelationship between the face and vocal tract (PDF, 1.5 MB)
Publisher Licence URL: https://creativecommons.org/licenses/by/4.0/