A new penalty term for the BIC with respect to speaker diarization
Authors
Themos Stafylakis
Georgios Tzimiropoulos
Vassilis Katsouros
George Carayannis
Abstract
In this paper, we examine a new penalty term for the Bayesian Information Criterion (BIC) that is suited to the problem of speaker diarization. Building on our previous approach of penalizing each cluster only with its effective sample size, an approach we called segmental, we propose a stricter penalty term. The criterion we derive retains the main property of the Segmental-BIC: it approximates the evidence of overall partitions of the data and simultaneously leads to a pairwise dissimilarity measure that is completely defined by the pair of clusters in question. Experimental results show a significant improvement in diarization accuracy on the ESTER benchmark.
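As a rough illustration of the segmental idea described in the abstract, the sketch below contrasts a conventional BIC penalty, which uses the total sample size, with a penalty that charges each cluster only for its own sample size. The notation is an assumption introduced here and is not taken from the abstract: K clusters, n_k frames in cluster k, N the total number of frames, P free parameters per cluster model, L_k the maximized likelihood of cluster k, and \lambda a tuning weight. The stricter penalty term proposed in the paper is not stated in the abstract and is not reproduced.

```latex
% Assumed notation (not from the abstract): K clusters, n_k frames in cluster k,
% N = \sum_k n_k, P free parameters per cluster model, \lambda a tuning weight,
% L_k the maximized likelihood of the model fitted to cluster k.
\begin{align*}
\mathrm{BIC}_{\text{global}} &= \sum_{k=1}^{K} \log L_k \;-\; \frac{\lambda}{2}\, K P \log N \\
\mathrm{BIC}_{\text{seg}} &= \sum_{k=1}^{K} \Bigl( \log L_k - \frac{\lambda}{2}\, P \log n_k \Bigr) \\
% Change in BIC_seg when clusters i and j are kept separate rather than merged:
\Delta\mathrm{BIC}_{\text{seg}}(i,j) &= \log L_i + \log L_j - \log L_{i \cup j}
  \;-\; \frac{\lambda}{2}\, P \bigl( \log n_i + \log n_j - \log (n_i + n_j) \bigr)
\end{align*}
```

Under these assumptions, the last expression involves only n_i, n_j, and the likelihoods of the two clusters, which matches the abstract's description of a pairwise dissimilarity measure completely defined by the pair of clusters in question. With the global penalty, the corresponding difference would instead depend on N.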
Citation
Stafylakis, T., Tzimiropoulos, G., Katsouros, V., & Carayannis, G. (2010). A new penalty term for the BIC with respect to speaker diarization.
Conference Name | ICASSP 2010 - 2010 IEEE International Conference on Acoustics Speech and Signal Processing |
---|---|
End Date | Mar 19, 2010 |
Publication Date | Jan 1, 2010 |
Deposit Date | Feb 1, 2016 |
Publicly Available Date | Feb 1, 2016 |
Peer Reviewed | Peer Reviewed |
Keywords | Bayes Methods, Speaker Recognition, Bayesian Information Criterion, Cluster Analysis, Speaker Diarization |
Public URL | https://nottingham-repository.worktribe.com/output/1013269 |
Publisher URL | http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5495076 |
Additional Information | Published in: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing: proceedings: March 14-19, 2010, Sheraton Dallas Hotel, Dallas, Texas, U.S.A. Piscataway, N.J. : IEEE, 2010. ISBN: 978-1-4244-4295-9. pp. 4978-4981, doi: 10.1109/ICASSP.2010.5495076 ©2010 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |