Eigenvoice modeling for cross likelihood ratio based speaker clustering : a Bayesian approach

Wang, David, Vogt, Robert J., & Sridharan, Sridha (2013) Eigenvoice modeling for cross likelihood ratio based speaker clustering : a Bayesian approach. Computer Speech and Language, 27(4), pp. 1011-1027.

View at publisher


This paper proposes the use of Bayesian approaches with the cross likelihood ratio (CLR) as a criterion for speaker clustering within a speaker diarization system, using eigenvoice modeling techniques. The CLR has previously been shown to be an effective decision criterion for speaker clustering using Gaussian mixture models. Recently, eigenvoice modeling has become an increasingly popular technique, due to its ability to adequately represent a speaker based on sparse training data, as well as to provide an improved capture of differences in speaker characteristics. The integration of eigenvoice modeling into the CLR framework to capitalize on the advantage of both techniques has also been shown to be beneficial for the speaker clustering task. Building on that success, this paper proposes the use of Bayesian methods to compute the conditional probabilities in computing the CLR, thus effectively combining the eigenvoice-CLR framework with the advantages of a Bayesian approach to the diarization problem. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 33.5% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.

Impact and interest:

0 citations in Scopus
Search Google Scholar™

Citation counts are sourced monthly from Scopus and Web of Science® citation databases.

These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.

Citations counts from the Google Scholar™ indexing service can be viewed at the linked Google Scholar™ search.

Full-text downloads:

36 since deposited on 14 Jan 2013
8 in the past twelve months

Full-text downloads displays the total number of times this work’s files (e.g., a PDF) have been downloaded from QUT ePrints as well as the number of downloads in the previous 365 days. The count includes downloads for all files if a work has more than one.

ID Code: 56402
Item Type: Journal Article
Refereed: Yes
Additional URLs:
Keywords: eigenvoice modeling, joint factor analysis, cross likelihood ratio, speaker clustering, speaker diarization
DOI: 10.1016/j.csl.2012.12.001
ISSN: 1095-8363
Subjects: Australian and New Zealand Standard Research Classification > ENGINEERING (090000) > ELECTRICAL AND ELECTRONIC ENGINEERING (090600) > Signal Processing (090609)
Divisions: Past > QUT Faculties & Divisions > Faculty of Built Environment and Engineering
Past > Institutes > Information Security Institute
Copyright Owner: Copyright 2012 Elsevier Ltd.
Copyright Statement: This is the author’s version of a work that was accepted for publication in Computer Speech and Language. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Computer Speech and Language, [VOL 27, ISSUE 4, (2013)] DOI: 10.1016/j.csl.2012.12.001
Deposited On: 14 Jan 2013 04:42
Last Modified: 24 Oct 2014 04:13

Export: EndNote | Dublin Core | BibTeX

Repository Staff Only: item control page