Cross likelihood ratio based speaker clustering using eigenvoice models
Wang, David, Vogt, Robert J., Sridharan, Sridha, & Dean , David (2011) Cross likelihood ratio based speaker clustering using eigenvoice models. In Interspeech 2011 : 12th Annual Conference of the International Speech Communication Association, 28-31 August 2011, Florence, Italy.
This paper proposes the use of eigenvoice modeling techniques with the Cross Likelihood Ratio (CLR) as a criterion for speaker clustering within a speaker diarization system. The CLR has previously been shown to be a robust decision criterion for speaker clustering using Gaussian Mixture Models. Recently, eigenvoice modeling techniques have become increasingly popular, due to its ability to adequately represent a speaker based on sparse training data, as well as an improved capture of differences in speaker characteristics. This paper hence proposes that it would be beneficial to capitalize on the advantages of eigenvoice modeling in a CLR framework. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 35.1% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
Impact and interest:
Citation countsare sourced monthly fromand citation databases.
These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.
Citations counts from theindexing service can be viewed at the linked Google Scholar™ search.
Full-text downloadsdisplays the total number of times this work’s files (e.g., a PDF) have been downloaded from QUT ePrints as well as the number of downloads in the previous 365 days. The count includes downloads for all files if a work has more than one.
|Item Type:||Conference Paper|
|Keywords:||eigenvoice modeling, joint factor analysis, cross likelihood ratio, speaker clustering, speaker diarization|
|Subjects:||Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Pattern Recognition and Data Mining (080109)|
Australian and New Zealand Standard Research Classification > ENGINEERING (090000) > ELECTRICAL AND ELECTRONIC ENGINEERING (090600) > Signal Processing (090609)
|Divisions:||Past > QUT Faculties & Divisions > Faculty of Built Environment and Engineering|
Past > Institutes > Information Security Institute
Past > Schools > School of Engineering Systems
|Copyright Owner:||Copyright 2011 please consult authors|
|Deposited On:||27 Sep 2011 09:23|
|Last Modified:||27 Sep 2011 09:23|
Repository Staff Only: item control page