Audio-visual speaker verification using continuous fused HMMs
Dean, David B., Sridharan, Sridha, & Wark, Timothy J. (2006) Audio-visual speaker verification using continuous fused HMMs. In HCSNet workshop on Use of vision in human-computer interaction, 1-3 November, Canberra, Australia.
This paper examines audio-visual speaker verification using a novel adaptation of fused hidden Markov models, in comparison to output fusion of individual classifiers in the audio and video modalities. A comparison of both hidden Markov model (HMM) and Gaussian mixture model (GMM) classifiers in both modalities under output fusion shows that the choice of audio classifier is more important than the choice of video classifier. Although temporal information allows an HMM to outperform a GMM individually in video, this temporal information does not carry through to output fusion with an audio classifier, where the difference between the two video classifiers is minor. An adaptation of fused hidden Markov models, designed to be more robust to within-speaker variation, is used to show that the temporal relationship between video observations and audio states can be harnessed to reduce errors in audio-visual speaker verification when compared to output fusion.
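As a rough illustration of what output fusion means here, the sketch below combines one score from an audio classifier and one from a video classifier with a weighted sum, then compares the result to a threshold. This is a generic score-level fusion scheme, not necessarily the rule used in the paper: the function names, the weight of 0.7 favouring audio, and the threshold of 0.0 are all hypothetical, and the scores are assumed to be log-likelihood ratios already normalised to comparable ranges.

```python
def fuse_scores(audio_score, video_score, audio_weight=0.7):
    """Weighted-sum fusion of two per-modality verification scores.

    Both scores are assumed to be log-likelihood ratios on comparable
    scales; audio_weight is a hypothetical value, not one from the paper.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * audio_score + (1.0 - audio_weight) * video_score


def verify(audio_score, video_score, threshold=0.0, audio_weight=0.7):
    """Accept the claimed identity if the fused score reaches the threshold."""
    return fuse_scores(audio_score, video_score, audio_weight) >= threshold


if __name__ == "__main__":
    # Made-up scores: audio supports the claim, video weakly opposes it.
    print(fuse_scores(2.0, -1.0))
    print(verify(2.0, -1.0))
```

Because each classifier is reduced to a single score before combination, a scheme like this cannot use the frame-level timing between the two streams, which is the relationship the fused HMM described above is designed to model.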
|Item Type:||Conference Paper|
|Keywords:||audio-visual speaker recognition (AVSPR), fused hidden Markov model (FHMM)|
|Subjects:||Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Pattern Recognition and Data Mining (080109)|
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Natural Language Processing (080107)
|Divisions:||Past > QUT Faculties & Divisions > Faculty of Built Environment and Engineering|
|Copyright Owner:||Copyright 2006 Australian Computer Society|
|Copyright Statement:||Reproduced in accordance with the copyright policy of the publisher.|
|Deposited On:||08 Nov 2006|
|Last Modified:||22 Feb 2013 16:44|