QUT ePrints

Audio-visual speaker verification using continuous fused HMMs

Dean, David B., Sridharan, Sridha, & Wark, Timothy J. (2006) Audio-visual speaker verification using continuous fused HMMs. In HCSNet workshop on Use of vision in human-computer interaction, 1-3 November, Canberra, Australia.

Abstract

This paper examines audio-visual speaker verification using a novel adaptation of fused hidden Markov models, in comparison to output fusion of individual classifiers in the audio and video modalities. A comparison of both hidden Markov model (HMM) and Gaussian mixture model (GMM) classifiers in both modalities under output fusion shows that the choice of audio classifier is more important than video. Although temporal information allows a HMM to outperform a GMM individually in video, this temporal information does not carry through to output fusion with an audio classifier, where the difference between the two video classifiers is minor. An adaptation of fused hidden Markov models, designed to be more robust to within-speaker variation, is used to show that the temporal relationship between video observations and audio states can be harnessed to reduce errors in audio-visual speaker verification when compared to output fusion.

ID Code: 5390
Item Type: Conference Paper
Keywords: audio-visual speaker recognition (AVSPR), fused hidden Markov model (FHMM)
ISSN: 1445-1336
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Pattern Recognition and Data Mining (080109)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Natural Language Processing (080107)
Divisions: Past > QUT Faculties & Divisions > Faculty of Built Environment and Engineering
Copyright Owner: Copyright 2006 Australian Computer Society
Copyright Statement: Reproduced in accordance with the copyright policy of the publisher.
Deposited On: 08 Nov 2006
Last Modified: 22 Feb 2013 16:44
