Robust speaker verification via fusion of speech and lip modalities

Wark, T., Sridharan, S., & Chandran, V. (1999) Robust speaker verification via fusion of speech and lip modalities. In Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, Phoenix, Arizona, pp. 3061-3064.

This paper investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. Our own previous work, and the work of others, has shown that features extracted from a speaker's moving lips carry speaker-dependent information that is complementary to speech features. We demonstrate that fusing lip and speech information yields a highly robust speaker verification system that outperforms either sub-system alone. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective in reducing the false acceptance and false rejection error rates in the presence of background noise.
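
The abstract does not say how the modality weighting is computed, so the following is only a minimal sketch of one common form of score-level fusion: a convex combination of per-modality scores, followed by the usual false acceptance and false rejection rates at a decision threshold. The function names, the weight `alpha`, the accept-if-score-at-or-above-threshold convention, and the sample scores are all illustrative assumptions and are not taken from the paper.

```python
def fuse_scores(speech_score, lip_score, alpha):
    """Linear score-level fusion (assumed form, not the paper's method).

    alpha is the weight on the speech modality; (1 - alpha) goes to lips.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * speech_score + (1.0 - alpha) * lip_score


def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_acceptance_rate, false_rejection_rate).

    Assumed convention: a claim is accepted when its score >= threshold.
    """
    false_accepts = sum(s >= threshold for s in impostor_scores)
    false_rejects = sum(s < threshold for s in genuine_scores)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr


# Hypothetical (speech, lip) scores for true-speaker and impostor trials.
genuine_pairs = [(0.9, 0.7), (0.4, 0.8), (0.6, 0.6)]
impostor_pairs = [(0.5, 0.1), (0.2, 0.3)]

alpha = 0.5  # equal weighting, chosen only for illustration
genuine = [fuse_scores(s, l, alpha) for s, l in genuine_pairs]
impostor = [fuse_scores(s, l, alpha) for s, l in impostor_pairs]
far, frr = error_rates(genuine, impostor, threshold=0.5)
```

In noisy conditions one would expect a lower `alpha` (more weight on the lip modality) to be preferable, since background noise degrades the speech scores but not the visual ones; how that weight is actually selected is the contribution described in the paper itself.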

Impact and interest:

12 citations in Scopus

ID Code: 45590
Item Type: Conference Paper
Refereed: No
Keywords: acoustic noise, audio-visual systems, feature extraction, gesture recognition, sensor fusion, speaker recognition, background noise, error rates, false acceptance, false rejection, fusion, moving lips, performance, robust speaker verification, speech features, speech information, weighting
DOI: 10.1109/ICASSP.1999.757487
ISBN: 0780350413
ISSN: 1520-6149
Divisions: Past > QUT Faculties & Divisions > Faculty of Built Environment and Engineering
Past > Schools > School of Engineering Systems
Copyright Owner: Copyright 1999 IEEE
Copyright Statement: Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Deposited On: 17 Oct 2011 01:54
Last Modified: 17 Oct 2011 01:54
