Improved phonetic and lexical speaker recognition through MAP adaptation

Vogt, Robert J., Baker, Brendan J., & Sridharan, Sridha (2004) Improved phonetic and lexical speaker recognition through MAP adaptation. In Odyssey: The Speaker and Language Workshop, 31 May - 3 June 2004, Toledo, Spain.


High-level features such as phone and word n-grams have been shown to be effective for speaker recognition, particularly when used alongside traditional acoustic speaker recognition techniques. The applicability of these high-level recognition systems is impeded by the large amount of training data needed to build robust and stable speaker models. This paper describes an extension to an existing phone n-gram-based speaker recognition technique, in which MAP adaptation is used in the speaker model training process. Results obtained for the NIST 2003 Speaker Recognition Extended Data Task indicate that a significant improvement in performance can be gained through the use of this model estimation technique. In our tests, we were able to improve performance over the baseline system while at the same time halving the training data requirement. Further experiments using MAP adaptation on word n-gram models also showed improvement over baseline results, suggesting that the technique could be applied to other feature sets modelled by multinomial distributions.
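The abstract does not give the adaptation formula, so the following is only a minimal sketch of one common way to MAP-adapt a multinomial n-gram model. It assumes a relevance-factor update, which corresponds to a Dirichlet prior centred on a background model: p_spk(g) = alpha * c(g)/N + (1 - alpha) * p_bg(g), with alpha = N / (N + r). The sketch also assumes a simple average log-likelihood-ratio score against the background model. The function names `ngram_counts`, `map_adapt` and `llr_score` and the parameter `relevance` are hypothetical. The paper's exact formulation, relevance factor and scoring may differ.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

Gram = Tuple[str, ...]


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count overlapping n-grams in a decoded phone (or word) sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def map_adapt(speaker_counts: Counter, background: Dict[Gram, float],
              relevance: float) -> Dict[Gram, float]:
    """Hypothetical relevance-factor MAP update of a background multinomial.

    p_spk(g) = alpha * c(g) / N + (1 - alpha) * p_bg(g), alpha = N / (N + r).
    N counts only n-grams in the background vocabulary, so the adapted model
    remains a normalised distribution over that vocabulary.
    """
    if relevance <= 0:
        raise ValueError("relevance factor must be positive")
    total = sum(c for g, c in speaker_counts.items() if g in background)
    alpha = total / (total + relevance)  # more speaker data -> trust it more
    adapted = {}
    for gram, p_bg in background.items():
        p_ml = speaker_counts.get(gram, 0) / total if total > 0 else 0.0
        adapted[gram] = alpha * p_ml + (1.0 - alpha) * p_bg
    return adapted


def llr_score(test_counts: Counter, speaker: Dict[Gram, float],
              background: Dict[Gram, float]) -> float:
    """Average per-n-gram log-likelihood ratio, speaker vs background (assumed scoring)."""
    acc, total = 0.0, 0
    for gram, c in test_counts.items():
        p_bg = background.get(gram, 0.0)
        if p_bg <= 0.0:
            continue  # skip n-grams the background model cannot score
        acc += c * math.log(speaker[gram] / p_bg)
        total += c
    return acc / total if total else 0.0


if __name__ == "__main__":
    background = {("a",): 0.5, ("b",): 0.5}
    spk = map_adapt(ngram_counts(["a", "a", "a", "a"], 1), background, relevance=4.0)
    print(spk)  # {('a',): 0.75, ('b',): 0.25}
    print(llr_score(ngram_counts(["a", "a"], 1), spk, background))
```

The interpolation weight grows with the amount of speaker data. With only a little training speech, the model therefore stays close to the background distribution instead of assigning zero probability to unseen n-grams, which is the kind of robustness to limited training data that the abstract describes. The same code applies unchanged to word n-grams.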

ID Code: 15494
Item Type: Conference Paper
Refereed: Yes
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Natural Language Processing (080107)
Divisions: Past > QUT Faculties & Divisions > Faculty of Built Environment and Engineering
Past > Institutes > Information Security Institute
Copyright Owner: Copyright 2004 International Speech Communication Association (ISCA)
Copyright Statement: Reproduced in accordance with the copyright policy of the publisher.
Deposited On: 06 Nov 2008 00:00
Last Modified: 09 Jun 2010 13:06