Gaussian mixture modelling of broad phonetic and syllabic events for text-independent speaker verification
Baker, Brendan J., Vogt, Robert J., & Sridharan, Sridha (2005) Gaussian mixture modelling of broad phonetic and syllabic events for text-independent speaker verification. In Eurospeech/Interspeech : Proceedings of the 9th European Conference on Speech Communication and Technology 2005, 4-8 September 2005, Lisbon, Portugal.
This paper examines the usefulness of a multilingual broad syllable-based framework for text-independent speaker verification. Syllabic segmentation is used to obtain a convenient unit for constrained and more detailed model generation. Gaussian mixture models are chosen as a suitable modelling paradigm for initial testing of the framework. Promising results are presented for the NIST 2003 speaker recognition evaluation corpus. The syllable-based modelling technique is shown to outperform a state-of-the-art baseline GMM system. A simple selective reduction of the syllable set is also shown to give further improvement in performance. Overall, the syllable-based framework presents itself as a valid alternative to text-constrained speaker verification systems, with the advantage of being multilingual. The framework allows for future testing of alternative modelling paradigms, feature sets and qualitative analysis.
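As a rough illustration of the general idea, the sketch below scores a test utterance against one GMM per broad syllable class and compares it with a matching background model. It is a minimal sketch under assumptions, not the paper's system: the abstract does not say how per-class scores are combined, so the frame-averaged log-likelihood ratio, the diagonal covariances, the class labels such as "CV", and the function names `gmm_loglik` and `verify_score` are all hypothetical.

```python
import math

def gmm_loglik(x, gmm):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples, one per component.
    """
    comp = []
    for w, mu, var in gmm:
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        comp.append(ll)
    # Log-sum-exp over components for numerical stability.
    mx = max(comp)
    return mx + math.log(sum(math.exp(c - mx) for c in comp))

def verify_score(frames, speaker_models, background_models):
    """Average per-frame log-likelihood ratio (assumed scoring rule).

    frames: list of (syllable_class, feature_vector) pairs, where the class
    label comes from some upstream syllabic segmentation (not shown here).
    Frames whose class has no speaker or background model are skipped.
    """
    total, n = 0.0, 0
    for cls, x in frames:
        if cls in speaker_models and cls in background_models:
            total += gmm_loglik(x, speaker_models[cls])
            total -= gmm_loglik(x, background_models[cls])
            n += 1
    return total / n if n else 0.0

# Toy example: one 1-D, single-component model for a hypothetical class "CV".
speaker = {"CV": [(1.0, [1.0], [1.0])]}
background = {"CV": [(1.0, [0.0], [1.0])]}
frames = [("CV", [1.0]), ("V", [5.0])]  # the "V" frame has no model and is skipped
score = verify_score(frames, speaker, background)
```

In this sketch, restricting the set of syllable classes amounts to leaving entries out of the model dictionaries; a higher score favours the claimed speaker, and the decision threshold would have to be tuned on held-out data.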
|Item Type:||Conference Paper|
|Subjects:||Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Natural Language Processing (080107)|
|Divisions:||Past > QUT Faculties & Divisions > Faculty of Built Environment and Engineering; Past > Institutes > Information Security Institute; Past > Schools > School of Engineering Systems|
|Copyright Owner:||Copyright 2005 International Speech Communication Association (ISCA)|
|Copyright Statement:||Reproduced in accordance with the copyright policy of the publisher.|
|Deposited On:||06 Nov 2008|
|Last Modified:||29 Feb 2012 13:13|