Acoustic adaptation in cross database audio visual SHMM training for phonetic spoken term detection

Kalantari, Shahram, Dean, David B., Sridharan, Sridha, Ghaemmaghami, Houman, & Fookes, Clinton B. (2015) Acoustic adaptation in cross database audio visual SHMM training for phonetic spoken term detection. In Proceedings of the Third Edition Workshop on Speech, Language and Audio in Multimedia, Association for Computing Machinery, Brisbane, Qld, pp. 11-14.

Visual information in the form of the speaker's lip movements has been shown to improve the performance of speech recognition and search applications. In our previous work, we proposed cross database training of synchronous hidden Markov models (SHMMs), which makes use of large, publicly available external audio databases in addition to the relatively small audio visual database at hand. In this work, we improve the cross database training approach with an additional audio adaptation step, which enables the audio visual SHMMs to benefit from the audio observations of the external audio models before the visual modality is added to them. The proposed approach outperforms the baseline cross database training approach in both clean and noisy environments, in terms of phone recognition accuracy as well as spoken term detection (STD) accuracy.

ID Code: 86033
Item Type: Conference Paper
Refereed: Yes
Keywords: audio visual spoken term detection, cross database training
DOI: 10.1145/2802558.2814648
ISBN: 978-1-4503-3749-6
Divisions: Current > Schools > School of Electrical Engineering & Computer Science
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Current > Research Centres > Smart Services CRC
Copyright Owner: Copyright 2015 ACM
Deposited On: 15 Sep 2015 23:49
Last Modified: 16 Dec 2015 01:43
