A Likelihood-Maximizing Framework for Enhanced In-Car Speech Recognition Based on Speech Dialog System Interaction
Kleinschmidt, Tristan, Sridharan, Sridha, & Mason, Michael (2011) A Likelihood-Maximizing Framework for Enhanced In-Car Speech Recognition Based on Speech Dialog System Interaction. In Digital Signal Processing for In-Vehicle Systems and Safety. Springer Science+Business Media, Germany, pp. 159-174.
Speech recognition in car environments has been identified as a valuable means for reducing driver distraction when operating noncritical in-car systems. Under such conditions, however, speech recognition accuracy degrades significantly, and techniques such as speech enhancement are required to improve these accuracies. Likelihood-maximizing (LIMA) frameworks optimize speech enhancement algorithms based on recognized state sequences rather than traditional signal-level criteria such as maximizing signal-to-noise ratio. LIMA frameworks typically require calibration utterances to generate optimized enhancement parameters that are used for all subsequent utterances. Under such a scheme, suboptimal recognition performance occurs in noise conditions that are significantly different from that present during the calibration session – a serious problem in rapidly changing noise environments out on the open road. In this chapter, we propose a dialog-based design that allows regular optimization iterations in order to track the ever-changing noise conditions. Experiments using Mel-filterbank noise subtraction (MFNS) are performed to determine the optimization requirements for vehicular environments and show that minimal optimization is required to improve speech recognition, avoid over-optimization, and ultimately assist with semireal-time operation. It is also shown that the proposed design is able to provide improved recognition performance over frameworks incorporating a calibration session only.
Impact and interest:
Citation counts are sourced monthly from and citation databases.
These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.
Citations counts from theindexing service can be viewed at the linked Google Scholar™ search.
|Item Type:||Book Chapter|
|Keywords:||Automatic speech recognition (ASR) ; In-car speech recognition, Mel-filterbank noise subtraction (MFNS)|
|Subjects:||Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100)|
|Divisions:||Current > Schools > School of Electrical Engineering & Computer Science
Current > QUT Faculties and Divisions > Science & Engineering Faculty
|Deposited On:||12 Jan 2015 04:04|
|Last Modified:||29 Oct 2015 16:36|
Repository Staff Only: item control page