Likelihood-maximising frameworks for enhanced in-car speech recognition
Kleinschmidt, Tristan, Sridharan, Sridha, & Mason, Michael (2009) Likelihood-maximising frameworks for enhanced in-car speech recognition. In 4th Biennial Workshop on DSP for In-Vehicle Systems and Safety, 25-27 June, 2009, Dallas, TX, USA.
Abstract
Speech recognition in car environments has been identified as a valuable means for reducing driver distraction when operating non-critical in-car systems. Likelihood-maximising (LIMA) frameworks optimise speech enhancement algorithms based on recognised state sequences rather than traditional signal-level criteria such as maximising signal-to-noise ratio. Previously presented LIMA frameworks require calibration utterances to generate optimised enhancement parameters, which are then used for all subsequent utterances. Sub-optimal recognition performance occurs in noise conditions which are significantly different from those present during the calibration session - a serious problem in rapidly changing noise environments. We propose a dialog-based design which allows regular optimisation iterations in order to track the changing noise conditions. Experiments using Mel-filterbank spectral subtraction are performed to determine the optimisation requirements for vehicular environments and show that minimal optimisation assists real-time operation with improved speech recognition accuracy. It is also shown that the proposed design is able to provide improved recognition performance over frameworks incorporating a calibration session.
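The idea of tuning an enhancement parameter against a recognition likelihood, rather than against signal-to-noise ratio, can be illustrated with a small sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the function names, the over-subtraction factor `alpha`, the spectral floor `beta`, the grid search, and the single diagonal Gaussian that stands in for the recogniser's state-sequence likelihood.

```python
import math

def spectral_subtract(noisy, noise, alpha, beta=0.01):
    """Subtract alpha * noise from each Mel-filterbank energy frame.

    Each output energy is floored at beta * (noisy energy) so it stays positive.
    alpha and beta are hypothetical tuning parameters, not values from the paper.
    """
    return [
        [max(y - alpha * n, beta * y) for y, n in zip(frame, noise)]
        for frame in noisy
    ]

def log_likelihood(frames, means, variances):
    """Diagonal-Gaussian log-likelihood of the log filterbank energies.

    Assumption: this single Gaussian is a toy stand-in for the likelihood of
    the recognised HMM state sequence used in a real LIMA framework.
    """
    total = 0.0
    for frame in frames:
        for energy, mu, var in zip(frame, means, variances):
            x = math.log(energy)
            total += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return total

def pick_alpha(noisy, noise, means, variances, candidates):
    """Return the candidate alpha whose enhanced features score the highest likelihood."""
    return max(
        candidates,
        key=lambda a: log_likelihood(
            spectral_subtract(noisy, noise, a), means, variances
        ),
    )

# Toy data: one frame with two filterbank channels, clean energies 4.0 and 2.0,
# corrupted by additive noise energies of 1.0 in each channel.
noisy_frames = [[5.0, 3.0]]
noise_estimate = [1.0, 1.0]
model_means = [math.log(4.0), math.log(2.0)]  # toy model centred on clean speech
model_vars = [1.0, 1.0]

best_alpha = pick_alpha(
    noisy_frames, noise_estimate, model_means, model_vars, [0.0, 0.5, 1.0, 1.5]
)
print(best_alpha)
```

In this toy setting the search selects the `alpha` that brings the enhanced features closest to the acoustic model. In a dialog-based design such as the one proposed, a selection step of this kind would be repeated as new utterances arrive instead of being fixed after a single calibration session.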
| ID Code: | 26037 |
|---|---|
| Item Type: | Conference Paper |
| Keywords: | In-vehicle speech technology, Robust speech recognition, Speech enhancement, Optimisation, Dialog systems |
| Subjects: | Australian and New Zealand Standard Research Classification > ENGINEERING (090000) > ELECTRICAL AND ELECTRONIC ENGINEERING (090600) > Signal Processing (090609) |
| Divisions: | Past > QUT Faculties & Divisions > Faculty of Built Environment and Engineering; Past > Institutes > Information Security Institute; Past > Schools > School of Engineering Systems |
| Deposited On: | 02 Jul 2009 11:59 |
| Last Modified: | 09 Jun 2010 23:52 |