QUT ePrints

Multiple cameras for audio-visual speech recognition in an automotive environment

Navarathna, Rajitha, Dean, David B., Sridharan, Sridha, & Lucey, Patrick J. (2012) Multiple cameras for audio-visual speech recognition in an automotive environment. Computer Speech and Language.

Abstract

Audio-visual speech recognition, the combination of visual lip-reading with traditional acoustic speech recognition, has previously been shown to provide a considerable improvement over acoustic-only approaches in noisy environments such as an automotive cabin. The research presented in this paper extends the established audio-visual speech recognition literature to show that further improvements in speech recognition accuracy can be obtained when multiple frontal or near-frontal views of a speaker's face are available. A series of visual speech recognition experiments using a four-stream visual synchronous hidden Markov model (SHMM) are conducted on the four-camera AVICAR automotive audio-visual speech database. We study the relative contributions of the side and centrally oriented cameras to improving visual speech recognition accuracy. Finally, combining the four visual streams with a single audio stream in a five-stream SHMM yields a relative improvement of over 56% in word recognition accuracy compared with the acoustic-only approach in the noisiest conditions of the AVICAR database.
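In a synchronous multi-stream HMM, all streams share one state sequence, and the emission score of a state is commonly formed as a stream-weighted sum of per-stream log-likelihoods, $\log b_j(\mathbf{o}_t) = \sum_s \lambda_s \log b_{js}(\mathbf{o}_{st})$. The sketch below illustrates that general fusion rule for one state and one frame. It is not the authors' implementation: the abstract does not specify the per-stream emission models or stream weights, so this example assumes a single diagonal-covariance Gaussian per stream (practical systems typically use Gaussian mixtures), and the function names and weights are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def diag_gaussian_loglik(x: Sequence[float], mean: Sequence[float],
                         var: Sequence[float]) -> float:
    """Log-density of feature vector x under a diagonal-covariance Gaussian."""
    ll = 0.0
    for xi, mi, vi in zip(x, mean, var):
        ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return ll

def shmm_state_loglik(observations: List[Sequence[float]],
                      state_models: List[Tuple[Sequence[float], Sequence[float]]],
                      weights: Sequence[float]) -> float:
    """Hypothetical helper: weighted sum of per-stream log-likelihoods for one state.

    observations: one feature vector per stream (e.g. 1 audio + 4 camera streams)
    state_models: one (mean, variance) pair per stream for this state (assumed
                  single-Gaussian emissions, for illustration only)
    weights:      one stream weight per stream (hypothetical; often chosen to sum to 1)
    """
    if not (len(observations) == len(state_models) == len(weights)):
        raise ValueError("need one observation, model and weight per stream")
    total = 0.0
    for obs, (mean, var), w in zip(observations, state_models, weights):
        # Each stream contributes its log-likelihood scaled by its stream weight.
        total += w * diag_gaussian_loglik(obs, mean, var)
    return total

# Toy usage: five streams (1 audio + 4 visual) with 2-D features and equal weights.
n_streams = 5
obs = [[0.0, 0.0] for _ in range(n_streams)]
models = [([0.0, 0.0], [1.0, 1.0]) for _ in range(n_streams)]
weights = [0.2] * n_streams
print(shmm_state_loglik(obs, models, weights))
```

Lowering a stream's weight reduces its influence on the state score; in noisy conditions, a weighting scheme of this kind would let the visual streams dominate when the audio stream becomes unreliable.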

Impact and interest:

1 citation in Scopus
1 citation in Web of Science®

ID Code: 52968
Item Type: Journal Article
Additional Information: Available online as "In Press, Corrected Proof" version
Keywords: AVASR, AVICAR database, Speech recognition, Multi-stream HMM, Automotive environment
DOI: 10.1016/j.csl.2012.07.005
ISSN: 0885-2308
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Computer Vision (080104)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Pattern Recognition and Data Mining (080109)
Australian and New Zealand Standard Research Classification > ENGINEERING (090000) > ELECTRICAL AND ELECTRONIC ENGINEERING (090600) > Signal Processing (090609)
Divisions: Current > Schools > School of Earth, Environmental & Biological Sciences
Current > Schools > School of Electrical Engineering & Computer Science
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Deposited On: 08 Aug 2012 10:26
Last Modified: 10 Aug 2012 09:13