QUT ePrints

Patch-Based Analysis of Visual Speech From Multiple Views

Lucey, Patrick J., Potamianos, Gerasimos, & Sridharan, Sridha (2008) Patch-Based Analysis of Visual Speech From Multiple Views. In Goecke, Roland, Lucey, Patrick J., & Lucey, Simon (Eds.) International Conference on Auditory-Visual Speech Processing, 26-29 September, Tangalooma, Australia.

Abstract

Obtaining a robust feature representation of visual speech is crucial to the design of audio-visual automatic speech recognition systems. In the literature, when visual appearance-based features are employed for this purpose, they are typically extracted using a "holistic" approach: a transformation of the pixel values of the entire region-of-interest (ROI) is computed, with the ROI covering the speaker's mouth and often the surrounding facial area. In this paper, we instead consider a "patch"-based visual feature extraction approach within the appearance-based framework. In particular, we conduct a novel analysis to determine which areas (patches) of the mouth ROI are the most informative for visual speech. Furthermore, we extend this analysis beyond the traditional frontal views by also investigating profile views. Not surprisingly, for both frontal and profile views, we conclude that the central mouth patches are the most informative, although they remain less informative than the holistic features of the entire ROI. Nevertheless, fusing the holistic features with the best patch-based features improves visual speech recognition performance over either feature set alone. Finally, we discuss scenarios where the patch-based approach may be preferable to holistic features.
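The contrast between holistic and patch-based extraction described in the abstract can be illustrated with a minimal sketch. The abstract does not state which transform, patch grid, or fusion method the authors use, so everything below is an assumption made for illustration only: a 2-D DCT as the appearance transform, a 2 x 3 grid of non-overlapping patches, and fusion by simple concatenation. All function names (`dct_matrix`, `dct2_features`, `holistic_features`, `patch_features`, `fuse`) are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row has a different scale factor
    return m

def dct2_features(block, keep):
    """2-D DCT of a block; keep the top-left keep x keep low-frequency terms."""
    h, w = block.shape
    coeffs = dct_matrix(h) @ block @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()

def holistic_features(roi, keep=6):
    """Assumed holistic variant: one transform over the entire mouth ROI."""
    return dct2_features(roi, keep)

def patch_features(roi, rows=2, cols=3, keep=4):
    """Assumed patch variant: one transform per cell of a rows x cols grid."""
    h, w = roi.shape
    ph, pw = h // rows, w // cols
    feats = {}
    for r in range(rows):
        for c in range(cols):
            patch = roi[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
            feats[(r, c)] = dct2_features(patch, keep)
    return feats

def fuse(holistic, patch):
    """Assumed fusion: concatenate the holistic and one patch feature vector."""
    return np.concatenate([holistic, patch])

# Synthetic 32 x 48 "ROI" standing in for a grey-level mouth image.
roi = np.arange(32 * 48, dtype=float).reshape(32, 48)
holistic = holistic_features(roi)
patches = patch_features(roi)
# (0, 1) is the top-centre cell of the assumed 2 x 3 grid.
fused = fuse(holistic, patches[(0, 1)])
```

In this sketch each patch yields its own feature vector, so patches can be evaluated separately and the most informative one selected before fusion; which patch that is would have to come from recognition experiments such as those the paper reports.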

Full-text downloads:

88 since deposited on 20 Oct 2008
17 in the past twelve months

"Full-text downloads" displays the total number of times this work's files (e.g., a PDF) have been downloaded from QUT ePrints, as well as the number of downloads in the previous 365 days. The count includes downloads of all files if a work has more than one.

ID Code: 15247
Item Type: Conference Paper
ISBN: 9780646495033
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Image Processing (080106)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Natural Language Processing (080107)
Divisions: Past > QUT Faculties & Divisions > Faculty of Built Environment and Engineering
Past > Institutes > Information Security Institute
Copyright Owner: Copyright 2008 AVISA (the Auditory-VIsual Speech Association)
Deposited On: 20 Oct 2008
Last Modified: 29 Feb 2012 23:46
