Visual front-end wars : Viola-Jones face detector vs Fourier Lucas-Kanade

Kalantari, Shahram, Navarathna, Rajitha, Dean, David B., & Sridharan, Sridha (2013) Visual front-end wars : Viola-Jones face detector vs Fourier Lucas-Kanade. In Burnham, Denis & Beskow, Jonas (Eds.) International Conference on Auditory Visual Speech Processing 2013, 29 August - 1 September 2013, Ternélia resort Le Pré du Lac, Annecy, France.


The performance of visual speech recognition (VSR) systems is significantly influenced by the accuracy of the visual front-end. Current state-of-the-art VSR systems use off-the-shelf face detectors such as Viola-Jones (VJ), which have limited reliability under changes in illumination and head pose. For a VSR system to perform well under these conditions, an accurate visual front-end is required. This is an important problem to solve in many practical implementations of audio-visual speech recognition systems, for example in automotive environments for an efficient human-vehicle computer interface. In this paper, we re-examine the current state-of-the-art in VSR by comparing off-the-shelf face detectors with the recently developed Fourier Lucas-Kanade (FLK) image alignment technique. A variety of image alignment and visual speech recognition experiments are performed on a clean dataset as well as on a challenging automotive audio-visual speech dataset. Our results indicate that the FLK image alignment technique can significantly outperform off-the-shelf face detectors, but requires frequent fine-tuning.

ID Code: 62749
Item Type: Conference Paper
Refereed: Yes
Keywords: Visual Front-ends, Viola-Jones, Fourier Lucas-Kanade, Visual Speech Recognition
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000)
Divisions: Current > Schools > School of Electrical Engineering & Computer Science
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2013 [please consult the author]
Deposited On: 22 Sep 2013 23:32
Last Modified: 24 Sep 2013 05:19
