Visual voice activity detection using frontal versus profile views
Navarathna, Rajitha, Dean, David B., Sridharan, Sridha, Fookes, Clinton B., & Lucey, Patrick J. (2011) Visual voice activity detection using frontal versus profile views. In The International Conference on Digital Image Computing: Techniques and Applications (DICTA2011), 6-8 December 2011, Sheraton Noosa Resort & Spa, Noosa, QLD.
Visual detection of lip-movement activity can be used to overcome the poor performance of voice activity detection based solely on the audio signal, particularly in noisy acoustic conditions. However, most research in visual voice activity detection (VVAD) has neglected variability in the visual domain, such as viewpoint variation. In this paper we investigate the effectiveness of visual information from the speaker’s frontal and profile views (i.e., left and right side views) for the task of VVAD. As far as we are aware, our work constitutes the first real attempt to study this problem.
We describe our visual front-end approach and the Gaussian mixture model (GMM) based VVAD framework, and report experimental results on the freely available CUAVE database.
The experimental results show that VVAD is indeed possible from profile views, and we give a quantitative comparison of VVAD based on frontal and profile views. The results presented are useful for the development of multi-modal Human Machine Interaction (HMI) using a single camera, where the speaker’s face may not always be frontal.
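To make the GMM-based VVAD framework concrete, the following is a minimal sketch of a frame-level speech/non-speech decision via a log-likelihood ratio. This is not the paper's implementation: the visual lip-region features are replaced here by synthetic data, each class is modelled with a single diagonal-covariance Gaussian (a one-component special case of a GMM), and all function names are illustrative.

```python
import math
import random

def fit_diag_gaussian(frames):
    """Fit a diagonal-covariance Gaussian to a list of feature vectors.
    (A one-component stand-in for a per-class GMM.)"""
    d, n = len(frames[0]), len(frames)
    mean = [sum(f[i] for f in frames) / n for i in range(d)]
    # Variance floor avoids degenerate (zero-variance) dimensions.
    var = [max(sum((f[i] - mean[i]) ** 2 for f in frames) / n, 1e-6) for i in range(d)]
    return mean, var

def log_likelihood(x, model):
    """Log-density of feature vector x under a diagonal Gaussian."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def vvad_decision(x, speech_model, nonspeech_model, threshold=0.0):
    """Classify a frame as speech if the log-likelihood ratio exceeds the threshold."""
    return log_likelihood(x, speech_model) - log_likelihood(x, nonspeech_model) > threshold

# Synthetic stand-ins for per-frame visual features (the real system would use
# lip-region appearance features extracted by the visual front end).
random.seed(0)
speech_frames = [[random.gauss(1.0, 0.2) for _ in range(4)] for _ in range(200)]
silence_frames = [[random.gauss(0.0, 0.2) for _ in range(4)] for _ in range(200)]

speech_model = fit_diag_gaussian(speech_frames)
silence_model = fit_diag_gaussian(silence_frames)

print(vvad_decision([1.0] * 4, speech_model, silence_model))  # True
print(vvad_decision([0.0] * 4, speech_model, silence_model))  # False
```

In practice each class would be modelled by a multi-component GMM trained with EM, and the per-frame decisions would be smoothed over time before being used for voice activity detection.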
|Item Type:||Conference Paper|
|Keywords:||lip movements, voice detection|
|Subjects:||Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100)|
|Divisions:||Past > QUT Faculties & Divisions > Faculty of Built Environment and Engineering
Past > Institutes > Information Security Institute
Past > Schools > School of Engineering Systems|
|Copyright Owner:||Copyright 2011 [please consult the author]|
|Deposited On:||17 Oct 2011 23:40|
|Last Modified:||19 Oct 2011 09:49|