Spatio temporal feature evaluation for action recognition

Umakanthan, Sabanadesan, Denman, Simon, Sridharan, Sridha, Fookes, Clinton B., & Wark, Tim (2012) Spatio temporal feature evaluation for action recognition. In Proceedings of The 2012 International Conference on Digital Image Computing Techniques and Applications (DICTA 12), IEEE, Fremantle, Western Australia, pp. 1-8.

Spatio-temporal interest points are the most popular feature representation in the field of action recognition. A variety of methods have been proposed to detect and describe local patches in video, with several techniques reporting state-of-the-art performance for action recognition. However, the reported results are obtained under different experimental settings with different datasets, making it difficult to compare the various approaches. We therefore seek to comprehensively evaluate state-of-the-art spatio-temporal features under a common evaluation framework with popular benchmark datasets (KTH, Weizmann) and more challenging datasets such as Hollywood2. The purpose of this work is to provide guidance for researchers when selecting features for different applications with different environmental conditions. We evaluate four popular descriptors (HOG, HOF, HOG/HOF, HOG3D) using a bag of visual features representation and Support Vector Machines (SVM) for classification. Moreover, we provide an in-depth analysis of local feature descriptors and optimize the codebook sizes for different datasets with different descriptors. We demonstrate that motion-based features offer better performance than those that rely solely on spatial information, while features that combine both types of data are more consistent across a variety of conditions but typically require a larger codebook for optimal performance.
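
As a rough illustration of the bag of visual features representation mentioned in the abstract, the following Python sketch builds a codebook with a plain k-means loop and encodes one video's local descriptors as a normalized histogram of visual words. It is not the authors' implementation: the function names, the evenly spaced initialization, and the use of simple Euclidean k-means are assumptions made for illustration, and the SVM stage is omitted (the resulting histograms would serve as its input).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, codebook):
    """Index of the codebook entry (visual word) closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(descriptor, codebook[i]))


def build_codebook(descriptors, k, iterations=20):
    """Cluster training descriptors into k visual words with plain k-means.

    Assumption: centres are initialised from evenly spaced training
    descriptors so the sketch is deterministic; real systems usually use
    random or k-means++ initialisation.
    """
    if k < 1 or k > len(descriptors):
        raise ValueError("k must be between 1 and the number of descriptors")
    step = len(descriptors) // k
    codebook = [list(descriptors[i * step]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, codebook)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centre if a cluster is empty
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def encode_video(descriptors, codebook):
    """Histogram of visual-word occurrences, normalised to sum to 1."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

In this sketch the codebook size is the parameter `k` of the hypothetical `build_codebook` function; trying several values of `k` per dataset and descriptor corresponds to the codebook-size optimization the abstract describes.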

Impact and interest:

7 citations in Scopus

ID Code: 56607
Item Type: Conference Paper
Refereed: Yes
Keywords: Detectors, Feature extraction, Histograms, Humans, Support vector machines, Training, Vocabulary
DOI: 10.1109/DICTA.2012.6411720
ISBN: 9781467321815
Divisions: Current > Schools > School of Electrical Engineering & Computer Science
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2012 IEEE Inc. All rights reserved.
Copyright Statement: Copyright and Reprint Permissions
Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limit of U.S. copyright law for private use of patrons those articles in this volume that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.
Deposited On: 22 Jan 2013 22:32
Last Modified: 12 Jun 2013 15:28
