Supervised latent dirichlet allocation models for efficient activity representation

Umakanthan, Sabanadesan, Denman, Simon, Fookes, Clinton B., & Sridharan, Sridha (2014) Supervised latent dirichlet allocation models for efficient activity representation. In Bouzerdoum, Abdesselam, Wang, Lei, & Ogunbona, Philip (Eds.) Proceedings of the 2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA), IEEE, Wollongong, NSW, pp. 1-6.

View at publisher


Local spatio-temporal features with a Bag-of-visual words model is a popular approach used in human action recognition. Bag-of-features methods suffer from several challenges such as extracting appropriate appearance and motion features from videos, converting extracted features appropriate for classification and designing a suitable classification framework. In this paper we address the problem of efficiently representing the extracted features for classification to improve the overall performance. We introduce two generative supervised topic models, maximum entropy discrimination LDA (MedLDA) and class- specific simplex LDA (css-LDA), to encode the raw features suitable for discriminative SVM based classification. Unsupervised LDA models disconnect topic discovery from the classification task, hence yield poor results compared to the baseline Bag-of-words framework. On the other hand supervised LDA techniques learn the topic structure by considering the class labels and improve the recognition accuracy significantly. MedLDA maximizes likelihood and within class margins using max-margin techniques and yields a sparse highly discriminative topic structure; while in css-LDA separate class specific topics are learned instead of common set of topics across the entire dataset. In our representation first topics are learned and then each video is represented as a topic proportion vector, i.e. it can be comparable to a histogram of topics. Finally SVM classification is done on the learned topic proportion vector. We demonstrate the efficiency of the above two representation techniques through the experiments carried out in two popular datasets. Experimental results demonstrate significantly improved performance compared to the baseline Bag-of-features framework which uses kmeans to construct histogram of words from the feature vectors.

Impact and interest:

1 citations in Scopus
Search Google Scholar™

Citation counts are sourced monthly from Scopus and Web of Science® citation databases.

These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.

Citations counts from the Google Scholar™ indexing service can be viewed at the linked Google Scholar™ search.

ID Code: 82220
Item Type: Conference Paper
Refereed: Yes
Keywords: Feature extraction, Gesture recognition, Image classification, Image representation, Statistical analysis, Support vector machines, Unsupervised learning
DOI: 10.1109/DICTA.2014.7008130
ISBN: 9781479954094
Divisions: Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2014 by IEEE
Deposited On: 05 Mar 2015 23:57
Last Modified: 24 Jun 2017 06:03

Export: EndNote | Dublin Core | BibTeX

Repository Staff Only: item control page