Supervised latent dirichlet allocation models for efficient activity representation
Umakanthan, Sabanadesan, Denman, Simon, Fookes, Clinton B., & Sridharan, Sridha (2014) Supervised latent dirichlet allocation models for efficient activity representation. In Bouzerdoum, Abdesselam, Wang, Lei, & Ogunbona, Philip (Eds.) Proceedings of the 2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA), IEEE, Wollongong, NSW, pp. 1-6.
Local spatio-temporal features with a Bag-of-visual words model is a popular approach used in human action recognition. Bag-of-features methods suffer from several challenges such as extracting appropriate appearance and motion features from videos, converting extracted features appropriate for classification and designing a suitable classification framework. In this paper we address the problem of efficiently representing the extracted features for classification to improve the overall performance. We introduce two generative supervised topic models, maximum entropy discrimination LDA (MedLDA) and class- specific simplex LDA (css-LDA), to encode the raw features suitable for discriminative SVM based classification. Unsupervised LDA models disconnect topic discovery from the classification task, hence yield poor results compared to the baseline Bag-of-words framework. On the other hand supervised LDA techniques learn the topic structure by considering the class labels and improve the recognition accuracy significantly. MedLDA maximizes likelihood and within class margins using max-margin techniques and yields a sparse highly discriminative topic structure; while in css-LDA separate class specific topics are learned instead of common set of topics across the entire dataset. In our representation first topics are learned and then each video is represented as a topic proportion vector, i.e. it can be comparable to a histogram of topics. Finally SVM classification is done on the learned topic proportion vector. We demonstrate the efficiency of the above two representation techniques through the experiments carried out in two popular datasets. Experimental results demonstrate significantly improved performance compared to the baseline Bag-of-features framework which uses kmeans to construct histogram of words from the feature vectors.
Impact and interest:
Citation counts are sourced monthly from and citation databases.
These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.
Citations counts from theindexing service can be viewed at the linked Google Scholar™ search.
|Item Type:||Conference Paper|
|Keywords:||Feature extraction, Gesture recognition, Image classification, Image representation, Statistical analysis, Support vector machines, Unsupervised learning|
|Divisions:||Current > Schools > School of Electrical Engineering & Computer Science
Current > QUT Faculties and Divisions > Science & Engineering Faculty
|Copyright Owner:||Copyright 2014 by IEEE|
|Deposited On:||05 Mar 2015 23:57|
|Last Modified:||08 Mar 2015 22:34|
Repository Staff Only: item control page