Enhancing human action recognition with region proposals

Rezazadegan, Fahimeh, Shirazi, Sareh, Sunderhauf, Niko, Milford, Michael, & Upcroft, Ben (2015) Enhancing human action recognition with region proposals. In Australasian Conference on Robotics and Automation (ACRA2015), 2-4 December 2015, Australian National University, Canberra, A.C.T.

Deep convolutional network models have dominated recent work in human action recognition as well as image classification. However, these methods are often unduly influenced by the image background, learning and exploiting the presence of cues in typical computer vision datasets. For unbiased robotics applications, the degree of variation and novelty in action backgrounds is far greater than in computer vision datasets. To address this challenge, we propose an “action region proposal” method that, informed by optical flow, extracts image regions likely to contain actions for input into the network during both training and testing. In a range of experiments, we demonstrate that manually segmenting the background is not enough, but that through active action region proposals during training and testing, state-of-the-art or better performance can be achieved on individual spatial and temporal video components. Finally, we show that, by focusing attention through action region proposals, we can further improve upon the existing state-of-the-art in spatio-temporally fused action recognition performance.
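
The abstract does not spell out how a region is derived from optical flow, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes a dense flow field is already available as an array of shape (H, W, 2) and proposes the bounding box of the pixels whose motion magnitude exceeds a fraction of the peak magnitude. The function names `motion_bounding_box` and `crop_to_box` and the parameter `threshold_ratio` are hypothetical.

```python
import numpy as np


def motion_bounding_box(flow, threshold_ratio=0.5):
    """Return (x0, y0, x1, y1) around high-motion pixels, or None if no motion.

    Assumption for illustration only: a single box around all pixels whose
    flow magnitude is at least threshold_ratio times the peak magnitude.
    x1 and y1 are exclusive, so the box can be used directly for slicing.
    """
    magnitude = np.linalg.norm(flow, axis=2)  # per-pixel motion magnitude
    peak = magnitude.max()
    if peak == 0:
        return None  # no motion anywhere, so nothing to propose

    mask = magnitude >= threshold_ratio * peak
    rows = np.where(np.any(mask, axis=1))[0]  # row indices containing motion
    cols = np.where(np.any(mask, axis=0))[0]  # column indices containing motion
    return int(cols[0]), int(rows[0]), int(cols[-1]) + 1, int(rows[-1]) + 1


def crop_to_box(frame, box):
    """Crop a frame of shape (H, W, C) to the proposed region."""
    x0, y0, x1, y1 = box
    return frame[y0:y1, x0:x1]


# Synthetic example: a 10x12 flow field with one moving rectangular patch.
flow = np.zeros((10, 12, 2))
flow[3:6, 4:9] = (2.0, 0.0)  # horizontal motion inside the patch
frame = np.zeros((10, 12, 3))

box = motion_bounding_box(flow)
region = crop_to_box(frame, box)
```

In a pipeline of the kind the abstract describes, a crop like `region` would be passed to the network in place of the full frame during both training and testing. The flow field itself could come from any dense optical flow estimator.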

ID Code: 91267
Item Type: Conference Paper
Refereed: Yes
Keywords: action recognition, CNN, deep learning, optical flow, region proposal, human activity detection, temporal information, Neural networks
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Computer Vision (080104)
Australian and New Zealand Standard Research Classification > ENGINEERING (090000) > ELECTRICAL AND ELECTRONIC ENGINEERING (090600) > Control Systems Robotics and Automation (090602)
Divisions: Current > Research Centres > ARC Centre of Excellence for Robotic Vision
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2015 [Please consult with Author]
Deposited On: 16 Dec 2015 03:00
Last Modified: 22 Jun 2017 14:48
