International Journal of Advanced Robotic Systems
Hierarchical dynamic depth projected difference images-based action recognition in videos with convolutional neural networks
Hanbo Wu
Keywords: Human action recognition; depth videos; rank pooling; dynamic images; CNN
DOI: 10.1177/1729881418825093
Subject classification: Automation Engineering
Source: InTech
【 Abstract 】
Temporal information plays a significant role in video-based human action recognition, and effectively extracting the spatial-temporal characteristics of actions in videos remains a challenging problem. Most existing methods capture spatial and temporal cues separately. In this article, we propose a new and effective representation for depth video sequences, called hierarchical dynamic depth projected difference images, which aggregates the spatial and temporal information of an action simultaneously at different temporal scales. We first project depth video sequences onto three orthogonal Cartesian views to capture the 3D shape and motion information of human actions. Hierarchical dynamic depth projected difference images are then constructed with rank pooling in each projected view to hierarchically encode the spatial-temporal motion dynamics of depth videos. Convolutional neural networks can automatically learn discriminative features from images and, owing to their superior performance, have been extended to video classification. To verify the effectiveness of the hierarchical dynamic depth projected difference images representation, we build an action recognition framework in which the representations from the three views are fed independently into three identical pretrained convolutional neural networks for fine-tuning. We design three classification schemes within the framework, each utilizing different convolutional neural network layers, and compare their effects on action recognition. In every scheme, the three views are combined to describe actions more comprehensively. The proposed framework is evaluated on three challenging public human action data sets. Experiments indicate that our method achieves better performance and provides discriminative spatial-temporal information for human action recognition in depth videos.
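The central operation sketched in the abstract, collapsing a projected depth sequence into a single dynamic image via rank pooling, can be illustrated in a few lines. The sketch below uses the closed-form approximate rank-pooling weights alpha_t = 2t - T - 1 popularized for dynamic images; this weighting, the function name, and the frame layout are assumptions for illustration, since the abstract does not specify the exact rank-pooling solver the authors use.

```python
import numpy as np

def approximate_rank_pooling(frames):
    """Collapse a (T, H, W) stack of projected depth frames into one
    dynamic image.

    Assumes the approximate rank-pooling weighting alpha_t = 2t - T - 1,
    which emphasizes later frames and so encodes temporal ordering; the
    paper's exact rank-pooling formulation may differ.
    """
    frames = np.asarray(frames, dtype=np.float64)
    T = frames.shape[0]
    t = np.arange(1, T + 1)
    alpha = 2 * t - T - 1                      # per-frame temporal weights
    # Weighted sum over the time axis: one image summarizing the dynamics.
    di = np.tensordot(alpha, frames, axes=(0, 0))
    # Rescale to [0, 255] so the result can be fed to a pretrained CNN.
    di = di - di.min()
    if di.max() > 0:
        di = di / di.max() * 255.0
    return di.astype(np.uint8)
```

A hierarchical version would apply this pooling over a pyramid of temporal windows (the whole sequence, then halves, quarters, and so on), producing one dynamic image per window and per projected view.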
【 License 】
CC BY
【 Preview 】
Files | Size | Format | View
---|---|---|---
RO201910251300002ZK.pdf | 2156KB | PDF | download