2019 2nd International Conference on Advanced Materials, Intelligent Manufacturing and Automation
I3D-LSTM: A New Model for Human Action Recognition
Wang, Xianyuan^1 ; Miao, Zhenjiang^1 ; Zhang, Ruyi^1 ; Hao, Shanshan^1
School of Computer and Information Technology, Beijing Jiaotong University, Haidian District, Beijing 100044, China^1
Keywords: Action recognition; Convolution neural network; Human-action recognition; Optimal choice; Recurrent neural network (RNN); Research topics; Spatial-temporal features; Temporal features
Others : https://iopscience.iop.org/article/10.1088/1757-899X/569/3/032035/pdf DOI : 10.1088/1757-899X/569/3/032035
Source: IOP
【 Abstract 】
Action recognition, which aims to classify human actions in videos, has recently become a hot research topic. Current mainstream methods generally use an ImageNet-pretrained model as the feature extractor; however, pretraining a model for video classification on a large still-image dataset is not the optimal choice. Moreover, very few works note that a 3D convolutional neural network (3D CNN) is better suited to extracting low-level spatial-temporal features, while a recurrent neural network (RNN) is better suited to modelling high-level temporal feature sequences. We therefore propose a novel model that addresses both problems. First, we pretrain a 3D CNN model on the large video action recognition dataset Kinetics to improve the generality of the model. Then, long short-term memory (LSTM) is introduced to model the high-level temporal features produced by the Kinetics-pretrained 3D CNN model. Our experimental results show that the Kinetics-pretrained model generally outperforms the ImageNet-pretrained model, and our proposed network achieves leading performance on the UCF-101 dataset.
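The two-stage idea in the abstract (clip-level features first, then an LSTM over the resulting feature sequence) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: `extract_clip_feature` is a hypothetical placeholder that simply averages frames over time in place of the Kinetics-pretrained 3D CNN, the weights are random and untrained, and all names, dimensions, and the three-class output are assumptions chosen only to show how data flows through such a pipeline.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply matrix W (list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def extract_clip_feature(clip):
    # Placeholder for the 3D CNN (assumption): average every pixel
    # position over the frames of the clip, i.e. pooling over time.
    n = len(clip)
    return [sum(frame[i] for frame in clip) / n for i in range(len(clip[0]))]


class LSTMCell:
    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One weight matrix and bias per gate, applied to [x; h].
        # Gates: i = input, f = forget, o = output, g = candidate.
        self.W = {k: rand_matrix(hid_dim, in_dim + hid_dim, rng) for k in "ifog"}
        self.b = {k: [0.0] * hid_dim for k in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        pre = {}
        for k in "ifog":
            pre[k] = [a + bias for a, bias in zip(matvec(self.W[k], z), self.b[k])]
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


def classify_video(frames, clip_len, cell, W_out):
    # Split the video into non-overlapping clips of clip_len frames.
    clips = [frames[t:t + clip_len]
             for t in range(0, len(frames) - clip_len + 1, clip_len)]
    h = [0.0] * cell.hid_dim
    c = [0.0] * cell.hid_dim
    # Feed one clip-level feature vector per time step into the LSTM.
    for clip in clips:
        h, c = cell.step(extract_clip_feature(clip), h, c)
    # Classify from the final hidden state.
    return softmax(matvec(W_out, h))


rng = random.Random(0)
frames = [[rng.random() for _ in range(8)] for _ in range(16)]  # 16 toy frames, 8 "pixels" each
cell = LSTMCell(in_dim=8, hid_dim=4, rng=rng)
W_out = rand_matrix(3, 4, rng)  # 3 hypothetical action classes
probs = classify_video(frames, clip_len=4, cell=cell, W_out=W_out)
print(probs)
```

In a real system the placeholder extractor would be replaced by a pretrained 3D CNN and the LSTM and classifier weights would be learned; the sketch only shows that the recurrent stage consumes one feature vector per clip and produces a class distribution for the whole video.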