6th Conference on Advances in Optoelectronics and Micro/nano-optics
Violent Interaction Detection in Video Based on Deep Learning
Zhou, Peipei^1,2,3,4 ; Ding, Qinghai^1,5 ; Luo, Haibo^1,3,4 ; Hou, Xinglin^1,2,3,4
Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China^1
University of Chinese Academy of Sciences, Beijing 100049, China^2
Key Laboratory of Opto-Electronic Information Processing CAS, Shenyang 110016, China^3
The Key Lab of Image Understanding and Computer Vision, Liaoning Province, Shenyang 110016, China^4
Space Star Technology Co. LTD, Beijing 100086, China^5
Keywords: Acceleration fields; Activity recognition; Convolutional networks; Interaction detection; Statistic feature; Temporal networks; Video surveillance; Vision-based methods
Others: https://iopscience.iop.org/article/10.1088/1742-6596/844/1/012044/pdf
DOI: 10.1088/1742-6596/844/1/012044
Source: IOP
【 Abstract 】
Violent interaction detection is of vital importance in video surveillance scenarios such as railway stations, prisons and psychiatric centres. Existing vision-based methods rely mainly on hand-crafted features, such as statistical features between motion regions, and therefore adapt poorly to other datasets. Inspired by the progress of convolutional networks in general activity recognition, we construct FightNet to represent complicated visual violent interactions. We also propose a new input modality, the image acceleration field, to better capture motion attributes. First, each video is split into RGB frames. Second, the optical flow field is computed from consecutive frames, and the acceleration field is derived from the optical flow field. Third, FightNet is trained on three input modalities: RGB images for spatial networks, and optical flow images and acceleration images for temporal networks. By fusing the results from the different inputs, we decide whether a video depicts a violent event. To give researchers a common ground for comparison, we have collected a violent interaction dataset (VID) containing 2314 videos, of which 1077 are fight videos and 1237 are no-fight videos. Experimental results show that the proposed model achieves higher accuracy and better robustness than the other algorithms compared.
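The abstract does not give the formula for the acceleration field or the fusion rule, so the following is only a minimal sketch under stated assumptions: the acceleration field is taken to be the per-pixel difference between two consecutive optical flow fields, and the three stream outputs are combined by a weighted average with a fixed threshold. The names `acceleration_field`, `fuse_scores` and `is_fight`, the weights and the 0.5 threshold are all hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]   # (dx, dy) displacement at one pixel
Field = List[List[Vector]]     # H x W grid of per-pixel vectors


def acceleration_field(flow_prev: Field, flow_next: Field) -> Field:
    """Assumed definition: difference of two consecutive optical flow fields."""
    if len(flow_prev) != len(flow_next):
        raise ValueError("flow fields must have the same height")
    result: Field = []
    for row_prev, row_next in zip(flow_prev, flow_next):
        if len(row_prev) != len(row_next):
            raise ValueError("flow fields must have the same width")
        # Change in displacement between the two flow fields at each pixel.
        result.append(
            [(nx - px, ny - py) for (px, py), (nx, ny) in zip(row_prev, row_next)]
        )
    return result


def fuse_scores(stream_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed late fusion: weighted average of per-stream fight probabilities."""
    if not weights or len(stream_scores) != len(weights):
        raise ValueError("need one weight per stream score")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(stream_scores, weights)) / total


def is_fight(
    stream_scores: Sequence[float],
    weights: Sequence[float] = (1.0, 1.0, 1.0),
    threshold: float = 0.5,
) -> bool:
    """Label a video as a fight when the fused score reaches the threshold."""
    return fuse_scores(stream_scores, weights) >= threshold
```

In this sketch, `stream_scores` would hold one fight probability each from the RGB, optical flow and acceleration streams. The optical flow fields themselves are assumed to come from an external estimator, which is outside the scope of the example.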