Journal Article Details
Journal of Biometrics & Biostatistics
Audio-Visual Person Recognition Using Deep Convolutional Neural Networks
article
Sagar Vegad [1], Harsh Patel [1], Hanqi Zhuang [2], Mehul Naik [3]
[1]Department of Computer Science and Technology, Nirma University Ahmedabad
[2]Department of Computer and Electrical Engineering and Computer Science
[3]Department of Electronics Communication Engineering, Nirma University
Keywords: CNN; Face recognition; Mel-spectrogram; Multi-modal; Speaker recognition; VGG16 model
DOI: 10.4172/2155-6180.1000377
Source: Hilaris Publisher
【 Abstract 】
Protecting data integrity and personal identity has been an active research area for many years. Among the techniques investigated, multi-modal recognition systems that use audio and face signals for person authentication are promising because they are easy to use. A challenge in developing such a system is making it reliable enough for practical application. This paper proposes an efficient audio-visual bimodal recognition system that uses deep convolutional neural networks (CNNs) as its primary model architecture. First, two separate deep CNN models are trained on audio and facial features, respectively. The outputs of these CNN models are then fused to predict the identity of the subject. Implementation details of the data fusion are discussed at length in the paper. Experimental verification shows that the proposed bimodal fusion approach is more accurate than either single-modal recognition system and than published results on the same dataset.
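The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which fusion rule the authors use, so the weighted average of per-identity probabilities below is an assumption, as are the function names `fuse_scores` and `predict_identity`, the `audio_weight` parameter, and the example scores.

```python
from typing import List, Sequence


def fuse_scores(audio_probs: Sequence[float],
                face_probs: Sequence[float],
                audio_weight: float = 0.5) -> List[float]:
    """Weighted average of two per-identity probability vectors.

    This is an assumed late-fusion rule, not necessarily the paper's.
    """
    if len(audio_probs) != len(face_probs):
        raise ValueError("both models must score the same set of identities")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    face_weight = 1.0 - audio_weight
    return [audio_weight * a + face_weight * f
            for a, f in zip(audio_probs, face_probs)]


def predict_identity(fused: Sequence[float]) -> int:
    """Return the index of the identity with the highest fused score."""
    return max(range(len(fused)), key=lambda i: fused[i])


# Hypothetical softmax outputs for three enrolled identities.
audio_probs = [0.50, 0.40, 0.10]  # audio model slightly prefers identity 0
face_probs = [0.20, 0.70, 0.10]   # face model clearly prefers identity 1

fused = fuse_scores(audio_probs, face_probs)
print(predict_identity(fused))  # prints 1
```

With equal weights the fused scores are 0.35, 0.55, and 0.10, so the confident face prediction outweighs the weaker audio preference. In practice the weight would be tuned on held-out data.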
【 License 】

Unknown
