Journal Article Details
Journal of Biometrics & Biostatistics | |
Audio-Visual Person Recognition Using Deep Convolutional Neural Networks | |
article | |
Sagar Vegad1  Harsh Patel1  Hanqi Zhuang2  Mehul Naik3  | |
[1]Department of Computer Science and Technology, Nirma University Ahmedabad | |
[2]Department of Computer and Electrical Engineering and Computer Science | |
[3]Department of Electronics Communication Engineering, Nirma University | |
Keywords: CNN; Face recognition; Mel-spectrogram; Multi-modal; Speaker recognition; VGG16 model; | |
DOI : 10.4172/2155-6180.1000377 | |
Source: Hilaris Publisher | |
【 Abstract 】
Protection of data integrity and personal identity has been an active research area for many years. Among the techniques investigated, multi-modal recognition systems that use audio and face signals for person authentication hold a promising future because of their ease of use. A challenge in developing such a multi-modal recognition system is improving its reliability for practical applications. This paper presents an efficient audio-visual bimodal recognition system that uses deep Convolutional Neural Networks (CNNs) as its primary model architecture. First, two separate deep CNN models are trained on audio and facial features, respectively. The outputs of these CNN models are then fused to predict the identity of the subject. Implementation details regarding data fusion are discussed at length in the paper. Experimental verification shows that the proposed bimodal fusion approach is more accurate than either single-modality recognition system and than published results on the same dataset.
【 License 】
Unknown
【 Preview 】
Files | Size | Format | View |
---|---|---|---|
RO202307140003941ZK.pdf | 1072KB | download |