Journal Article

【Abstract】

Protecting data integrity and personal identity has been an active research area for many years. Among the techniques investigated, multi-modal recognition systems that authenticate people from audio and face signals hold particular promise because they are easy to use. A key challenge in developing such a system is making it reliable enough for practical application. This paper proposes an efficient audio-visual bimodal recognition system that uses Deep Convolutional Neural Networks (CNNs) as its primary model architecture. First, two separate deep CNN models are trained on audio features and facial features, respectively. The outputs of these CNN models are then fused to predict the identity of the subject. Implementation details of the data fusion are discussed at length in the paper. Experiments show that the proposed bimodal fusion approach achieves higher accuracy than either single-modality recognition system and than published results on the same dataset.
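
The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which fusion rule the paper uses, so the weighted average of softmax outputs below is an assumption chosen for illustration, not the authors' method. The function names (`softmax`, `fuse_scores`), the `audio_weight` parameter, and the example scores are all hypothetical.

```python
import math


def softmax(scores):
    """Convert raw per-identity scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(audio_scores, face_scores, audio_weight=0.5):
    """Score-level (late) fusion by weighted averaging.

    Assumption: each model emits one raw score per enrolled identity,
    in the same identity order. The fusion rule is illustrative only.
    """
    if len(audio_scores) != len(face_scores):
        raise ValueError("both models must score the same set of identities")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")

    audio_probs = softmax(audio_scores)
    face_probs = softmax(face_scores)

    # Weighted average of the two probability vectors.
    fused = [
        audio_weight * a + (1.0 - audio_weight) * f
        for a, f in zip(audio_probs, face_probs)
    ]
    # Predicted identity is the index with the highest fused probability.
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted


if __name__ == "__main__":
    # Hypothetical scores for three enrolled identities.
    fused, predicted = fuse_scores([2.0, 1.0, 0.1], [0.5, 3.0, 0.2])
    print(fused, predicted)
```

Setting `audio_weight` to 1.0 or 0.0 reduces the sketch to the audio-only or face-only model, which gives a simple way to compare the fused prediction against each single modality.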

【License】

Unknown


Journal of Biometrics & Biostatistics
Audio-Visual Person Recognition Using Deep Convolutional Neural Networks
article
Sagar Vegad¹ Harsh Patel¹ Hanqi Zhuang² Mehul Naik³
[1] Department of Computer Science and Technology, Nirma University, Ahmedabad
[2] Department of Computer and Electrical Engineering and Computer Science
[3] Department of Electronics Communication Engineering, Nirma University
Keywords: CNN; Face recognition; Mel-spectrogram; Multi-modal; Speaker recognition; VGG16 model
DOI : 10.4172/2155-6180.1000377
Source: Hilaris Publisher
