Journal Article Details
ECTI Transactions on Computer and Information Technology
Persons facial image synthesis from audio with Generative Adversarial Networks
article
Huzaifa Maniyar1  Suneeta V. Budihal1  Saroja V. Siddamal1 
[1] KLE Technological University
Keywords: Generative Adversarial Network (GAN); image synthesis; speech processing; Deep Learning
DOI: 10.37936/ecti-cit.2022162.246995
Subject classification: Medicine (General)
Source: Electrical Engineering/Electronics, Computer, Communications and Information Technology Association
【 Abstract 】

This paper proposes a framework based on Generative Adversarial Networks (GANs) to synthesize a person's facial image from audio input. Image and speech are the two main channels of information exchange between two entities. In some data-intensive applications, a large amount of audio has to be translated into an understandable image format by an automated system, without human intervention. This paper provides an end-to-end model for intelligible image reconstruction from an audio signal. The model uses a GAN architecture that generates image features from audio waveforms for image synthesis. It was built to produce facial images of individual speaker identities from their audio, based on the training dataset. The images of labelled persons are generated using excitation signals, and the method achieved an accuracy of 96.88% for ungrouped data and 93.91% for grouped data.

【 License 】

CC BY-NC-ND   
