| ECTI Transactions on Computer and Information Technology | |
| Persons facial image synthesis from audio with Generative Adversarial Networks | |
| article | |
| Huzaifa Maniyar[1], Suneeta V. Budihal[1], Saroja V. Siddamal[1] | |
| [1] KLE Technological University | |
| Keywords: Generative Adversarial Network (GAN); image synthesis; speech processing; Deep Learning | |
| DOI: 10.37936/ecti-cit.2022162.246995 | |
| Subject classification: Medicine (General) | |
| Source: Electrical Engineering/Electronics, Computer, Communications and Information Technology Association | |
【 Abstract 】
This paper proposes a framework based on Generative Adversarial Networks (GANs) to synthesize a person's facial image from audio input. Images and speech are the two main channels of information exchange between two entities. In some data-intensive applications, large amounts of audio must be translated into an understandable image format by an automated system, without human intervention. This paper presents an end-to-end model for reconstructing intelligible images from an audio signal. The model uses a GAN architecture that derives image features from audio waveforms for image synthesis. It was designed to produce, from a speaker's audio, a synthesized facial image of that speaker's identity, drawing on the identities present in the training dataset. Images of labelled persons are generated from excitation signals, and the method achieved an accuracy of 96.88% on ungrouped data and 93.91% on grouped data.
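The abstract gives only the outline of the method: audio waveforms condition a GAN that synthesizes a facial image. The sketch below illustrates that general idea as a conditional GAN forward pass in plain Python. It is not the authors' implementation. Every name (`audio_embedding`, `AudioConditionedGAN`), the per-frame log-energy audio feature, the toy layer sizes, and the single-layer networks are hypothetical choices made for illustration, and the losses are the standard conditional GAN objective with a non-saturating generator loss. It computes one forward pass and the two losses, with no training loop or gradient updates.

```python
import math
import random

# Hypothetical sizes for illustration only; the paper's dimensions are not given here.
EMB_DIM = 8              # audio embedding length (one log-energy value per frame)
NOISE_DIM = 4            # latent noise length
IMG_H, IMG_W = 4, 4      # toy grayscale "face" resolution
IMG_DIM = IMG_H * IMG_W  # images are handled as flattened pixel lists


def audio_embedding(waveform, n_frames=EMB_DIM):
    """Assumed stand-in feature: split the waveform into equal frames, return per-frame log-energy."""
    frame_len = len(waveform) // n_frames
    if frame_len == 0:
        raise ValueError("waveform is shorter than the number of frames")
    feats = []
    for i in range(n_frames):
        frame = waveform[i * frame_len:(i + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len
        feats.append(math.log(energy + 1e-8))  # epsilon avoids log(0) on silent frames
    return feats


def init_layer(rng, n_in, n_out, scale=0.1):
    """Random Gaussian weights (one row per output unit) and zero biases."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(weights, bias, x):
    """Fully connected layer: y_j = sum_i W[j][i] * x[i] + b[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class AudioConditionedGAN:
    """One dense layer each for generator and discriminator, both conditioned on the audio embedding."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.gen = init_layer(self.rng, EMB_DIM + NOISE_DIM, IMG_DIM)
        self.disc = init_layer(self.rng, IMG_DIM + EMB_DIM, 1)

    def generate(self, emb):
        """G(z, c): concatenate noise z with audio embedding c, map to pixels in [-1, 1] via tanh."""
        z = [self.rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
        return [math.tanh(v) for v in dense(*self.gen, emb + z)]

    def discriminate(self, image, emb):
        """D(x, c): estimated probability that image x is a real face matching speaker audio c."""
        return sigmoid(dense(*self.disc, image + emb)[0])

    def losses(self, real_image, emb):
        """Standard conditional GAN losses for one (real face, audio) pair; no gradient step."""
        fake = self.generate(emb)
        d_real = self.discriminate(real_image, emb)
        d_fake = self.discriminate(fake, emb)
        d_loss = -(math.log(d_real) + math.log(1.0 - d_fake))
        g_loss = -math.log(d_fake)  # non-saturating generator loss
        return d_loss, g_loss, fake


# Usage with synthetic data: a made-up waveform and a random "real" image.
wave = [math.sin(0.05 * t) * (1 + t / 800) for t in range(800)]
emb = audio_embedding(wave)
model = AudioConditionedGAN(seed=42)
data_rng = random.Random(1)
real = [data_rng.uniform(-1.0, 1.0) for _ in range(IMG_DIM)]
d_loss, g_loss, fake = model.losses(real, emb)
print(len(fake), d_loss > 0, g_loss > 0)  # 16 True True
```

In a practical system the embedding would more likely come from a learned audio encoder or spectral features, the networks would be deep (typically convolutional), and both losses would be minimized alternately with a gradient-based optimizer. The toy version only shows where the audio embedding enters both the generator and the discriminator.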
【 License 】
CC BY-NC-ND