Journal article

【Abstract】

The success of supervised learning techniques for automatic speech processing does not always extend to problems with limited annotated speech. Unsupervised representation learning aims to use unlabelled data to learn a transformation that makes speech easily distinguishable for classification tasks; deep auto-encoder variants have been the most successful at finding such representations. This paper proposes a novel mechanism that incorporates the geometric position of speech samples within the global structure of an unlabelled feature set. Regression to the geometric position is added as an additional constraint on the representation-learning auto-encoder. The representation learnt by the proposed model was evaluated on a supervised classification task, limited-vocabulary keyword spotting, where it outperformed the commonly used cepstral features by about 9% in classification accuracy despite using a limited amount of labels during supervision. Furthermore, a small keyword dataset was collected for Kadazan, an indigenous, low-resourced Southeast Asian language. Analysis of the Kadazan dataset also confirms the superiority of the proposed representation under limited annotation. The results are significant because they confirm that the proposed method can effectively learn unsupervised speech representations for classification tasks with scarce labelled data.
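
The abstract does not give the training objective, so the following is only one way the described constraint could be written, as a sketch under stated assumptions. Every symbol here is hypothetical and not taken from the paper: $f$ is the encoder, $g$ the decoder, $h$ a regression head on the latent code, $p_i$ a vector encoding the geometric position of sample $x_i$ within the unlabelled feature set (how it is computed is not stated in the abstract), $N$ the number of unlabelled samples, and $\lambda$ a weighting factor.

```latex
\mathcal{L} \;=\; \frac{1}{N}\sum_{i=1}^{N}
  \Bigl( \lVert x_i - g(f(x_i)) \rVert_2^2
  \;+\; \lambda \,\lVert p_i - h(f(x_i)) \rVert_2^2 \Bigr)
```

Under this reading, setting $\lambda = 0$ recovers a plain auto-encoder, and the second term is what would tie the learnt representation to the global structure of the data.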

【License】

Unknown

PeerJ Computer Science
Spatial position constraint for unsupervised learning of speech representations

Mohammad Ali Humayun¹ Hayati Yassin¹ Pg Emeroylariffion Abas¹
[1] Faculty of Integrated Technologies, Universiti Brunei Darussalam, Jalan Tungku Link, Brunei
Keywords: Low resource speech; Representation learning; Multitasking; Geometric constraint
DOI: 10.7717/peerj-cs.650
Source: DOAJ

