Journal Article

【Abstract】

Phonetic analysis of speech generally requires aligning audio samples to their phonetic transcription. This can be done manually for a handful of files, but as the corpus grows, it becomes infeasibly time-consuming. This paper describes the evolution toward creating free resources for phonetic alignment in Brazilian Portuguese (BP) with Kaldi, a toolkit that achieves state-of-the-art results in open-source speech recognition, within a toolkit we call UFPAlign. The contributions of this work are twofold. First, we develop resources to perform forced alignment in BP and release both the scripts to train acoustic models via Kaldi and the resources themselves under open licenses. Second, we compare UFPAlign with two other phonetic aligners that provide resources for BP, namely EasyAlign and Montreal Forced Aligner (MFA), the latter also being Kaldi-based. Evaluation used phone boundary and intersection over union metrics over a dataset of 385 hand-aligned utterances. Results show that Kaldi-based aligners perform better overall and that UFPAlign models are more accurate than MFA's. Furthermore, complex deep-learning-based approaches still do not improve performance compared to simpler models.
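The abstract evaluates aligners with phone-boundary and intersection-over-union (IoU) metrics. The following is a minimal Python sketch of how such metrics could be computed for a single utterance. It assumes both alignments contain the same phone sequence (as forced alignment against a fixed transcription implies) and assumes a 20 ms boundary tolerance; the exact metric definitions and tolerance values used in the paper are not stated here and may differ. The function names, the `Segment` tuple layout, and the toy phone timings are hypothetical and for illustration only.

```python
from typing import List, Tuple

# Hypothetical representation of a phone segment: (label, start_s, end_s).
Segment = Tuple[str, float, float]


def interval_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of the time spans of two phone segments."""
    _, a_start, a_end = a
    _, b_start, b_end = b
    intersection = max(0.0, min(a_end, b_end) - max(a_start, b_start))
    # True union of two intervals (correct even when they are disjoint).
    union = (a_end - a_start) + (b_end - b_start) - intersection
    return intersection / union if union > 0 else 0.0


def _check_same_phones(ref: List[Segment], hyp: List[Segment]) -> None:
    # Assumption: forced alignment keeps the transcription fixed, so the
    # reference and hypothesis must list the same phones in the same order.
    if [p for p, _, _ in ref] != [p for p, _, _ in hyp]:
        raise ValueError("reference and hypothesis phone sequences differ")


def boundary_accuracy(ref: List[Segment], hyp: List[Segment],
                      tolerance: float = 0.02) -> float:
    """Fraction of start/end boundaries within `tolerance` seconds of the reference."""
    _check_same_phones(ref, hyp)
    pairs = [(r[1], h[1]) for r, h in zip(ref, hyp)]   # start boundaries
    pairs += [(r[2], h[2]) for r, h in zip(ref, hyp)]  # end boundaries
    if not pairs:
        return 0.0
    hits = sum(1 for r_b, h_b in pairs if abs(r_b - h_b) <= tolerance)
    return hits / len(pairs)


def mean_iou(ref: List[Segment], hyp: List[Segment]) -> float:
    """Average per-phone IoU between reference and hypothesis segments."""
    _check_same_phones(ref, hyp)
    if not ref:
        return 0.0
    return sum(interval_iou(r, h) for r, h in zip(ref, hyp)) / len(ref)


if __name__ == "__main__":
    # Toy example: hand-aligned reference vs. an aligner's output (made-up times).
    reference = [("s", 0.00, 0.10), ("a", 0.10, 0.25), ("w", 0.25, 0.40)]
    hypothesis = [("s", 0.00, 0.13), ("a", 0.13, 0.24), ("w", 0.24, 0.46)]
    print(boundary_accuracy(reference, hypothesis))  # 0.5
    print(round(mean_iou(reference, hypothesis), 3))  # 0.728
```

Boundary accuracy rewards each boundary independently, while IoU penalizes a phone whose whole span is shifted or stretched; reporting both, as the abstract does, captures complementary kinds of alignment error.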

【License】

CC BY


EURASIP Journal on Advances in Signal Processing
Free resources for forced phonetic alignment in Brazilian Portuguese based on Kaldi toolkit

Cassio Batista¹ Nelson Neto¹ Ana Larissa Dias¹
¹ Computer Science Graduate Program, FalaBrasil Group, Federal University of Pará, Rua Augusto Corrêa, 1, 66075-110, Belém, Brazil
Keywords: Forced aligner; Phonetic alignment; Speech segmentation; Acoustic modeling; Kaldi; Brazilian Portuguese
DOI: 10.1186/s13634-022-00844-9
Source: Springer