Journal Article Details
EURASIP Journal on Advances in Signal Processing
Free resources for forced phonetic alignment in Brazilian Portuguese based on Kaldi toolkit
Cassio Batista [1], Nelson Neto [1], Ana Larissa Dias [1]
[1] Computer Science Graduate Program, FalaBrasil Group, Federal University of Pará, Rua Augusto Corrêa, 1, 66075-110, Belém, Brazil
Keywords: Forced aligner; Phonetic alignment; Speech segmentation; Acoustic modeling; Kaldi; Brazilian Portuguese
DOI: 10.1186/s13634-022-00844-9
Source: Springer
【 Abstract 】

Phonetic analysis of speech generally requires aligning audio samples to their phonetic transcription. This can be done manually for a handful of files, but it becomes infeasibly time-consuming as the corpus grows. This paper describes the evolution process of creating free resources for phonetic alignment in Brazilian Portuguese (BP) with Kaldi, a state-of-the-art open-source speech recognition toolkit, packaged in a tool we call UFPAlign. The contributions of this work are twofold: developing resources to perform forced alignment in BP, including the release of scripts to train acoustic models via Kaldi, as well as the resources themselves, under open licenses; and comparing them against two other phonetic aligners that provide resources for BP, namely EasyAlign and Montreal Forced Aligner (MFA), the latter also being Kaldi-based. Evaluation used phone boundary and intersection over union metrics on a dataset of 385 hand-aligned utterances. Results show that Kaldi-based aligners perform better overall and that UFPAlign models are more accurate than MFA's. Furthermore, complex deep-learning-based approaches still do not improve performance over simpler models.
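
The two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact definitions, so the following are assumptions: a phone boundary error is taken as the absolute time difference between the end boundaries of corresponding phones, and intersection over union is computed per phone segment. The function names, the tuple layout, and the 20 ms tolerance are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical layout: (phone label, start time in ms, end time in ms).
Segment = Tuple[str, int, int]


def boundary_errors(reference: List[Segment], hypothesis: List[Segment]) -> List[int]:
    """Absolute end-boundary difference (ms) for each pair of corresponding phones."""
    if len(reference) != len(hypothesis):
        raise ValueError("both alignments must contain the same number of phones")
    return [abs(ref[2] - hyp[2]) for ref, hyp in zip(reference, hypothesis)]


def tolerance_rate(errors: List[int], tolerance_ms: int = 20) -> float:
    """Fraction of boundaries whose error is within the tolerance.

    The 20 ms default is an illustrative assumption, not a value from the abstract.
    """
    if not errors:
        return 0.0
    return sum(1 for e in errors if e <= tolerance_ms) / len(errors)


def segment_iou(ref: Segment, hyp: Segment) -> float:
    """Intersection over union of two time intervals (0.0 if they do not overlap)."""
    intersection = max(0, min(ref[2], hyp[2]) - max(ref[1], hyp[1]))
    union = (ref[2] - ref[1]) + (hyp[2] - hyp[1]) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Made-up toy alignment, not data from the paper.
    reference = [("o", 0, 100), ("l", 100, 250), ("a", 250, 500)]
    hypothesis = [("o", 0, 110), ("l", 110, 250), ("a", 250, 450)]

    errors = boundary_errors(reference, hypothesis)
    print("boundary errors (ms):", errors)
    print("within tolerance:", tolerance_rate(errors))
    print("IoU per phone:", [segment_iou(r, h) for r, h in zip(reference, hypothesis)])
```

Times are kept as integer milliseconds so that boundary comparisons are exact. An evaluation over a hand-aligned set would aggregate these per-phone values across all utterances.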

【 License 】

CC BY   
