期刊论文详细信息
Applied Sciences
A Pronunciation Prior Assisted Vowel Reduction Detection Framework with Multi-Stream Attention Method
Pengyuan Zhang1  Li Wang1  Zongming Liu1  Zhihua Huang2 
[1] Key Laboratory of Speech Acoustics & Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China;University of Chinese Academy of Sciences, Beijing 100049, China;
关键词: CALL;    CAPT;    second language acquisition;    vowel reduction detection;    end-to-end ASR;   
DOI  :  10.3390/app11188321
来源: DOAJ
【 摘 要 】

Vowel reduction is a common pronunciation phenomenon in stress-timed languages like English. Native speakers tend to weaken unstressed vowels into a schwa-like sound. It is an essential factor that makes the accent of language learners sound unnatural. To improve vowel reduction detection in a phoneme recognition framework, we propose an end-to-end vowel reduction detection method that introduces pronunciation prior knowledge as auxiliary information. In particular, we have designed two methods for automatically generating pronunciation prior sequences from reference texts and have implemented a main and auxiliary encoder structure that uses hierarchical attention mechanisms to utilize the pronunciation prior information and acoustic information dynamically. In addition, we also propose a method to realize the feature enhancement after encoding by using the attention mechanism between different streams to obtain expanded multi-streams. Compared with the HMM-DNN hybrid method and the general end-to-end method, the average F1 score of our approach for the two types of vowel reduction detection increased by 8.8% and 6.9%, respectively. The overall phoneme recognition rate increased by 5.8% and 5.0%, respectively. The experimental part further analyzes why the pronunciation prior knowledge auxiliary input is effective and the impact of different pronunciation prior knowledge types on performance.

【 授权许可】

Unknown   

  文献评价指标  
  下载次数:0次 浏览次数:1次