Thesis Details
Machine learning approaches to improving mispronunciation detection on an imbalanced corpus
Yang, Xuesong; Hasegawa-Johnson, Mark A.
Keywords: Imbalanced Learning; Sampling Methods; Pronunciation Error Detection; Spoken Language Assessment; Computer Assisted Language Learning
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/89050/YANG-THESIS-2015.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】
This thesis investigates phone-level pronunciation error detection, a task whose performance is heavily affected by the imbalanced class distribution in a manually annotated data set of non-native English (Read Aloud responses from the TOEFL Junior Pilot assessment). To address the problems caused by this extreme class imbalance, two machine learning approaches, cost-sensitive learning and over-sampling, are explored to improve classification performance. Specifically, class weights inversely proportional to class frequencies and the synthetic minority over-sampling technique (SMOTE) were applied to a range of classifiers using feature sets that included information about the acoustic signal, the linguistic properties of the utterance, and word identity. Experiments demonstrate that both balancing approaches substantially improve performance (in terms of F1 score) over the baseline on this extremely imbalanced data set. The thesis also discusses which features are most important and which classifiers are most effective for identifying phone-level pronunciation errors in non-native speech.
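The two balancing ideas named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the weighting formula `n_samples / (n_classes * class_count)` is one common way to make weights inversely proportional to class frequency and is assumed here, the function names `inverse_frequency_weights` and `smote_like` are hypothetical, the interpolation routine is a simplified stand-in for SMOTE, and the toy data is made up.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count) (assumed formula)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Made-up toy data: 0 = correctly pronounced phone, 1 = mispronounced phone.
labels = [0] * 27 + [1] * 3
weights = inverse_frequency_weights(labels)

# Made-up 2-D feature vectors for the three minority (mispronounced) samples.
minority = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
new_points = smote_like(minority, n_new=5)
```

In the cost-sensitive setting, `weights` would scale each sample's contribution to the training loss, so errors on the rare class cost more. In the over-sampling setting, `new_points` would be appended to the training set with the minority label before fitting a classifier.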