Thesis

【Abstract】

This thesis investigates phone-level pronunciation error detection, a task whose performance is heavily affected by the imbalanced class distribution in a manually annotated data set of non-native English (Read Aloud responses from the TOEFL Junior Pilot assessment). To address the problems caused by this extreme class imbalance, two machine learning approaches, cost-sensitive learning and over-sampling, are explored to improve classification performance. Specifically, class weights inversely proportional to class frequencies and the synthetic minority over-sampling technique (SMOTE) were applied to a range of classifiers, using feature sets that included information about the acoustic signal, the linguistic properties of the utterance, and word identity. Experiments demonstrate that both balancing approaches yield a substantial improvement in F1 score over the baseline on this extremely imbalanced data set. The thesis also discusses which features are most important and which classifiers are most effective for identifying phone-level pronunciation errors in non-native speech.
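
The two balancing ideas named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `inverse_frequency_weights` and `smote_like_oversample`, the toy labels, and the toy feature points are all hypothetical. The weight formula shown (n_samples / (n_classes * class_count)) is one common convention for inverse-frequency weighting, and the interpolation step is only the core idea behind SMOTE, assuming Euclidean distance and a small neighbour count.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so their errors cost more
    during training (a simple form of cost-sensitive learning).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points.

    Each new point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Hypothetical toy data: 8 correctly pronounced phones (label 0)
# and 2 mispronounced phones (label 1).
labels = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(labels)

# Hypothetical 2-dimensional feature vectors for the minority class.
minority = [(1.0, 2.0), (2.0, 3.0), (1.5, 2.0)]
new_points = smote_like_oversample(minority, n_new=4)
```

With these toy labels the minority class receives four times the weight of the majority class. Every synthetic point stays within the region spanned by the existing minority samples, because it is an interpolation between two of them.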

【Preview】

Attachments
Files	Size	Format	View
Machine learning approaches to improving mispronunciation detection on an imbalanced corpus	743KB	PDF	download


Machine learning approaches to improving mispronunciation detection on an imbalanced corpus
Yang, Xuesong; Hasegawa-Johnson, Mark A.
Keywords: Imbalanced Learning; Sampling Methods; Pronunciation Error Detection; Spoken Language Assessment; Computer Assisted Language Learning
Others : https://www.ideals.illinois.edu/bitstream/handle/2142/89050/YANG-THESIS-2015.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
PDF


	Document metrics
	Downloads: 10	Views: 42
