Thesis

【Abstract】

We present an unsupervised method for automatically generating syntactic analogy datasets for the evaluation of word representations. Automatic generation also allows the dataset to be customized in terms of word frequencies, syntactic rules, part-of-speech tags, and size. We demonstrate that the method can generate analogy task datasets for languages other than English, where evaluation datasets are limited if not nonexistent, by constructing datasets for French, German, Spanish, Arabic, and Hebrew. The method clusters pairs of words into morphological rules in an unsupervised manner and then uses these rules to generate analogy questions for each rule. We assess the quality of an automatically generated dataset by checking how well the performance of different word representations on it correlates with the performance of the same representations on the Google analogy dataset; the two exhibit a high correlation of 95%. Moreover, we showcase the benefits of customization by studying the performance of different word representations as the frequency of the words in the dataset varies.
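
As an illustration only, the question-generation step described in the abstract might look like the following Python sketch. It assumes the clustering step has already produced word pairs grouped by morphological rule; the rule labels, the example words, and the function name `generate_analogies` are hypothetical and not taken from the thesis.

```python
from itertools import permutations


def generate_analogies(pairs_by_rule, max_per_rule=None):
    """Build analogy questions (a, b, c, d), read as 'a is to b as c is to d'.

    pairs_by_rule maps a rule label to a list of (base, inflected) word pairs.
    This is an assumed input format, not the thesis's own data structure.
    """
    questions = []
    for rule, pairs in pairs_by_rule.items():
        # Each ordered choice of two distinct pairs under one rule gives a question.
        rule_questions = [
            (a, b, c, d) for (a, b), (c, d) in permutations(pairs, 2)
        ]
        # Optional cap, mirroring the idea of a tunable dataset size.
        if max_per_rule is not None:
            rule_questions = rule_questions[:max_per_rule]
        questions.extend(rule_questions)
    return questions


# Hypothetical clustered pairs, for illustration only.
pairs_by_rule = {
    "suffix:+s": [("cat", "cats"), ("dog", "dogs"), ("tree", "trees")],
    "suffix:+ed": [("walk", "walked"), ("jump", "jumped")],
}
questions = generate_analogies(pairs_by_rule)
```

Pairing words only within a rule keeps each question tied to a single morphological relation, which is what would allow a dataset to be filtered by rule or capped in size.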

【Preview】

Attachments
Files	Size	Format	View
Automatic generation of tunable analogy benchmarks for word representations	404KB	PDF	download


Automatic generation of tunable analogy benchmarks for word representations
Keywords: Natural Language Processing (NLP); word representations; evaluation; semantics; morphology
Sakakini, Tarek J; Viswanath, Pramod; Bhat, Suma
Others : https://www.ideals.illinois.edu/bitstream/handle/2142/92859/SAKAKINI-THESIS-2016.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship


