Journal article

【Abstract】

Coreference resolution systems aim to recognize and cluster mentions of the same underlying entity. While there is a large body of research on widely spoken languages such as English and Chinese, research on coreference in other languages is comparatively scarce. In this work, we first present SentiCoref 1.0, a coreference resolution dataset for the Slovene language that is comparable to English-based corpora. We then conduct a series of analyses using models of varying complexity, ranging from simple linear models to current state-of-the-art deep neural coreference approaches that leverage pre-trained contextual embeddings. In addition to SentiCoref, we evaluate the models on the smaller coref149 Slovene dataset to justify the creation of a new corpus. We investigate the robustness of the models using cross-domain data and data augmentations. Models using contextual embeddings achieve the best results, with an average F1 score of up to 0.92 on the SentiCoref dataset. Cross-domain experiments indicate that SentiCoref allows the models to learn more general patterns, which enables them to outperform models trained on coref149 only.

【License】

CC BY-NC-ND

【Preview】

Attachments
Files	Size	Format	View
RO202307150003277ZK.pdf	871KB	PDF	download

Computer Science and Information Systems
Neural coreference resolution for Slovene language
article
Matej Klemen¹ Slavko Žitnik¹
[1] University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113
Keywords: coreference resolution; Slovene language; neural networks; word embeddings
DOI: 10.2298/CSIS201120060K
Subject classification: Civil and structural engineering
Source: Computer Science and Information Systems


