Conference paper

【Abstract】

Vector space models benefit from using an outside corpus to train the model. It is, however, unclear what constitutes a good training corpus. We have investigated the effect on summary quality of using various language resources to train a vector-space-based extraction summarizer. This is done by evaluating the performance of the summarizer when it uses vector spaces built from corpora of different genres, partitioned from the Swedish SUC corpus. The corpora are also characterized using a variety of lexical measures commonly used in readability studies. The performance of the summarizer is measured by comparing automatically produced summaries to human-created gold-standard summaries using the ROUGE F-score. Our results show that the genre of the training corpus does not have a significant effect on summary quality. However, a linear regression model that uses the lexical measures as independent variables to explain the variance in F-score between the genres shows that vector spaces created from texts with high syntactic complexity, high word variation, short sentences and few long words produce better summaries.

Eighth International Conference on Language Resources and Evaluation
A good space: Lexical predictors in vector space evaluation

Christian Smith ; Henrik Danielsson ; Arne Jönsson
Others : http://www.lrec-conf.org/proceedings/lrec2012/pdf/335_Paper.pdf PID : 51640

Source: CEUR


