Conference Paper Details
International Conference on Computing and Applied Informatics 2016
Rhetorical Sentence Categorization for Scientific Paper Using Word2Vec Semantic Representation
Physics; Computer Science
Rachman, G.H.^1 ; Khodra, M.L.^1 ; Widyantoro, D.H.^1
School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Indonesia^1
Keywords: 10-fold cross-validation; Rhetorical categorization; Rhetorical structure; Scientific papers; Semantic representation; Semantic similarity; Text categorization; word2vec
Others  :  https://iopscience.iop.org/article/10.1088/1742-6596/801/1/012070/pdf
DOI  :  10.1088/1742-6596/801/1/012070
Subject classification: Computer Science (General)
Source: IOP
【 Abstract 】

One way to summarize scientific papers is to exploit the rhetorical structure of their sentences. Determining the rhetorical category of a sentence is itself a text categorization task, and several text categorization studies have improved performance by exploiting semantically similar words. This paper therefore presents rhetorical sentence categorization for scientific papers using selected features, the previous label as an added feature, and Word2Vec to capture semantically similar words. It also reports the results of resampling to balance the number of instances per class, and of combining resampling with the Word2Vec representation. Every experiment is run with two classifiers, IBk and the J48 tree. The results show that using the previous label, Word2Vec (Skip-Gram), and resampling improves performance. Across all experiments under 10-fold cross-validation, the highest F-measure, 84.97%, is achieved by combining Word2Vec (Skip-Gram), all features, and resampling.
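The evaluation setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the averaging of word vectors into a sentence vector, the macro-averaged F-measure, the fold assignment, and every function name below are assumptions made for illustration, and a plain nearest-neighbour classifier stands in for IBk (the name of the k-nearest-neighbour classifier in Weka). Resampling, the previous-label feature, and the J48 tree are omitted.

```python
import math
import random
from collections import Counter


def sentence_vector(tokens, embeddings, dim):
    """Average the vectors of the tokens found in `embeddings` (assumed scheme)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def knn_predict(train, x, k=1):
    """Return the majority label among the k nearest training points (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores (the averaging mode is an assumption)."""
    scores = []
    for lab in sorted(set(gold) | set(pred)):
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cross_validate(data, folds=10, k=1, seed=0):
    """Pool predictions over `folds` held-out folds and return the macro F1."""
    data = data[:]  # keep the caller's list untouched
    random.Random(seed).shuffle(data)
    gold, pred = [], []
    for i in range(folds):
        test = data[i::folds]  # every folds-th item, starting at i
        train = [d for j, d in enumerate(data) if j % folds != i]
        for x, y in test:
            gold.append(y)
            pred.append(knn_predict(train, x, k))
    return macro_f1(gold, pred)
```

Each item passed to `cross_validate` is a `(feature_vector, label)` pair, where the feature vector could come from `sentence_vector` applied to a tokenized sentence and a trained Word2Vec lookup table. Pooling the predictions from all folds before scoring is one common convention; averaging per-fold scores is another and can give slightly different numbers.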
