Journal Article Details
International Journal of Computational Intelligence Systems, Volume: 13
Tolerance Rough Set-Based Bag-of-Words Model for Document Representation
Keywords: Document representation; Tolerance rough set; Bag-of-Words
DOI: 10.2991/ijcis.d.200808.001
Source: DOAJ
【 Abstract 】

Document representation is one of the foundations of natural language processing. The bag-of-words (BoW) model, the representative document representation model, is both simple and effective. However, the traditional BoW model suffers from sparsity and a lack of latent semantic relations. In this paper, to solve these problems, we propose two tolerance rough set-based BoW models, called TRBoW1 and TRBoW2 according to their different weight calculation methods. Unlike the popular supervised representation methods, they are unsupervised and require no prior knowledge. By extending each document to its upper approximation with TRBoW1 or TRBoW2, the semantic relations among documents are mined and the document vectors become denser. Comparative experiments against various document representation methods for text classification on different datasets show that our methods achieve the best performance.
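
The abstract does not give the weight calculations that distinguish TRBoW1 from TRBoW2, so the following is only a minimal sketch of the general upper-approximation step as it is commonly defined in tolerance rough set models. It assumes that two terms are tolerant of each other when they co-occur in at least `theta` documents; the threshold `theta` and the function names `tolerance_classes` and `upper_approximation` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to the set of terms it co-occurs with in >= theta documents.

    Assumption: tolerance is defined by document-level co-occurrence counts.
    Every term is always tolerant of itself (reflexivity).
    """
    cooc = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        # Count each unordered pair of distinct terms once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1

    classes = {t: {t} for t in vocab}
    for (a, b), count in cooc.items():
        if count >= theta:
            # The relation is symmetric, so update both classes.
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return every term whose tolerance class intersects the document."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


docs = [
    ["rough", "set", "model"],
    ["rough", "set", "text"],
    ["text", "vector"],
]
classes = tolerance_classes(docs, theta=2)
expanded = upper_approximation(["rough", "vector"], classes)
print(sorted(expanded))
```

In this toy corpus only "rough" and "set" co-occur twice, so a document containing "rough" gains "set" in its upper approximation. That added term is what makes the resulting vector denser than the plain BoW vector; how the added terms are weighted is left open here.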

【 License 】

Unknown   
