International Journal of Computational Intelligence Systems | Volume: 13
Tolerance Rough Set-Based Bag-of-Words Model for Document Representation
Keywords: Document representation; Tolerance rough set; Bag-of-Words
DOI: 10.2991/ijcis.d.200808.001
Source: DOAJ
【 Abstract 】
Document representation is one of the foundations of natural language processing. The bag-of-words (BoW) model, a representative document representation model, is both simple and effective. However, the traditional BoW model suffers from sparsity and lacks latent semantic relations. To solve these problems, this paper proposes two tolerance rough set-based BoW models, called TRBoW1 and TRBoW2, which differ in their weight calculation methods. Unlike popular supervised representation methods, they are unsupervised and require no prior knowledge. By extending each document to its upper approximation with TRBoW1 or TRBoW2, the models mine the semantic relations among documents and make document vectors denser. Comparative experiments against various document representation methods for text classification on different datasets have verified the superior performance of our methods.
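The abstract does not specify how the upper approximation is computed or how TRBoW1 and TRBoW2 assign weights. As a rough illustration only, the sketch below assumes the commonly used tolerance rough set formulation for documents: two terms are tolerant when they co-occur in at least `theta` documents, a term's tolerance class is the term itself plus every term tolerant with it, and a document's upper approximation contains every term whose tolerance class shares at least one term with the document. The names `theta` and `added_weight`, and the rule of giving expanded terms a fixed fractional weight, are hypothetical choices for illustration; they are not the TRBoW1 or TRBoW2 weighting schemes.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Number of documents in which each unordered term pair appears together."""
    counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    return counts


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class: itself plus every term it co-occurs
    with in at least `theta` documents (theta is an assumed, hypothetical threshold)."""
    vocab = sorted({t for doc in docs for t in doc})
    classes = {t: {t} for t in vocab}
    for (a, b), n in cooccurrence_counts(docs).items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return vocab, classes


def upper_approximation(doc, classes):
    """Terms whose tolerance class overlaps the document's own terms."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


def bow_vector(doc, vocab, classes, added_weight=0.5):
    """BoW vector over `vocab`: raw counts for the document's own terms and a
    fixed, hypothetical `added_weight` for terms added by the upper approximation."""
    counts = Counter(doc)
    upper = upper_approximation(doc, classes)
    vector = []
    for t in vocab:
        if t in counts:
            vector.append(float(counts[t]))
        elif t in upper:
            vector.append(added_weight)
        else:
            vector.append(0.0)
    return vector


if __name__ == "__main__":
    docs = [
        ["rough", "set", "theory"],
        ["rough", "set", "model"],
        ["set", "theory"],
        ["bag", "words", "model"],
    ]
    vocab, classes = tolerance_classes(docs, theta=2)
    print(vocab)  # ['bag', 'model', 'rough', 'set', 'theory', 'words']
    print(bow_vector(["set", "theory"], vocab, classes))  # [0.0, 0.0, 0.5, 1.0, 1.0, 0.0]
```

In this toy run, the document `["set", "theory"]` receives a nonzero weight for `rough` even though it never mentions it, because `rough` and `set` co-occur in two documents. A plain BoW vector would have a zero there, which is the densifying effect the abstract describes. A faithful reproduction would replace the raw counts and the fixed `added_weight` with the paper's own weight calculations.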
【 License 】
Unknown