Journal Article Details
Engineering and Applied Science Research
Concept-based one-class SVM classifier with supervised term weighting scheme for imbalanced sentiment classification
Keywords: imbalanced sentiment classification; word2vec; page rank; one-class svm; concept-based method; supervised term weighting; hotel reviews
Source: DOAJ
【Abstract】

Class imbalance is one of the key issues in sentiment classification. Many studies have proposed improvements for imbalanced sentiment classification, but the problem remains a major challenge. This paper proposes a method, called the “concept-based one-class SVM classifier”, that addresses imbalanced sentiment classification by combining three main techniques. First, we apply the Word2Vec and PageRank algorithms to extract “concepts” and their related terms (called “members”) embedded in texts. The resulting corpus of “concepts” is then used to prepare the dataset by replacing words with their “concepts”. This reduces term ambiguity and also the size of the word vectors. Second, supervised term weighting (STW) schemes are applied to determine the importance of a word in a document of a specific class. This weight reflects the class-distinguishing power of each term. Finally, the one-class support vector machine (SVM) algorithm is used to build the sentiment classifier. One-class SVM has proved useful for imbalanced data classification, especially when the minority class lacks structure and is predominantly composed of small disjuncts or outliers. By combining these techniques, our proposed method may be able to competently identify and distinguish between the characteristics of each class, especially in an imbalanced data scenario. When validated on a hotel review dataset in experiments with different imbalance ratios, our proposed method returned satisfactory recall, precision, and F1 scores. We then selected the best model generated by our method and compared its results with those of the state-of-the-art method. Our proposed method returned better results than the state-of-the-art method, improving the F1 score by 3.19%. Moreover, in terms of computational processing time, our proposed method is faster than the state-of-the-art method.
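
The first two steps of the method can be illustrated with a minimal sketch. It rests on several assumptions that are not in the abstract: the concept lexicon `CONCEPTS` is hand-written here rather than derived with Word2Vec and PageRank, the example reviews are invented, and the weighting scheme is tf.rf (relevance frequency), chosen only as one well-known STW scheme because the abstract does not name the schemes the authors used. All function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical concept lexicon: concept -> member terms. In the paper this
# corpus is derived with Word2Vec and PageRank; here it is hand-written.
CONCEPTS = {
    "clean": {"clean", "spotless", "tidy"},
    "dirty": {"dirty", "filthy", "stained"},
    "staff": {"staff", "receptionist", "housekeeper"},
}
# Invert the lexicon into a member -> concept lookup table.
MEMBER_TO_CONCEPT = {m: c for c, members in CONCEPTS.items() for m in members}


def to_concepts(tokens):
    """Replace each member term with its concept; keep other tokens as-is."""
    return [MEMBER_TO_CONCEPT.get(t, t) for t in tokens]


def relevance_frequency(docs, labels, target):
    """rf(t) = log2(2 + a / max(1, c)), where a and c count the documents
    containing t inside and outside the target class, respectively."""
    inside, outside = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        # set() so each document is counted at most once per term.
        (inside if label == target else outside).update(set(tokens))
    vocab = set(inside) | set(outside)
    return {t: math.log2(2 + inside[t] / max(1, outside[t])) for t in vocab}


def tf_rf(tokens, rf):
    """Weight one document: raw term frequency times relevance frequency.
    Terms unseen during fitting get rf = 1.0, the value for a = 0."""
    return {t: n * rf.get(t, 1.0) for t, n in Counter(tokens).items()}


# Invented, imbalanced toy data: three positive reviews, one negative.
reviews = [
    ("the room was spotless and the receptionist was kind", "pos"),
    ("tidy room and kind staff", "pos"),
    ("clean room", "pos"),
    ("filthy room and stained sheets", "neg"),
]
docs = [to_concepts(text.split()) for text, _ in reviews]
labels = [label for _, label in reviews]

rf_neg = relevance_frequency(docs, labels, target="neg")
weights = tf_rf(docs[3], rf_neg)
```

In this toy data, "filthy" and "stained" both collapse into the concept "dirty", so the negative review carries one feature with term frequency 2 instead of two separate features, and that feature receives a higher weight than the class-neutral term "room". The third step is omitted here; the weighted vectors could then be passed to a one-class SVM implementation such as scikit-learn's `OneClassSVM`.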

【License】

Unknown   
