Conference Paper Details
2017 International Conference on Artificial Intelligence Applications and Technologies
Comparisons and Selections of Features and Classifiers for Short Text Classification
Computer Science
Wang, Ye^1 ; Zhou, Zhi^2 ; Jin, Shan^1 ; Liu, Debin^2 ; Lu, Mi^1
Department of Electrical and Computer Engineering, Texas A&M University, United States^1
Social Credits Ltd, United States^2
Keywords: Artificial intelligence methods; Conventional machines; Data mining algorithm; K-nearest neighbors; Logistic regressions; Shenzhen stock exchanges; Short text classifications; TF-IDF weighting
Others  :  https://iopscience.iop.org/article/10.1088/1757-899X/261/1/012018/pdf
DOI  :  10.1088/1757-899X/261/1/012018
Subject classification: Computer Science (General)
Source: IOP
【 Abstract 】

Short text differs considerably from traditional long documents because of its brevity and conciseness, which hinders the application of conventional machine learning and data mining algorithms to short text classification. Following traditional artificial intelligence methods, we divide short text classification into three steps: preprocessing, feature selection, and classifier comparison. In this paper, we illustrate step by step how we approach each of them. Specifically, for feature selection we compare the performance and robustness of four methods: one-hot encoding, tf-idf weighting, word2vec, and paragraph2vec. For classification, we choose and compare Naive Bayes, Logistic Regression, Support Vector Machine, K-nearest Neighbor, and Decision Tree. We then compare and analyse the classifiers horizontally against each other and vertically across the feature selection methods. For the dataset, we crawled more than 400,000 short text files from the Shanghai and Shenzhen Stock Exchanges and manually labeled them into two classes, the big and the small. The big class has eight labels and the small class has 59.
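
The abstract does not give implementation details, so the following is only a minimal sketch of what comparing two of the named feature methods (one-hot encoding and tf-idf weighting) under one of the named classifiers (K-nearest Neighbor) could look like. Everything concrete here is an assumption made for illustration: the toy documents and the two labels are hypothetical, whitespace tokenization stands in for real preprocessing, the idf formula is one common smoothed variant, and cosine similarity is one possible choice of neighbor metric. It is written in plain Python with no external libraries.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the paper's real data is not reproduced here.
TRAIN_DOCS = [
    "annual report released for fiscal year",
    "quarterly report shows revenue growth",
    "interim report on revenue and profit",
    "board approves cash dividend payout",
    "shareholders vote on dividend plan",
    "dividend payout date announced by board",
]
TRAIN_LABELS = ["report", "report", "report", "dividend", "dividend", "dividend"]

def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()

def one_hot_features(docs):
    """Binary bag-of-words: weight 1.0 for every term present in a document."""
    return [{term: 1.0 for term in set(tokenize(d))} for d in docs]

def fit_idf(docs):
    """Smoothed inverse document frequency learned from the training documents."""
    n = len(docs)
    df = Counter(term for d in docs for term in set(tokenize(d)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf_features(docs, idf):
    """Term frequency times idf; terms unseen during fitting are dropped."""
    vectors = []
    for d in docs:
        tf = Counter(tokenize(d))
        vectors.append({t: c * idf[t] for t, c in tf.items() if t in idf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_predict(train_vecs, train_labels, query, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def classify(query, features="tfidf", k=3):
    """Classify one short text using either 'tfidf' or 'onehot' features."""
    if features == "tfidf":
        idf = fit_idf(TRAIN_DOCS)
        train = tfidf_features(TRAIN_DOCS, idf)
        q = tfidf_features([query], idf)[0]
    else:
        train = one_hot_features(TRAIN_DOCS)
        q = one_hot_features([query])[0]
    return knn_predict(train, TRAIN_LABELS, q, k)

if __name__ == "__main__":
    for name in ("onehot", "tfidf"):
        print(name, classify("cash dividend approved by shareholders", name))
```

In this sketch the classifier stays fixed while the feature method varies, which loosely mirrors the "vertical" comparison mentioned in the abstract; swapping `knn_predict` for another classifier while keeping the features fixed would correspond to the "horizontal" one.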
