2017 International Conference on Artificial Intelligence Applications and Technologies | |
Comparisons and Selections of Features and Classifiers for Short Text Classification | |
Computer Science | |
Wang, Ye^1 ; Zhou, Zhi^2 ; Jin, Shan^1 ; Liu, Debin^2 ; Lu, Mi^1 | |
Department of Electrical and Computer Engineering, Texas A&M University, United States^1 | |
Social Credits Ltd, United States^2 | |
Keywords: Artificial intelligence methods; Conventional machines; Data mining algorithm; K-nearest neighbors; Logistic regressions; Shenzhen stock exchanges; Short text classifications; TF-IDF weighting; | |
Others : https://iopscience.iop.org/article/10.1088/1757-899X/261/1/012018/pdf DOI : 10.1088/1757-899X/261/1/012018 |
Subject classification: Computer Science (General) | |
Source: IOP | |
【 Abstract 】
Short text differs considerably from traditional long documents in its brevity and conciseness, which hinders the application of conventional machine learning and data mining algorithms to short text classification. Following traditional artificial intelligence methods, we divide short text classification into three steps: preprocessing, feature selection, and classifier comparison. In this paper, we illustrate step by step how we approach each of them. For feature selection, we compare the performance and robustness of four methods: one-hot encoding, tf-idf weighting, word2vec, and paragraph2vec. For classification, we choose and compare Naive Bayes, Logistic Regression, Support Vector Machine, K-nearest Neighbor, and Decision Tree as our classifiers. We then compare and analyse the classifiers horizontally against each other and vertically across the feature selection methods. For the dataset, we crawled more than 400,000 short text files from the Shanghai and Shenzhen Stock Exchanges and manually labeled them under two class schemes, the big and the small. There are eight labels in the big class and 59 labels in the small class.
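To make two of the feature methods named in the abstract concrete, the following is a minimal Python sketch of one-hot encoding and tf-idf weighting on a toy corpus. It is an illustration under stated assumptions, not the authors' implementation: the three example documents, the whitespace tokenizer, the function names, and the particular tf-idf variant (term frequency normalized by document length, idf = log(N / df)) are all hypothetical choices that the abstract does not specify.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def build_vocab(docs):
    # A sorted vocabulary gives every term a stable column index.
    terms = sorted({t for d in docs for t in tokenize(d)})
    return {t: i for i, t in enumerate(terms)}


def one_hot(doc, vocab):
    # Binary vector: 1 if the term occurs in the document, else 0.
    vec = [0] * len(vocab)
    for t in tokenize(doc):
        if t in vocab:
            vec[vocab[t]] = 1
    return vec


def tf_idf(docs, vocab):
    # Assumed variant: tf = count / document length, idf = log(N / df).
    n = len(docs)
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for t, c in Counter(toks).items():
            vec[vocab[t]] = (c / len(toks)) * math.log(n / df[t])
        vectors.append(vec)
    return vectors


# Hypothetical short texts, invented for illustration only.
docs = [
    "annual report released",
    "board meeting notice",
    "annual board meeting resolution",
]
vocab = build_vocab(docs)
onehot_vectors = [one_hot(d, vocab) for d in docs]
tfidf_vectors = tf_idf(docs, vocab)
```

In this sketch, one-hot encoding treats every term in a document equally, while tf-idf gives a larger weight to terms that appear in fewer documents. Either set of vectors could then be passed to any of the classifiers listed in the abstract.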
【 Preview 】
Files | Size | Format | View |
---|---|---|---|
Comparisons and Selections of Features and Classifiers for Short Text Classification | 836KB | download |