| Scientific Research and Essays | |
| A hybrid approach for text categorization by using χ² statistic, principal component analysis and particle swarm optimization | |
| Harun Uğuz | |
| Keywords: Text categorization; feature selection; particle swarm optimization; principal component analysis; χ² statistic | |
| DOI : 10.5897/SRE10.1214 | |
| Subject classification: Social Sciences, Humanities and Arts (General) | |
| Source: Academic Journals | |
【 Abstract 】
The number of text documents in digital form is growing steadily, and text categorization has become a key technology for organizing text data. A major problem in text categorization is the very large number of features. Most of these features are irrelevant or redundant for the task and can therefore degrade classification performance. Feature selection is often used to address this problem by reducing the dimensionality of the feature space and improving categorization performance.

In this study, a hybrid approach based on the χ² statistic, particle swarm optimization (PSO) and principal component analysis (PCA) is proposed to improve the performance of text categorization. First, each term in the documents is ranked by its importance for classification using the χ² statistic. PSO-based feature selection and PCA-based feature extraction are then applied separately to the terms, ranked in decreasing order of importance, to reduce the dimensionality. In this way, less important terms are ignored during text categorization, feature selection and feature extraction are applied only to the most important terms, and the computational time and complexity of the process are reduced.

To evaluate the effectiveness of the proposed model, experiments were conducted using the K-nearest neighbor (KNN) and C4.5 decision tree algorithms on the Reuters-21578 and Classic3 datasets. The experimental evaluation showed that the proposed model was effective for text categorization.
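The χ² ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula used in the paper, so the code below assumes the formulation commonly used in text categorization: for a term and a target class, with A and B the numbers of in-class and out-of-class documents containing the term, C and D the numbers of in-class and out-of-class documents without it, and N the total number of documents, the score is N(AD − CB)² / ((A+C)(B+D)(A+B)(C+D)). The function names `chi2_term_scores` and `top_k_terms` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def chi2_term_scores(docs, labels, target):
    """Score each term against one target class with the chi-square statistic.

    docs   : list of token lists, one per document
    labels : list of class labels, aligned with docs
    target : the class the terms are scored against
    Assumes the common 2x2 contingency-table formulation (not confirmed by the source).
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos

    df_pos = Counter()  # in-class documents containing each term
    df_neg = Counter()  # out-of-class documents containing each term
    for tokens, y in zip(docs, labels):
        bucket = df_pos if y == target else df_neg
        bucket.update(set(tokens))  # count each term once per document

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # in class, term present
        b = df_neg[term]   # out of class, term present
        c = n_pos - a      # in class, term absent
        d = n_neg - b      # out of class, term absent
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def top_k_terms(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

In a pipeline like the one the abstract outlines, only the terms returned by a ranking step of this kind would be passed on to the PSO or PCA stage, which is what keeps the later, more expensive steps tractable.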
【 License 】
CC BY