Malaysian Journal of Computer Science
Comparative Study of Feature Selection Approaches for Urdu Text Categorization
Tehseen Zia, Muhammad Pervez Akhter, Qaiser Abbas
Keywords: Text Categorization; Feature Selection; Urdu; Performance Evaluation; Test Collection
Subject classification: Social Sciences, Humanities and Arts (General)
Source: Faculty of Computer Science and Information Technology, University of Malaya
Abstract

This paper presents a comparative study of feature selection methods for Urdu text categorization. Five well-known feature selection methods were analyzed using six recognized classification algorithms. These are support vector machines (with linear, polynomial, and radial basis function kernels), naive Bayes, k-nearest neighbour (KNN), and a decision tree (J48). Experiments were performed on two test collections: the standard EMILLE collection and a naive collection. We found that the information gain, Chi statistics, and symmetrical uncertainty feature selection methods performed consistently in most cases. We also found that no single feature selection technique is best for every classifier. For example, naive Bayes and J48 benefit more from gain ratio than from the other feature selection methods. Similarly, the support vector machine (SVM) and KNN classifiers showed their best performance with information gain. In general, linear SVM with any of the feature selection methods outperformed the other classifiers on the moderate-size naive collection. Conversely, naive Bayes with any of the feature selection techniques has an advantage over the other classifiers on the small EMILLE corpus.
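Four of the five feature selection methods are named in the abstract: information gain, Chi statistics, symmetrical uncertainty, and gain ratio. These are filter methods. Each one scores every term against the class labels, and only the top-scoring terms are kept as features.

The following is a minimal sketch of that scoring step using textbook definitions. It is not the authors' implementation, and it makes several assumptions:
- Terms are binary presence/absence features.
- Entropies use log base 2.
- Gain ratio is IG / H(T).
- Symmetrical uncertainty is 2·IG / (H(C) + H(T)).
- Chi statistics is the chi-square statistic over the term-by-class contingency table.

The function names, the `select_top_k` helper, and the toy documents are hypothetical. The fifth method is not named in the abstract, so it is not shown.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(classes, term_present):
    """H(C | T): class entropy within each term value, weighted by that value's frequency."""
    n = len(classes)
    h = 0.0
    for value in set(term_present):
        subset = [c for c, t in zip(classes, term_present) if t == value]
        h += len(subset) / n * entropy(subset)
    return h

def information_gain(classes, term_present):
    """IG = H(C) - H(C | T)."""
    return entropy(classes) - conditional_entropy(classes, term_present)

def gain_ratio(classes, term_present):
    """IG normalised by the term's own entropy (split information)."""
    split_info = entropy(term_present)
    return information_gain(classes, term_present) / split_info if split_info > 0 else 0.0

def symmetrical_uncertainty(classes, term_present):
    """SU = 2 * IG / (H(C) + H(T)), which lies in [0, 1]."""
    denom = entropy(classes) + entropy(term_present)
    return 2.0 * information_gain(classes, term_present) / denom if denom > 0 else 0.0

def chi_square(classes, term_present):
    """Chi-square statistic over the (term value x class) contingency table."""
    n = len(classes)
    observed = Counter(zip(term_present, classes))
    term_totals = Counter(term_present)
    class_totals = Counter(classes)
    chi = 0.0
    for t, t_count in term_totals.items():
        for c, c_count in class_totals.items():
            expected = t_count * c_count / n
            chi += (observed[(t, c)] - expected) ** 2 / expected
    return chi

def select_top_k(docs, classes, scorer, k):
    """Hypothetical helper: score every vocabulary term with `scorer`, keep the k best."""
    vocab = sorted(set().union(*docs))
    scores = {term: scorer(classes, [int(term in d) for d in docs]) for term in vocab}
    return sorted(vocab, key=lambda t: scores[t], reverse=True)[:k]

# Hypothetical toy collection: each document is a set of (transliterated) tokens.
docs = [{"cricket", "match"}, {"match", "team"}, {"election", "vote"}, {"vote", "team"}]
classes = ["sports", "sports", "politics", "politics"]
print(select_top_k(docs, classes, information_gain, 2))
```

With this toy data, "match" and "vote" each separate the two classes perfectly, so both information gain and chi-square rank them first. "Team" appears once in each class and scores zero under every criterion. On real Urdu collections the criteria rank terms differently, which is the effect the study measures across classifiers.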

License

Unknown
