Journal Article

【Abstract】

This paper presents a comparative study of feature selection methods for Urdu text categorization. Five well-known feature selection methods were analyzed by means of six recognized classification algorithms: support vector machines (with linear, polynomial, and radial basis kernels), naive Bayes, k-nearest neighbour (KNN), and decision tree (i.e., J48). Experiments were performed on two test collections: the standard EMILLE collection and a naive collection. We found that the information gain, chi statistics, and symmetrical uncertainty feature selection methods performed uniformly in most cases. We also found that no single feature selection technique is best for every classifier. That is, naive Bayes and J48 perform better with gain ratio than with the other feature selection methods. Similarly, support vector machines (SVM) and KNN classifiers showed their top performance with information gain. Generally, linear SVM with any of the feature selection methods outperformed the other classifiers on the moderate-size naive collection. Conversely, naive Bayes with any of the feature selection techniques has an advantage over the other classifiers on the small-size EMILLE corpus.
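As a rough illustration of one of the methods named in the abstract, the sketch below scores terms by information gain, using the common textbook definition (the reduction in class entropy from knowing whether a term occurs in a document), and keeps the top-scoring terms. It is not taken from the paper: the function names `entropy`, `information_gain`, and `select_top_k`, and the tiny English toy corpus, are assumptions made only for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Class entropy minus the entropy remaining once we know
    whether `term` is present in a document."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(with_term) / n) * entropy(with_term)
                   + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Return the k terms with the highest information gain
    (ties broken alphabetically)."""
    vocab = set().union(*docs)
    scores = {t: information_gain(docs, labels, t) for t in vocab}
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"cricket", "match"},
    {"cricket", "team"},
    {"budget", "tax"},
    {"tax", "match"},
]
labels = ["sports", "sports", "economy", "economy"]

top_terms = select_top_k(docs, labels, 2)
```

In this toy corpus, "cricket" and "tax" each separate the two classes perfectly, so they are the two terms selected, while "match" appears equally in both classes and scores zero. The selected terms would then be used as the feature set passed to a classifier such as linear SVM or naive Bayes.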

【License】

Unknown

【Preview】

Attachments
Files	Size	Format	View
RO201912010262693ZK.pdf	1096KB	PDF	download

Malaysian Journal of Computer Science
Comparative Study of Feature Selection Approaches for Urdu Text Categorization

Tehseen Zia¹ Muhammad Pervez Akhter¹ Qaiser Abbas¹
Keywords: Text Categorization; Feature Selection; Urdu; Performance Evaluation; Test Collection
DOI :
Subject classification: Social Sciences, Humanities and Arts (General)
Source: University of Malaya * Faculty of Computer Science and Information Technology


