Journal Article

【Abstract】

Problem statement: Text documents are unstructured databases that contain raw data collections. Clustering techniques are used to group text documents according to their similarity. Approach: Feature selection techniques were used to improve the efficiency and accuracy of the clustering process. Feature selection was done by eliminating redundant and irrelevant items from the text document contents. Statistical methods were used in the text clustering and feature selection algorithm. In the term-based text clustering and feature selection method, the cube size is very high and the accuracy is low. A semantic clustering and feature selection method was proposed to improve the clustering and feature selection mechanism using the semantic relations of the text documents. The proposed system was designed to identify the semantic relations using an ontology, which was used to represent the term and concept relationships. Results: The synonym, meronym and hypernym relationships were represented in the ontology. The concept weights were estimated with reference to the ontology and were used for the clustering process. The system was implemented with two methods: term clustering with feature selection and semantic clustering with feature selection. Conclusion: The performance analysis was carried out with the term clustering and semantic clustering methods. The accuracy and efficiency factors were analyzed in the performance analysis.
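
The abstract does not give the formula used to estimate concept weights, so the following is only a minimal sketch of the general idea: term frequencies are mapped onto ontology concepts through synonym, hypernym and meronym links, and the resulting concept weights could then serve as clustering features. The toy ontology, the relation weights and the function name `concept_weights` are all hypothetical assumptions made for illustration; none of them come from the paper.

```python
from collections import Counter

# Hypothetical toy ontology (an assumption, not from the paper):
# each term maps to a list of (concept, relation) pairs.
TOY_ONTOLOGY = {
    "car": [("automobile", "synonym"), ("vehicle", "hypernym")],
    "automobile": [("automobile", "synonym"), ("vehicle", "hypernym")],
    "wheel": [("automobile", "meronym")],
}

# Assumed relation weights (illustrative values, not from the paper).
RELATION_WEIGHT = {"synonym": 1.0, "hypernym": 0.5, "meronym": 0.25}


def concept_weights(tokens, ontology=TOY_ONTOLOGY, relation_weight=RELATION_WEIGHT):
    """Accumulate term frequencies onto ontology concepts.

    Each occurrence of a term contributes the weight of its relation
    to every concept it is linked to. Terms absent from the ontology
    are ignored, which acts as a crude form of feature selection.
    """
    term_freq = Counter(tokens)
    weights = Counter()
    for term, freq in term_freq.items():
        for concept, relation in ontology.get(term, []):
            weights[concept] += freq * relation_weight[relation]
    return dict(weights)


doc = "the car has a wheel and the automobile has a wheel".split()
weights = concept_weights(doc)
print(weights)
```

In this sketch the ten-token document collapses to two concept features, which illustrates how a concept-based representation can be much smaller than a term-based one. The resulting vectors could be passed to any standard clustering algorithm.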

【License】

Unknown


Journal of Computer Science
Integrated Clustering and Feature Selection Scheme for Text Documents | Science Publications

M. Thangamani¹, P. Thangaraj¹
Keywords: Clustering; text mining; ontology; feature selection; document clustering
DOI : 10.3844/jcssp.2010.536.541
Subject classification: Computer Science (General)
Source: Science Publications


