Journal Article Details
Journal of Systemics, Cybernetics and Informatics
Using Statistical Properties to Enhance Text Categorization
Rached Zantout, Ziad Osman
Keywords: statistical properties; text categorization; text mining; data mining
Source: DOAJ
【 Abstract 】

Statistical properties extracted from text are useful in many areas, for example in identifying the author of a text or determining its category. In this paper, language-independent properties of text are studied using two categorized corpora of news articles. The properties are observed to depend neither on the corpus nor on its size. Several interesting properties are identified that make it possible to minimize the training set for an intelligent categorization system. Aside from text categorization, the properties can be used to compare both the information content and the rate of new information content between different corpora.

【 License 】

Unknown   
