Journal article details
Perspectivas em Ciência da Informação
Use of noun phrases in automatic classification of electronic documents
Authors: Souza, Renato Rocha; Maia, Luiz Cláudio
Affiliations: Alarmsoft Tecnologia em Segurança; Escola de Ciência da Informação
Keywords: Text analysis; Clustering; Automatic indexing; Noun phrases; Natural language processing
DOI: 10.1590/S1413-99362010000100009
Subject classification: Agricultural sciences (general)
Source: Universidade Federal de Minas Gerais * Escola de Biblioteconomia
【 Abstract 】

This research presents a proposal for classifying electronic documents using techniques and algorithms based on natural language processing, with indexing by noun phrases alongside plain keywords. Two tools, OGMA and Weka, were used in the experiments. OGMA was developed by the author to automate the extraction of noun phrases and to calculate the weight of each term during document indexing under each of the six proposed methods. Weka was used to analyze the OGMA results with the clustering algorithm "SimpleKMeans" and the classification algorithm "NaiveBayes". This process yielded a percentage value indicating how many documents were classified correctly. The best-performing methods were those using terms without stopwords and those using classified and scored noun phrases.
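
The evaluation step described in the abstract (index terms without stopwords, classify, report the percentage of documents classified correctly) can be illustrated with a minimal sketch. This is not OGMA or Weka: it is a hand-written multinomial Naive Bayes with add-one smoothing, and the stopword list, function names, and raw term counts used as weights are all assumptions made for illustration, since the record does not describe the six weighting methods.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stopword list; the record does not give the list OGMA used.
STOPWORDS = {"the", "of", "a", "and", "in", "to", "is", "for"}

def terms(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]

def train(docs):
    """docs: list of (text, label) pairs.
    Returns (per-class term counts, per-class document counts, vocabulary)."""
    term_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in docs:
        ts = terms(text)
        term_counts[label].update(ts)
        class_counts[label] += 1
        vocab.update(ts)
    return term_counts, class_counts, vocab

def predict(text, model):
    """Return the label with the highest log-posterior under Naive Bayes."""
    term_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(term_counts[label].values()) + len(vocab)
        for t in terms(text):
            if t in vocab:  # terms never seen in training are ignored
                score += math.log((term_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def percent_correct(test_docs, model):
    """Percentage of test documents whose predicted label matches the true one."""
    hits = sum(1 for text, label in test_docs if predict(text, model) == label)
    return 100.0 * hits / len(test_docs)
```

Calling `percent_correct(test_docs, train(train_docs))` on labeled `(text, label)` pairs gives the kind of percentage figure the abstract refers to. Swapping `terms` for a noun-phrase extractor would be the analogous change for the noun-phrase methods, though how OGMA scores those phrases is not stated here.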

【 License 】

Unknown   
