Conference Paper Details
EKAW'2000 Workshop on Ontologies and Texts
The Th(IC)2 Initiative: Corpus-Based Thesaurus Construction for Indexing WWW Documents
Computer Science; Social Sciences (General)
Nathalie Aussenac-Gilles* and Didier Bourigault**; ** Université Toulouse Le Mirail, Etudes et Recherches en Syntaxe et Sémantique (ERSS), Maison de la recherche, 5, allées Antonio Machado, 31048 TOULOUSE Cedex (F)
PID  :  80289
Source: CEUR
【 Abstract 】

This working paper reports on the early stages of our contribution to the Th(IC)2 project, in which, together with other French research teams, we aim to test and demonstrate the value of corpus analysis methods for designing domain knowledge models. The project should produce a thesaurus in French about KE research. The main stages of the method that we apply in this experiment are (a) setting up a corpus, (b) selecting, adapting and combining relevant NLP tools, (c) interpreting and validating their results, from which terms, lexical relations or classes are extracted, and finally (d) structuring them into a semantic network. We present the LEXTER system, which is used to automatically extract from a corpus a list of term candidates that could later be considered as descriptors. We also comment on the validation protocol that we set up: it relies on an interface
