Journal Article Details
Information
Term-Community-Based Topic Detection with Variable Resolution
Simon Odrowski [1], Andreas Hamm [1]
[1] Think Tank, German Aerospace Center (DLR), 51147 Cologne, Germany
Keywords: text mining; natural language processing; topic modeling; term ranking; community detection; corpus analysis
DOI: 10.3390/info12060221
Source: DOAJ
【 Abstract 】

Network-based procedures for topic detection in huge text collections offer an intuitive alternative to probabilistic topic models. We present in detail a method that is especially designed with the requirements of domain experts in mind. Like similar methods, it employs community detection in term co-occurrence graphs, but it is enhanced by including a resolution parameter that can be used for changing the targeted topic granularity. We also establish a term ranking and use semantic word-embedding for presenting term communities in a way that facilitates their interpretation. We demonstrate the application of our method with a widely used corpus of general news articles and show the results of detailed social-sciences expert evaluations of detected topics at various resolutions. A comparison with topics detected by Latent Dirichlet Allocation is also included. Finally, we discuss factors that influence topic interpretation.
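
The core idea named in the abstract, community detection in a term co-occurrence graph with a resolution parameter, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the function names (`cooccurrence_graph`, `modularity`, `greedy_communities`), the document-level co-occurrence weighting, and the simple greedy agglomerative merging are all assumptions chosen for brevity. The quality function is the standard generalized modularity, in which a `resolution` factor scales the null-model term.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(docs):
    """Weighted term graph: edge weight = number of documents containing both terms."""
    weights = defaultdict(float)
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            weights[(a, b)] += 1.0
    return dict(weights)


def modularity(weights, community_of, resolution=1.0):
    """Generalized modularity: sum over c of w_in(c)/m - resolution * (deg(c)/(2m))^2."""
    m = sum(weights.values())
    internal = defaultdict(float)
    degree = defaultdict(float)
    for (a, b), w in weights.items():
        degree[community_of[a]] += w
        degree[community_of[b]] += w
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += w
    return sum(internal[c] / m - resolution * (degree[c] / (2 * m)) ** 2
               for c in degree)


def greedy_communities(weights, resolution=1.0):
    """Hypothetical helper: start from singletons and repeatedly merge the
    pair of communities that increases modularity the most."""
    nodes = sorted({t for edge in weights for t in edge})
    community_of = {t: i for i, t in enumerate(nodes)}
    while True:
        base = modularity(weights, community_of, resolution)
        best_gain, best_pair = 1e-12, None
        for c1, c2 in combinations(sorted(set(community_of.values())), 2):
            trial = {t: (c1 if c == c2 else c) for t, c in community_of.items()}
            gain = modularity(weights, trial, resolution) - base
            if gain > best_gain:
                best_gain, best_pair = gain, (c1, c2)
        if best_pair is None:
            break  # no merge improves modularity any further
        c1, c2 = best_pair
        community_of = {t: (c1 if c == c2 else c) for t, c in community_of.items()}
    groups = defaultdict(set)
    for t, c in community_of.items():
        groups[c].add(t)
    return sorted(sorted(g) for g in groups.values())


# Toy corpus (made up for illustration): two themes joined by one bridging document.
docs = [
    "rocket orbit launch",
    "rocket orbit launch",
    "election vote party",
    "election vote party",
    "launch party",
]
graph = cooccurrence_graph(docs)
for gamma in (0.1, 1.0, 3.0):
    print(gamma, greedy_communities(graph, resolution=gamma))
```

On this toy corpus, a low resolution (0.1) merges all terms into one community, the default (1.0) separates the two themes, and a higher value (3.0) splits them further into four smaller communities, which mirrors the variable topic granularity the abstract describes.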

【 License 】

Unknown   
