Technical Report Details
Mining Information from Heterogenous Sources: A Topic Modelling Approach
Ghosh, Rumi ; Asur, Sitaram
HP Development Company
Keywords: experimentation; method; measurement; heterogenous sources; integrating sources; topic models; data mining
RP-ID  :  HPL-2013-83
Subject classification: Computer Science (General)
United States | English
Source: HP Labs
【 Abstract 】

In recent years, the phenomenal growth and popularity of social media, news and discussion websites have led to a vast number of information sources available online. These sources generate massive amounts of real-time content on a daily basis, making it increasingly difficult to glean true and useful information from them. Automatically categorizing and compressing important contextual information from these sources is crucial for tasks such as web document classification and summarization. Therefore, in this paper, we propose a novel topic modeling framework, Probabilistic Source LDA, which is designed to handle heterogeneous sources. Probabilistic Source LDA can compute latent topics for each source, maintain topic-topic correspondence between sources and yet retain the distinct identity of each individual source. It thus helps to mine and organize correlated information from many different sources. At the same time, it aids in automatically reducing noise and redundancy in the information gathered. Using real data on the 2012 US elections, we demonstrate that our Probabilistic Source LDA method can extract highly relevant latent topics while maintaining topic-topic congruence between different sources.
