Technical Report

【Abstract】

In this paper we propose a framework of topic modeling ensembles, a novel solution for combining the models learned by topic modeling over each partition of the whole corpus. It has potential for applications such as distributed topic modeling for large corpora and incremental topic modeling for rapidly growing corpora. Since only the base models, not the original documents, are required in the ensemble, all of these applications can be performed in a privacy-preserving manner. We explore the theoretical foundation of the proposed framework, give its geometric interpretation, and implement it for both PLSA and LDA. The evaluation of the implementations over synthetic and real-life data sets shows that the proposed framework is much more efficient than modeling the original corpus directly while achieving comparable effectiveness in terms of perplexity and classification accuracy.
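
The abstract does not state how the base models are combined, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each partition yields a topic-word matrix (rows are word distributions over a shared vocabulary) and, as a stand-in technique, pools those rows and groups them with plain k-means. The function name `combine_base_topics` and its parameters are hypothetical.

```python
import numpy as np


def combine_base_topics(base_models, n_topics, n_iter=50, seed=0):
    """Cluster pooled base topics into n_topics ensemble topics.

    ASSUMPTION: plain k-means is used as an illustrative combiner;
    the report's actual combination rule is not given in the abstract.

    base_models: list of arrays of shape (k_i, V); each row is a word
                 distribution learned on one partition of the corpus.
    Returns an (n_topics, V) array whose rows sum to 1.
    """
    # Only the base models are needed here, never the original documents.
    pooled = np.vstack(base_models)
    rng = np.random.default_rng(seed)
    start = rng.choice(len(pooled), size=n_topics, replace=False)
    centers = pooled[start]  # fancy indexing returns a copy

    for _ in range(n_iter):
        # Squared Euclidean distance from every base topic to every center.
        diffs = pooled[:, None, :] - centers[None, :, :]
        labels = (diffs ** 2).sum(axis=2).argmin(axis=1)
        for k in range(n_topics):
            members = pooled[labels == k]
            if len(members):  # keep the old center if a cluster is empty
                centers[k] = members.mean(axis=0)

    # Renormalize so each ensemble topic is a proper distribution.
    return centers / centers.sum(axis=1, keepdims=True)


# Toy base models from two hypothetical partitions, vocabulary size 4.
partition_a = np.array([[0.7, 0.3, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]])
partition_b = np.array([[0.6, 0.4, 0.0, 0.0], [0.0, 0.0, 0.4, 0.6]])
ensemble = combine_base_topics([partition_a, partition_b], n_topics=2)
```

Because the combiner sees only the per-partition matrices, new partitions can be folded in by appending their matrices to the list and re-running the combination step.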



Topic Modeling Ensembles

Shen, Zhiyong ; Luo, Ping ; Yang, Shengwen ; Shen, Xukun
HP Development Company
Keywords: Topic model; Ensemble
RP-ID : HPL-2010-158
Subject classification: Computer Science (General)
United States | English
Source: HP Labs


