Journal Article Details
Entropy
Renormalization Analysis of Topic Models
Sergei Koltcov [1], Vera Ignatenko [1]
[1] Laboratory for Social and Cognitive Informatics, National Research University Higher School of Economics, 55/2 Sedova St., 192148 St. Petersburg, Russia
Keywords: topic modeling; renormalization theory; optimal number of topics; Renyi entropy
DOI: 10.3390/e22050556
Source: DOAJ
【 Abstract 】

In practice, building a machine learning model of big data requires tuning model parameters, a process that typically relies on extremely time-consuming and computationally expensive grid search. However, statistical physics provides techniques that allow this process to be optimized. This paper shows that a function of the output of topic modeling exhibits self-similar behavior under variation of the number of clusters, which makes a renormalization technique applicable. Combining the renormalization procedure with the Renyi entropy approach allows for a fast search for the optimal number of topics. In this paper, the renormalization procedure is developed for probabilistic Latent Semantic Analysis (pLSA), the Latent Dirichlet Allocation model with a variational Expectation–Maximization algorithm (VLDA), and the Latent Dirichlet Allocation model with a granulated Gibbs sampling procedure (GLDA). The experiments were conducted on two test datasets with a known number of topics, in two different languages, and on one unlabeled test dataset with an unknown number of topics. The paper shows that the renormalization procedure finds an approximation of the optimal number of topics at least 30 times faster than grid search, without significant loss of quality.
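To make the idea concrete, below is a minimal Python sketch, not the authors' implementation. The Renyi entropy follows the formulation from the authors' earlier entropy-based work (up to notational details), and the "renormalization" step merges two topics of a fitted word-topic matrix at a time, recording entropy at each scale. The function names, the nearest-pair merge rule (the paper also studies minimum-entropy and random merging), and the synthetic phi matrix are all illustrative assumptions.

import numpy as np

def renyi_entropy(phi):
    # phi: W x T word-topic matrix, columns sum to 1.
    # Words with probability > 1/W are treated as informative,
    # following the Renyi-entropy approach of the authors' earlier work.
    W, T = phi.shape
    mask = phi > 1.0 / W
    p_tilde = phi[mask].sum() / T          # normalized informative probability mass
    rho = mask.sum() / (W * T)             # density of informative states
    q = 1.0 / T                            # deformation parameter q = 1/T
    log_z = np.log(rho) + q * np.log(p_tilde)
    return log_z / (q - 1.0)               # S_q = ln(Z_q) / (q - 1)

def renormalize(phi, min_topics=2):
    # Merge two topics per step and record Renyi entropy at each scale.
    # Here the two closest topic distributions (Euclidean distance) are
    # merged; a simplified stand-in for the paper's merge rules.
    phi = phi.copy()
    trace = [(phi.shape[1], renyi_entropy(phi))]
    while phi.shape[1] > min_topics:
        T = phi.shape[1]
        best, pair = np.inf, (0, 1)
        for i in range(T):
            for j in range(i + 1, T):
                d = np.linalg.norm(phi[:, i] - phi[:, j])
                if d < best:
                    best, pair = d, (i, j)
        i, j = pair
        merged = phi[:, i] + phi[:, j]
        merged /= merged.sum()             # keep the merged column a distribution
        phi = np.column_stack([np.delete(phi, [i, j], axis=1), merged])
        trace.append((phi.shape[1], renyi_entropy(phi)))
    return trace

# Usage: fit a topic model once with a deliberately large number of topics,
# renormalize downward, and take the entropy minimum as the estimate of the
# optimal topic number. A random Dirichlet matrix stands in here for a
# fitted pLSA/LDA word-topic matrix.
rng = np.random.default_rng(0)
phi = rng.dirichlet(np.full(5000, 0.1), size=50).T   # W=5000 words, T=50 topics
trace = renormalize(phi)
t_opt = min(trace, key=lambda s: s[1])[0]
print("estimated optimal number of topics:", t_opt)

The key design point, as in the paper, is that the model is trained only once at a large number of topics; all smaller topic numbers are reached by cheap merges rather than by refitting, which is what yields the reported speedup over grid search.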

【 License 】

Unknown   
