| PATTERN RECOGNITION | Volume: 103 |
| Textual data summarization using the Self-Organized Co-Clustering model | |
| Article | |
| Selosse, Margot [1]; Jacques, Julien [1]; Biernacki, Christophe [2,3] | |
| [1] Univ Lyon, Lyon 2 & ERIC EA3083, 5 Ave Pierre Mendes, Bron 69500, France | |
| [2] Univ Lille, UFR Math, Cité Sci, Villeneuve d'Ascq 59655, France | |
| [3] INRIA, 40 Av Halley, Bat A, Pk Plaza, Villeneuve d'Ascq 59650, France | |
| Keywords: Co-Clustering; Document-term matrix; Latent block model | |
| DOI: 10.1016/j.patcog.2020.107315 | |
| Source: Elsevier | |
【 Abstract 】
Several recent studies have demonstrated the use of co-clustering, a data mining technique that simultaneously produces row-clusters of observations and column-clusters of features. The present work introduces a novel co-clustering model for easily summarizing textual data in a document-term format. In addition to highlighting homogeneous co-clusters, as other existing algorithms do, we also distinguish noisy co-clusters from significant co-clusters, which is particularly useful for sparse document-term matrices. Furthermore, our model imposes a structure among the significant co-clusters, thus making the results easier for users to interpret. The proposed approach competes with state-of-the-art methods for document and term clustering and offers user-friendly results. The model relies on the Poisson distribution and on a constrained version of the Latent Block Model, a probabilistic approach to co-clustering. A Stochastic Expectation-Maximization algorithm is proposed to perform inference, together with a model selection criterion for choosing the number of co-clusters. Experiments on both simulated and real data sets illustrate the efficiency of this model through its ability to easily identify relevant co-clusters. © 2020 Elsevier Ltd. All rights reserved.
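The abstract names two building blocks, a Poisson Latent Block Model and Stochastic EM inference, without giving their details. As an illustration only, here is a minimal sketch assuming the plain (unconstrained) Poisson LBM, with one rate `lam[k][l]` per block, fitted by an SEM-Gibbs scheme that alternates sampling row labels and column labels. It does not implement the authors' constrained Self-Organized Co-Clustering structure, the noisy/significant co-cluster distinction, or their selection criterion. All function names and the smoothing constant `eps` are hypothetical.

```python
import math
import random

# Hypothetical sketch: plain Poisson Latent Block Model (LBM) fitted by SEM.
# NOT the constrained SOCC model of the paper; one Poisson rate lam[k][l] per block.

def poisson_logpmf(x, lam):
    """Log of the Poisson probability mass P(X = x | lam)."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def sample_categorical(logp, rng):
    """Draw an index with probability proportional to exp(logp)."""
    m = max(logp)
    weights = [math.exp(v - m) for v in logp]  # shift by the max for numerical stability
    r = rng.random() * sum(weights)
    acc = 0.0
    for k, wk in enumerate(weights):
        acc += wk
        if r < acc:
            return k
    return len(weights) - 1

def m_step(X, z, w, K, L, eps=1e-6):
    """Estimate cluster proportions and block rates from hard partitions z, w.

    eps is an assumed smoothing constant that keeps every estimate positive.
    """
    n, d = len(X), len(X[0])
    pi = [(z.count(k) + eps) / (n + K * eps) for k in range(K)]
    rho = [(w.count(l) + eps) / (d + L * eps) for l in range(L)]
    sums = [[0.0] * L for _ in range(K)]
    counts = [[0] * L for _ in range(K)]
    for i in range(n):
        for j in range(d):
            sums[z[i]][w[j]] += X[i][j]
            counts[z[i]][w[j]] += 1
    # Block rate = mean count of the cells assigned to block (k, l).
    lam = [[(sums[k][l] + eps) / (counts[k][l] + eps) for l in range(L)]
           for k in range(K)]
    return pi, rho, lam

def sem_poisson_lbm(X, K, L, n_iter=50, seed=0):
    """Alternate stochastic E-steps on rows and columns with M-steps."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    z = [rng.randrange(K) for _ in range(n)]  # row-cluster labels (documents)
    w = [rng.randrange(L) for _ in range(d)]  # column-cluster labels (terms)
    pi, rho, lam = m_step(X, z, w, K, L)
    for _ in range(n_iter):
        # SE-step for rows: sample each z_i given w and the current parameters.
        for i in range(n):
            logp = [math.log(pi[k]) + sum(poisson_logpmf(X[i][j], lam[k][w[j]])
                                          for j in range(d)) for k in range(K)]
            z[i] = sample_categorical(logp, rng)
        pi, rho, lam = m_step(X, z, w, K, L)
        # SE-step for columns: sample each w_j given z and the current parameters.
        for j in range(d):
            logp = [math.log(rho[l]) + sum(poisson_logpmf(X[i][j], lam[z[i]][l])
                                           for i in range(n)) for l in range(L)]
            w[j] = sample_categorical(logp, rng)
        pi, rho, lam = m_step(X, z, w, K, L)
    return z, w, pi, rho, lam

# Usage on a tiny document-term count matrix (4 documents x 4 terms).
X = [[5, 4, 0, 0], [6, 5, 0, 1], [0, 0, 7, 6], [1, 0, 6, 8]]
z, w, pi, rho, lam = sem_poisson_lbm(X, K=2, L=2, n_iter=30, seed=1)
print("document clusters:", z, "term clusters:", w)
```

This sketch simply returns the last draw of the chain. SEM estimates are typically averaged over iterations after a burn-in period, and several random starts are run, because a sampled partition can collapse into a single cluster.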
【 License 】
Free