Journal Article Details
IEEE Access
Distributed Clustering of Text Collections
Juan Zamora¹, Marcelo Mendoza², Hector Allende-Cid³
[1] Instituto de Estadística, Pontificia Universidad Católica de Valparaíso
Keywords: Distributed algorithms; distributed text clustering; high-dimensional data
DOI: 10.1109/ACCESS.2019.2949455
Source: DOAJ
Abstract

Current data processing tasks require efficient approaches capable of handling large databases. A promising strategy is to distribute the data across several computers, each of which partially solves the problem at hand; these partial answers are then integrated to obtain a final solution. We introduce distributed shared nearest neighbors (D-SNN), a novel clustering algorithm that works with disjoint partitions of the data. Our algorithm produces a global clustering solution whose performance is competitive with that of centralized approaches. The algorithm works effectively with high-dimensional data, making it well suited to document clustering tasks. Experimental results on five data sets show that our proposal is competitive with state-of-the-art methods in terms of clustering quality measures.
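The abstract does not spell out D-SNN's internals. As a hedged illustration of the two ingredients it names, shared-nearest-neighbor (SNN) similarity and disjoint data partitioning, the sketch below implements a classic Jarvis-Patrick-style SNN grouping on sparse term-frequency vectors and runs it independently on each partition. All function names, the cosine similarity, the round-robin partitioning, and the parameters `k` and `threshold` are illustrative assumptions, not the authors' method; the step that integrates the partial results into a global clustering is specific to D-SNN and is deliberately not reproduced.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors stored as dicts."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_lists(vectors, k):
    """For each vector, the set of indices of its k most similar *other* vectors.
    Neighbors with zero similarity are ignored; ties are broken by index."""
    result = []
    for i, v in enumerate(vectors):
        sims = [(cosine(v, u), j) for j, u in enumerate(vectors) if j != i]
        sims = [(s, j) for s, j in sims if s > 0]
        sims.sort(key=lambda x: (-x[0], x[1]))
        result.append({j for _, j in sims[:k]})
    return result


def snn_clusters(vectors, k, threshold):
    """Jarvis-Patrick-style SNN grouping: link i and j when each is in the other's
    k-NN list and the two lists share at least `threshold` members; clusters are the
    connected components of the resulting graph (found with union-find)."""
    n = len(vectors)
    knn = knn_lists(vectors, k)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in knn[i]:
            shared = len(knn[i] & knn[j])  # SNN similarity of i and j
            if i in knn[j] and shared >= threshold:
                parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


def partition(n, p):
    """Hypothetical round-robin split of indices 0..n-1 into p disjoint partitions."""
    return [list(range(r, n, p)) for r in range(p)]


def local_clusterings(vectors, p, k, threshold):
    """Cluster each disjoint partition independently, returning (global indices,
    local labels) pairs, i.e. the 'partial answers'. Merging them into one global
    clustering is D-SNN's own step and is intentionally not implemented here."""
    return [
        (idx, snn_clusters([vectors[i] for i in idx], k, threshold))
        for idx in partition(len(vectors), p)
    ]


if __name__ == "__main__":
    docs = [
        {"cat": 1, "dog": 1},
        {"cat": 1, "dog": 1, "pet": 1},
        {"dog": 1, "pet": 1, "cat": 1},
        {"stock": 1, "market": 1},
        {"stock": 1, "market": 1, "price": 1},
        {"market": 1, "price": 1, "stock": 1},
    ]
    print(snn_clusters(docs, k=2, threshold=1))  # [0, 0, 0, 1, 1, 1]
    print(local_clusterings(docs, p=2, k=2, threshold=1))
```

On the six toy documents, clustering the full collection with `k=2` and `threshold=1` separates the three pet documents from the three stock-market documents. Split round-robin into two three-document partitions, each local run leaves every document in its own cluster, since each document has at most one positive-similarity neighbor locally and so no pair shares a neighbor; in this toy case, the local results alone do not reproduce the centralized grouping.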

License

Unknown   
