Journal article

【Abstract】

Big data analysis requires substantial computing power, which is not always available. It has therefore become necessary to develop new clustering algorithms capable of processing such data. This study proposes a new parallel clustering algorithm based on the k-means algorithm that significantly reduces the exponential growth of computations. The proposed algorithm splits a dataset into batches that preserve the characteristics of the initial dataset, which increases the clustering speed. The idea is to compute cluster centroids for each batch and then cluster these centroids in turn. Each data point is then assigned to the cluster with the nearest of the resulting centroids. Experiments on large real-world datasets evaluate the effectiveness of the proposed approach, which is compared with k-means and a modification of it. The experiments show that the proposed algorithm is a promising tool for clustering large datasets compared with the k-means algorithm.
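The batch-then-merge idea described in the abstract can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not say how batches are formed, so the sketch assumes a random shuffle followed by a round-robin split (intended to keep each batch representative of the full dataset), runs plain Lloyd's k-means on each batch, clusters the pooled batch centroids with k-means again, and finally assigns every point to its nearest final centroid. The names `kmeans`, `batch_kmeans`, and `n_batches` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points: List[Point], k: int, rng: random.Random, iters: int = 100) -> List[Point]:
    """Plain Lloyd's k-means with random initial centroids drawn from the data."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster becomes empty.
        new = [mean(g) if g else centroids[j] for j, g in enumerate(groups)]
        if new == centroids:  # assignments are stable: converged
            break
        centroids = new
    return centroids


def batch_kmeans(points: Sequence[Point], k: int, n_batches: int, seed: int = 0):
    """Hypothetical batch scheme: cluster each batch, then cluster the batch centroids."""
    if len(points) < k * n_batches:
        raise ValueError("need at least k points per batch")
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)  # assumption: a random split keeps batches representative
    batches = [shuffled[i::n_batches] for i in range(n_batches)]
    # Batches are independent of each other, so this loop could run in parallel.
    local_centroids: List[Point] = []
    for batch in batches:
        local_centroids.extend(kmeans(batch, k, rng))
    # Cluster the k * n_batches local centroids into k final centroids.
    final = kmeans(local_centroids, k, rng)
    # Assign every original point to its nearest final centroid.
    labels = [nearest(p, final) for p in points]
    return final, labels
```

For example, `batch_kmeans(data, k=2, n_batches=4)` returns two final centroids and one cluster label per input point. Each per-batch k-means call only touches one batch, which is what would let the batches be distributed across cores; this sketch runs them sequentially for simplicity.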

【License】

CC BY|CC BY-ND|CC BY-NC|CC BY-NC-ND

【Preview】

Attachments
File	Size	Format
RO202107100000033ZK.pdf	240 KB	PDF

CAAI Transactions on Intelligence Technology
Efficient algorithm for big data clustering on single machine
article
Rasim M. Alguliyev¹ Ramiz M. Aliguliyev¹ Lyudmila V. Sukhostat¹
[1] Institute of Information Technology, Azerbaijan National Academy of Sciences
Keywords: pattern clustering; data analysis; Big Data; big data clustering; single machine; big data analysis; computing powers; clustering algorithms; data processing; k-means algorithm; initial dataset; clustering speed; cluster centroids; data points; C6130 Data handling techniques
DOI: 10.1049/trit.2019.0048
Subject classification: Mathematics (General)
Source: Wiley


