Journal of Inequalities and Applications
Performance of Rand’s C statistics in clustering analysis: an application to clustering the regions of Turkey
Sinan Saraç1
Keywords: Rand’s C statistics; hierarchical clustering methods; distance measures
DOI: 10.1186/1029-242X-2013-142
Subject classification: Mathematics (general)
Source: SpringerOpen
【 Abstract 】
When a clustering problem is encountered, the researcher must be aware that an inappropriate choice of clustering method and distance measure may significantly affect the results of the analysis. The purpose of this study is to determine the best clustering method and distance measure in cluster analysis and to cluster the regions of Turkey on the basis of this result. Hierarchical clustering offers several clustering methods and distance measures, and Rand’s C statistic is one of the best tools for comparing them. Rand’s comparative statistic C takes values from 0.0 to 1.0 inclusive and may be used either to compare two clusterings produced by applying clustering methods to a data set with unknown structure or to assess the performance of a clustering method on a data set with known structure. In this study, the seven regions of Turkey are clustered using all of the clustering methods and distance measures considered. Based on social and economic indicators, the final number of clusters is set to three. Then, using Rand’s C statistic, all possible pairs of distance measures are compared for each hierarchical clustering method, and the results are reported in the corresponding tables. According to these comparisons, Ward’s method is found to be the best, and the final clustering of the regions is obtained with Ward’s method.
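To make the comparison concrete, the following is a minimal sketch of Rand’s C statistic, assuming its usual pair-counting definition: the fraction of object pairs on which two clusterings agree (both place the pair in the same cluster, or both place it in different clusters). The function name `rand_c` and the two label vectors are hypothetical illustrations. They are not the paper’s code, data, or results.

```python
from itertools import combinations


def rand_c(labels_a, labels_b):
    """Rand's C: share of object pairs treated the same way by both clusterings.

    Assumes the standard pair-counting definition; returns a value in [0.0, 1.0].
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same objects")
    if len(labels_a) < 2:
        raise ValueError("at least two objects are needed to form a pair")

    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        # A pair agrees when it is together in both or apart in both.
        if same_in_a == same_in_b:
            agreements += 1
        total_pairs += 1
    return agreements / total_pairs


# Hypothetical three-cluster assignments of seven objects (not the paper's results).
clustering_1 = [0, 0, 0, 1, 1, 2, 2]
clustering_2 = [0, 0, 1, 1, 1, 2, 2]
print(rand_c(clustering_1, clustering_2))  # 17 of the 21 pairs agree
```

Because only pair co-membership is compared, the statistic does not depend on how the clusters are numbered: relabelling the clusters of one solution leaves C unchanged, and identical partitions give C = 1.0.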
【 License 】
CC BY