Thesis Details
Approach to Evaluating Clustering Using Classification Labelled Data
Luu, Tuong
University of Waterloo
Keywords: clustering; empirical study; Computer Science
Others: https://uwspace.uwaterloo.ca/bitstream/10012/5720/1/Luu_Tuong.pdf
Switzerland | English
Source: UWSPACE Waterloo Institutional Repository
【 Abstract 】

Cluster analysis has been identified as a core task in data mining, and many different algorithms have been proposed for it. On the one hand, this diversity provides us with a wide collection of tools. On the other hand, the profusion of options easily causes confusion. Given a particular task, users do not know which algorithm is good, since it is not clear how clustering algorithms should be evaluated. As a consequence, users often select a clustering algorithm in a very ad hoc manner.

A major challenge in evaluating clustering algorithms is the scarcity of real data with a "correct" ground truth clustering. This is in stark contrast to the situation for classification tasks, where there is an abundance of data sets labeled with their correct classifications. As a result, clustering research often relies on labeled data to evaluate and compare the results of clustering algorithms.

We present a new perspective on how to use labeled data for evaluating clustering algorithms, and develop an approach for comparing clustering algorithms on the basis of classification labeled data. We then use this approach to support a novel technique for choosing among clustering algorithms when no labels are available. We use these tools to demonstrate that the utility of an algorithm depends on the specific clustering task. Investigating a set of common clustering algorithms, we demonstrate that there are cases where each one of them outputs better clusterings. In contrast to the current trend of looking for a superior clustering algorithm, our findings demonstrate the need for a variety of different clustering algorithms.

【 Preview 】
Attachments
Files Size Format View
Approach to Evaluating Clustering Using Classification Labelled Data 927KB PDF download
Document metrics
Downloads: 11  Views: 36