Thesis Details
Estimating the Number of Clusters in Cluster Analysis
Keywords: High-dimensional Data; Noise Features; Jump Function; Distortion; Cluster Analysis
Dasah, Julius Berry ; Dennis Boos, Committee Chair ; Leonard Stefanski, Committee Co-Chair ; David Dickey, Committee Member ; Jason Osborne, Committee Member
University: North Carolina State University
Others  :  https://repository.lib.ncsu.edu/bitstream/handle/1840.16/4606/etd.pdf?sequence=1&isAllowed=y
United States | English
【 Abstract 】

In many applied fields of study, such as medicine, psychology, ecology, taxonomy, and finance, one has to deal with massive amounts of noisy but structured data. A question that often arises in this context is whether the observations in these data fall into some "natural" groups and, if so, how many groups there are. This dissertation proposes a new quantity, called the maximal jump function, for assessing the number of groups in a data set. The estimated maximal jump function measures the excess transformed distortion attainable by fitting an extra cluster to a data set. By distortion, we mean the average distance between each observation and its nearest cluster center. Distortion $d_g$, in the above sense, is a measure of the error incurred by fitting $g$ clusters to a data set. Three stopping rules based on the maximal jump function are proposed for determining the number of groups in a data set. A new procedure for clustering data sets with a common covariance structure is also introduced. The proposed methods are tested on a wide variety of real data, including DNA microarray data sets, as well as on high-dimensional simulated data possessing numerous "noisy" features/dimensions. Also, to show the effectiveness of the proposed methods, comparisons are made to some well-known clustering methods.
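
As a rough illustration of the quantities named in the abstract, the following Python sketch computes the distortion $d_g$ (the average distance between each observation and its nearest cluster center) for $g = 1, 2, \dots$ and then the increase in a transformed distortion gained by fitting one extra cluster. Several pieces are assumptions and are not taken from the dissertation: the power transformation $d_g^{-y}$ with a fixed exponent `power`, the simple k-means routine with farthest-first initialisation used to obtain the centers, and all function names. The abstract does not specify the transformation, the maximal jump function itself, or the three stopping rules, so none of those are reproduced here.

```python
import math


def nearest_distance(point, centers):
    """Distance from one observation to its nearest cluster center."""
    return min(math.dist(point, c) for c in centers)


def fit_centers(points, g, n_iter=100):
    """Lloyd-style k-means with deterministic farthest-first initialisation.

    Illustrative stand-in only: the abstract does not say which clustering
    algorithm the dissertation uses.
    """
    centers = [points[0]]
    while len(centers) < g:
        centers.append(max(points, key=lambda p: nearest_distance(p, centers)))
    for _ in range(n_iter):
        clusters = [[] for _ in range(g)]
        for p in points:
            j = min(range(g), key=lambda i: math.dist(p, centers[i]))
            clusters[j].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        updated = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return centers


def distortion(points, centers):
    """d_g: average distance between each observation and its nearest center."""
    return sum(nearest_distance(p, centers) for p in points) / len(points)


def distortion_curve(points, max_g):
    """Return [d_1, ..., d_max_g] for g = 1..max_g fitted clusters."""
    return [distortion(points, fit_centers(points, g)) for g in range(1, max_g + 1)]


def jumps(d, power=1.0):
    """Increase in transformed distortion d_g ** (-power) per extra cluster.

    The power transformation is an assumed choice for illustration; it
    requires every d_g to be strictly positive.
    """
    transformed = [0.0] + [x ** (-power) for x in d]
    return [transformed[g] - transformed[g - 1] for g in range(1, len(d) + 1)]


# Toy data: three tight, well-separated groups of four points each.
offsets = [(-0.1, -0.1), (-0.1, 0.1), (0.1, -0.1), (0.1, 0.1)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (0, 10)]
          for dx, dy in offsets]

d = distortion_curve(points, max_g=4)
j = jumps(d)
best_g = 1 + max(range(len(j)), key=j.__getitem__)  # g with the largest jump
print(d, j, best_g)
```

On data like this, the distortion drops sharply once the number of fitted clusters reaches the number of real groups, so the largest jump in the transformed distortion points at that value of $g$. Note that k-means minimises squared distances, so the centers it returns only approximately minimise the average-distance distortion described in the abstract.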

【 Preview 】
Attachments
Files Size Format View
Estimating the Number of Clusters in Cluster Analysis 1259KB PDF download