Dasah, Julius Berry; David Dickey, Committee Member; Leonard Stefanski, Committee Co-Chair; Dennis Boos, Committee Chair; Jason Osborne, Committee Member
In many applied fields of study, such as medicine, psychology, ecology, taxonomy and finance, one has to deal with massive amounts of noisy but structured data. A question that often arises in this context is whether the observations in these data fall into some "natural" groups, and if so, how many. This dissertation proposes a new quantity, called the maximal jump function, for assessing the number of groups in a data set. The estimated maximal jump function measures the excess transformed distortion attainable by fitting an extra cluster to a data set. By distortion, we mean the average distance between each observation and its nearest cluster center. Distortion $d_g$, in the above sense, is a measure of the error incurred by fitting $g$ clusters to a data set. Three stopping rules based on the maximal jump function are proposed for determining the number of groups in a data set. A new procedure for clustering data sets with a common covariance structure is also introduced. The proposed methods are tested on a wide variety of real data, including DNA microarray data sets, as well as on high-dimensional simulated data possessing numerous "noisy" features/dimensions. To show the effectiveness of the proposed methods, comparisons are made to some well-known clustering methods.
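The distortion $d_g$ described in the abstract can be illustrated with a minimal sketch. The abstract does not define the transformation or the maximal jump function itself, so the code below is only a generic illustration: the k-means fit (plain Lloyd iterations with a farthest-point start), the function names, and the `power` exponent used to transform the distortion are all assumptions made for this example, not the dissertation's procedure.

```python
import numpy as np


def distortion(X, centers):
    """Average Euclidean distance from each observation to its nearest center."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.min(axis=1).mean()


def kmeans(X, g, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point start.

    This fitting routine is an assumption of the sketch; any clustering
    method that returns g centers could be substituted.
    """
    centers = X[[0]].astype(float)  # copy of the first observation
    while len(centers) < g:
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).min(axis=1)
        centers = np.vstack([centers, X[d.argmax()]])
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(g):
            if np.any(labels == k):  # leave an empty cluster's center unchanged
                centers[k] = X[labels == k].mean(axis=0)
    return centers


def distortion_curve(X, max_g):
    """Distortion d_g for g = 1, ..., max_g."""
    return np.array([distortion(X, kmeans(X, g)) for g in range(1, max_g + 1)])


def transformed_jumps(d, power=-1.0):
    """Change in transformed distortion when one extra cluster is fitted.

    The exponent `power` is a hypothetical placeholder; the abstract does
    not state which transformation is used. Assumes every d_g is positive.
    """
    return np.diff(d ** power)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    X = np.vstack([m + 0.3 * rng.standard_normal((30, 2)) for m in means])
    d = distortion_curve(X, 5)
    print("distortion d_g:", d)
    print("transformed jumps:", transformed_jumps(d))
```

On data with well-separated groups, the distortion typically drops sharply until the number of fitted clusters reaches the number of groups and changes little afterwards, which is the kind of behaviour a jump-based stopping rule would look for.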
Estimating the Number of Clusters in Cluster Analysis