学位论文详细信息
On Model-Based Semi-Supervised Clustering
Semi-superivsed clustering;k-means;clustering;k-means++;approximation algorithm;GMM;Applied Mathematics & Statistics
Yoder, JordanPark, Youngser ;
Johns Hopkins University
关键词: Semi-superivsed clustering;    k-means;    clustering;    k-means++;    approximation algorithm;    GMM;    Applied Mathematics & Statistics;   
Others  :  https://jscholarship.library.jhu.edu/bitstream/handle/1774.2/40715/YODER-DISSERTATION-2016.pdf?sequence=1&isAllowed=y
瑞士|英语
来源: JOHNS HOPKINS DSpace Repository
PDF
【 摘 要 】

We consider an extension of model-based clustering to the semi-supervised case, where some of the data are pre-labeled. We provide a derivation of the Bayesian Information Criterion (BIC) approximation to the Bayes factor in this setting. We then use the BIC to the select number of clusters and the variables useful for clustering.We discuss some considerations for $O(1)$ terms in information criteria when performing model-based clustering.Next, we explore a novel method for the initialization of the EM algorithm for the semi-supervised case using modifications to the k-means++ algorithm to account for the labels.Then, we derive an improved theoretical bound on expected cost and observe improved performance in simulated and real data examples.This analysis provides theoretical justification for a typically linear time semi-supervised clustering algorithm.We show how this algorithms outperforms related semi-supervised k-means-style algorithms on several datasets.Finally, we demonstrate semi-supervised model based clustering with our improved k-means++ initialization on two applications.First, we identify behaviotypes in a fly larva dataset.Next, we nominate interesting vertices in graphs using two types of supervision.

【 预 览 】
附件列表
Files Size Format View
On Model-Based Semi-Supervised Clustering 11668KB PDF download
  文献评价指标  
  下载次数:9次 浏览次数:37次