学位论文

【摘要】

We consider an extension of model-based clustering to the semi-supervised case, where some of the data are pre-labeled. We provide a derivation of the Bayesian Information Criterion (BIC) approximation to the Bayes factor in this setting. We then use the BIC to the select number of clusters and the variables useful for clustering.We discuss some considerations for $O(1)$ terms in information criteria when performing model-based clustering.Next, we explore a novel method for the initialization of the EM algorithm for the semi-supervised case using modifications to the k-means++ algorithm to account for the labels.Then, we derive an improved theoretical bound on expected cost and observe improved performance in simulated and real data examples.This analysis provides theoretical justification for a typically linear time semi-supervised clustering algorithm.We show how this algorithms outperforms related semi-supervised k-means-style algorithms on several datasets.Finally, we demonstrate semi-supervised model based clustering with our improved k-means++ initialization on two applications.First, we identify behaviotypes in a fly larva dataset.Next, we nominate interesting vertices in graphs using two types of supervision.

【预览】

附件列表
Files	Size	Format	View
On Model-Based Semi-Supervised Clustering	11668KB	PDF	download


On Model-Based Semi-Supervised Clustering
Semi-superivsed clustering;k-means;clustering;k-means++;approximation algorithm;GMM;Applied Mathematics & Statistics
Yoder, JordanPark, Youngser ;
Johns Hopkins University
关键词: Semi-superivsed clustering; k-means; clustering; k-means++; approximation algorithm; GMM; Applied Mathematics & Statistics;
Others : https://jscholarship.library.jhu.edu/bitstream/handle/1774.2/40715/YODER-DISSERTATION-2016.pdf?sequence=1&isAllowed=y
瑞士\|英语
来源: JOHNS HOPKINS DSpace Repository
PDF


	文献评价指标
	下载次数：21次	浏览次数：42次

【 摘 要 】

【 预 览 】

【摘要】

【预览】