Thesis

【Abstract】

Often, machine learning and big data concepts are applied to problems without a proper appreciation of their limitations or domain context. At the same time, there is a growing appreciation for the ability of networks to represent more complex connections between data points than previous structures. However, established machine learning approaches rarely take advantage of such structures and must be adapted. We present here a method that utilizes patterns of connections within heterogeneous networks to score items by their similarity to an input set. We apply the idea of meta-paths as an abstraction to counteract typical big data problems of noise and overfitting. We also aim to demystify the black-box nature of machine learning by providing intuitive feedback about why items are considered similar. While the method presented here is generalizable to any domain, the specific examples explored are within the genomics domain. The final tool, GeneSet MAPR, is especially useful in a domain with little ground truth and a huge volume of noisy, uncertain data. We show that GeneSet MAPR performs better at discovering related but concealed data points than an approach using the same data without abstraction, as well as an established state-of-the-art approach that works on a network but ignores the heterogeneous patterns. It does this while providing details the other methods cannot.

【Preview】

Attachments
Files	Size	Format	View
GeneSet MAPR: Characterization of gene sets through heterogeneous network patterns	1646KB	PDF	download


GeneSet MAPR: Characterization of gene sets through heterogeneous network patterns
Keywords: graph theory; network; meta-paths; bioinformatics; machine learning; pattern recognition; big data; statistical analysis; p-value
Linkowski, Gregory; Vasudevan, Shobha
Others : https://www.ideals.illinois.edu/bitstream/handle/2142/101057/LINKOWSKI-THESIS-2018.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship