学位论文详细信息
Bayesian nonparametric clusterings in relational and high-dimensional settings with applications in bioinformatics.
Bayesian nonparametrics;Machine learning;Clustering;Bioinformatics;Relational and high dimensional
Dazhuo Li
University:University of Louisville
Department:Computer Engineering and Computer Science
关键词: Bayesian nonparametrics;    Machine learning;    Clustering;    Bioinformatics;    Relational and high dimensional;   
Others  :  https://ir.library.louisville.edu/cgi/viewcontent.cgi?article=1826&context=etd
美国|英语
来源: The Universite of Louisville's Institutional Repository
PDF
【 摘 要 】

Recent advances in high throughput methodologies offer researchers the ability to understand complex systems via high dimensional and multi-relational data. One example is the realm of molecular biology where disparate data (such as gene sequence, gene expression, and interaction information) are available for various snapshots of biological systems. This type of high dimensional and multirelational data allows for unprecedented detailed analysis, but also presents challenges in accounting for all the variability. High dimensional data often has a multitude of underlying relationships, each represented by a separate clustering structure, where the number of structures is typically unknown a priori. To address the challenges faced by traditional clustering methods on high dimensional and multirelational data, we developed three feature selection and cross-clustering methods: 1) infinite relational model with feature selection (FIRM) which incorporates the rich information of multirelational data; 2) Bayesian Hierarchical Cross-Clustering (BHCC), a deterministic approximation to Cross Dirichlet Process mixture (CDPM) and to cross-clustering; and 3) randomized approximation (RBHCC), based on a truncated hierarchy. An extension of BHCC, Bayesian Congruence Measuring (BCM), is proposed to measure incongruence between genes and to identify sets of congruent loci with identical evolutionary histories. We adapt our BHCC algorithm to the inference of BCM, where the intended structure of each view (congruent loci) represents consistent evolutionary processes. We consider an application of FIRM on categorizing mRNA and microRNA. The model uses latent structures to encode the expression pattern and the gene ontology annotations. We also apply FIRM to recover the categories of ligands and proteins, and to predict unknown drug-target interactions, where latent categorization structure encodes drug-target interaction, chemical compound similarity, and amino acid sequence similarity. BHCC and RBHCC are shown to have improved predictive performance (both in terms of cluster membership and missing value prediction) compared to traditional clustering methods. Our results suggest that these novel approaches to integrating multi-relational information have a promising future in the biological sciences where incorporating data related to varying features is often regarded as a daunting task.

【 预 览 】
附件列表
Files Size Format View
Bayesian nonparametric clusterings in relational and high-dimensional settings with applications in bioinformatics. 13230KB PDF download
  文献评价指标  
  下载次数:49次 浏览次数:16次