Thesis

[Abstract]

Various high-throughput technologies have fueled advances in biomedical research in the last decade. Two typical examples are gene expression and genomic hybridization microarrays, which quantify RNA and DNA levels respectively. High-dimensional data sets generated by these technologies present novel opportunities to discover relationships not only among interrogating probes (i.e., genes) but also among interrogated specimens (i.e., samples). At the same time, however, the necessity to model the variability within and between different high-throughput platforms has created novel statistical challenges. In this thesis, I address these opportunities and challenges with three algorithms. First, I present DynBoost, a new method to infer gene-gene dependence relationships and nonlinear dynamics in gene regulatory networks. DynBoost is a flexible boosting algorithm that combines features of L2-boosting and randomization-based algorithms to perform the tasks of parameter learning and network inference. The performance of the proposed algorithm was evaluated on a number of benchmark data sets from the DREAM3 challenge, and the results strongly indicated that it outperformed existing approaches. Second, I revisit consensus clustering (CC) and some other clustering methods in the context of unsupervised sample subtype discovery. I show that many unsupervised partitioning methods are able to divide homogeneous data into pre-specified numbers of clusters, and that CC is able to show apparent stability of such chance partitioning of random data. I conclude that CC is a powerful tool for minimizing false negatives in the presence of genuine structure, but that it can lead to false positives in the exploratory phase of many studies if the implementation and inference are not carried out cautiously and in line with prudent practices. Lastly, I present MPCBS, a new method that integrates DNA copy number analysis across different platforms by pooling statistical evidence during segmentation. By comparing the integrated analysis of Affymetrix and Illumina SNP array data with Agilent and fosmid clone end-sequencing results on 8 HapMap samples, I show that MPCBS improves spatial resolution and detection power and provides a natural consensus across platforms.
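The consensus clustering point in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: it assumes a plain k-means base clusterer, and the function names (`kmeans_labels`, `consensus_matrix`) and parameter values (`n_resamples`, `frac`, `k`) are hypothetical choices for illustration. It builds a consensus matrix from repeated subsampling of homogeneous Gaussian data, that is, data with no genuine cluster structure.

```python
import numpy as np

def kmeans_labels(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means (illustrative); returns one label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center: shape (n, k)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's center where it was
                centers[j] = members.mean(axis=0)
    return labels

def consensus_matrix(X, k, n_resamples=50, frac=0.8, seed=0):
    """Fraction of resamples in which two items co-cluster, among those containing both."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))  # times a pair was assigned to the same cluster
    sampled = np.zeros((n, n))   # times a pair appeared in the same subsample
    for _ in range(n_resamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = kmeans_labels(X[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        sampled[np.ix_(idx, idx)] += 1.0
        together[np.ix_(idx, idx)] += same
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(sampled > 0, together / sampled, 0.0)

# Homogeneous data: every sample is drawn from the same Gaussian distribution.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 20))
M = consensus_matrix(X, k=3)
print(M.shape, float(M.min()), float(M.max()))
```

Each entry of `M` is the proportion of subsamples in which a pair of items was placed in the same cluster. Sorting the rows and columns of `M` by a final clustering and viewing it as a heatmap is one way to check whether apparent block structure appears even though the data contain no true subtypes.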

[Preview]

Attachments
Files	Size	Format	View
Developing and Application of Statistical Algorithms for High-Dimensional Biological Data Analysis	17797KB	PDF	download


Developing and Application of Statistical Algorithms for High-Dimensional Biological Data Analysis
Keywords: Consensus Clustering; Unsupervised Class Discovery; Reverse-engineering Gene Regulatory Networks; DNA Copy Number Estimation; Operator-valued Kernels; TCGA Glioblastoma Multiforme; Computer Science; Molecular, Cellular and Developmental Biology; Science (General); Statistics and Numeric Data; Engineering; Health Sciences; Science; Bioinformatics
Senbabaoglu, Yasin; d'Alché-Buc, Florence
University of Michigan
Others : https://deepblue.lib.umich.edu/bitstream/handle/2027.42/94038/yasinsen_1.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship


