This dissertation introduces novel statistical methods for the analysis of high-dimensional biomedical data. First, we present a normalization method for DNA methylation microarray data, called functional normalization (Funnorm). The method extends quantile normalization to remove unwanted variation using control probes. Using several cancer datasets from The Cancer Genome Atlas (TCGA), we show that Funnorm improves the replication of biological findings in cancer studies. Second, we present a between-scan normalization method for structural magnetic resonance imaging (MRI) data. We use voxels that are not associated with the outcome of interest, for instance the cerebrospinal fluid (CSF) voxels in the ventricles, to model the unwanted variation across scans. We show that our method, called Removal of Artificial Voxel Effect by Linear regression (RAVEL), improves the replicability of the voxels associated with Alzheimer's disease estimated from T1-weighted images. Third, we present a computational method that predicts A/B compartments, as revealed by Hi-C data, using long-range correlations in epigenetic data. Analysis of Hi-C data has shown that the genome can be divided into two compartments called A/B compartments. These compartments are cell-type specific and are associated with open and closed chromatin. We show that A/B compartments can reliably be estimated using data from the Illumina 450k DNA methylation microarray, DNase hypersensitivity sequencing, single-cell ATAC sequencing and single-cell whole-genome bisulfite sequencing. Finally, we present shinyMethyl, a Bioconductor package for interactive quality control of DNA methylation data from the Illumina 450k array. shinyMethyl makes it easy to perform quality assessment of large-scale methylation datasets, such as epigenome-wide association studies or the datasets available through TCGA.
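The control-voxel idea behind the between-scan normalization can be illustrated with a minimal sketch. It is not the RAVEL implementation: the function name `remove_unwanted_variation`, the voxels-by-scans matrix layout, the use of a singular value decomposition to summarize the control voxels, and the number of factors `k` are all assumptions made for illustration. The sketch estimates scan-level factors from voxels assumed to be unrelated to the outcome (for example CSF voxels) and then removes their fitted contribution from every voxel by linear regression.

```python
import numpy as np


def remove_unwanted_variation(V, control_idx, k=1):
    """Illustrative sketch only (hypothetical helper, not the RAVEL code).

    V           : array of shape (n_voxels, n_scans) with voxel intensities.
    control_idx : row indices of control voxels assumed unrelated to the outcome.
    k           : number of unwanted-variation factors to estimate (assumption).
    """
    V = np.asarray(V, dtype=float)

    # Center each control voxel across scans so the factors capture
    # scan-to-scan variation rather than the mean intensity.
    Vc = V[control_idx, :]
    Vc = Vc - Vc.mean(axis=1, keepdims=True)

    # The leading right singular vectors give one value per scan for each factor.
    _, _, Vt = np.linalg.svd(Vc, full_matrices=False)
    Z = Vt[:k, :].T  # shape (n_scans, k)

    # Regress every voxel on an intercept plus the estimated factors.
    X = np.column_stack([np.ones(V.shape[1]), Z])
    beta, *_ = np.linalg.lstsq(X, V.T, rcond=None)  # shape (k + 1, n_voxels)

    # Subtract only the part explained by the factors; the intercept is kept.
    return V - (Z @ beta[1:, :]).T
```

In this sketch the quality of the correction depends entirely on the assumption that the control voxels carry technical variation only; any biological signal present in them would also be removed.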
Statistical methods for epigenetic data and structural magnetic resonance imaging