With the emergence of high-throughput technology, proper interpretation of data has become critical for many aspects of biomedical research. My dissertation explores two major issues in the analysis of microarray gene expression profiles. The first is the quantification of variation across and among species and its effect on biological interpretation. The second is the development of better statistical estimates that account for different sources of variation when detecting significant genes. A previously published dataset of oligonucleotide array data for three primate species was analyzed with linear mixed models. By decomposing the variation in expression into different explanatory factors, differences among species, as well as between tissues, were revealed at the expression level. Issues of cross-species hybridization and of expression divergence compared to mutation-drift equilibrium were addressed. The power and flexibility of the linear mixed model framework for detecting differentially expressed genes were then explored with a dataset that includes spiked-in controls. The impact of probe-level sequence variation on cross-hybridization was detected through a Gibbs sampling method, which highlights potential problems for the analysis of short oligonucleotide microarray data. A motif as short as fifteen bases can possibly cause significant cross-hybridization. Finally, a bivariate model using information from both perfect match probes and mismatch probes was proposed as a means to increase the statistical power for detecting significant differences in gene expression. The improved performance of the method was demonstrated through Monte Carlo simulation. Under certain circumstances, detection power can increase by as much as 20% at a 5% false positive rate.
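As a rough illustration of how detection power at a fixed false positive rate can be estimated by Monte Carlo simulation, the sketch below simulates a two-group comparison for a single gene and counts how often a difference is declared significant. It is not the bivariate perfect match/mismatch model proposed in the dissertation: the function name `estimate_power`, the known-variance z-test, the Gaussian noise, and all parameter values are assumptions chosen only to keep the example short.

```python
import random
from statistics import NormalDist, mean


def estimate_power(delta, sigma=1.0, n_per_group=4, alpha=0.05,
                   n_sim=5000, seed=0):
    """Estimate by simulation how often a two-sided z-test rejects.

    delta is the true difference in mean log-expression between groups.
    With delta = 0 the returned value estimates the false positive rate.
    Assumption: sigma is known, so a z-test stands in for a real model.
    """
    rng = random.Random(seed)
    # Two-sided critical value for the requested false positive rate.
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two group means.
    se = sigma * (2.0 / n_per_group) ** 0.5
    rejections = 0
    for _ in range(n_sim):
        group_a = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        group_b = [rng.gauss(delta, sigma) for _ in range(n_per_group)]
        z = (mean(group_b) - mean(group_a)) / se
        if abs(z) > crit:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    print("false positive rate:", estimate_power(0.0))
    print("power at delta=1.5: ", estimate_power(1.5))
```

Comparing two methods in this way amounts to running the same simulated datasets through each test and comparing the rejection fractions at a matched false positive rate.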
Analysis of Gene Expression Profiles with Linear Mixed Models