期刊论文详细信息
PeerJ
Monte Carlo simulation of OLS and linear mixed model inference of phenotypic effects on gene expression
article
Jeffrey A. Walker1 
[1] Department of Biological Sciences, University of Southern Maine
关键词: Self-contained test;    Type I error;    Type S error;    Type M error;    Power;    Permutation;    Multiple outcomes;    Gene set analysis;   
DOI  :  10.7717/peerj.2575
学科分类:社会科学、人文和艺术(综合)
来源: Inra
PDF
【 摘 要 】

BackgroundSelf-contained tests estimate and test the association between a phenotype and mean expression level in a gene set defined a priori. Many self-contained gene set analysis methods have been developed but the performance of these methods for phenotypes that are continuous rather than discrete and with multiple nuisance covariates has not been well studied. Here, I use Monte Carlo simulation to evaluate the performance of both novel and previously published (and readily available via R) methods for inferring effects of a continuous predictor on mean expression in the presence of nuisance covariates. The motivating data are a high-profile dataset which was used to show opposing effects of hedonic and eudaimonic well-being (or happiness) on the mean expression level of a set of genes that has been correlated with social adversity (the CTRA gene set). The original analysis of these data used a linear model (GLS) of fixed effects with correlated error to infer effects of Hedonia and Eudaimonia on mean CTRA expression.MethodsThe standardized effects of Hedonia and Eudaimonia on CTRA gene set expression estimated by GLS were compared to estimates using multivariate (OLS) linear models and generalized estimating equation (GEE) models. The OLS estimates were tested using O’Brien’s OLS test, Anderson’s permutation rF2-test, two permutation F-tests (including GlobalAncova), and a rotation z-test (Roast). The GEE estimates were tested using a Wald test with robust standard errors. The performance (Type I, II, S, and M errors) of all tests was investigated using a Monte Carlo simulation of data explicitly modeled on the re-analyzed dataset.ResultsGLS estimates are inconsistent between data sets, and, in each dataset, at least one coefficient is large and highly statistically significant. By contrast, effects estimated by OLS or GEE are very small, especially relative to the standard errors. Bootstrap and permutation GLS distributions suggest that the GLS results in downward biased standard errors and inflated coefficients. The Monte Carlo simulation of error rates shows highly inflated Type I error from the GLS test and slightly inflated Type I error from the GEE test. By contrast, Type I error for all OLS tests are at the nominal level. The permutation F-tests have ∼1.9X the power of the other OLS tests. This increased power comes at a cost of high sign error (∼10%) if tested on small effects.DiscussionThe apparently replicated pattern of well-being effects on gene expression is most parsimoniously explained as “correlated noise” due to the geometry of multiple regression. The GLS for fixed effects with correlated error, or any linear mixed model for estimating fixed effects in designs with many repeated measures or outcomes, should be used cautiously because of the inflated Type I and M error. By contrast, all OLS tests perform well, and the permutation F-tests have superior performance, including moderate power for very small effects.

【 授权许可】

CC BY   

【 预 览 】
附件列表
Files Size Format View
RO202307100014766ZK.pdf 682KB PDF download
  文献评价指标  
  下载次数:0次 浏览次数:0次