Journal Article Details
BMC Bioinformatics
The behaviour of random forest permutation-based variable importance measures under predictor correlation
Research Article
Carolin Strobl [1], Andreas Ziegler [2], James D Malley [3], Kristin K Nicodemus [4]
[1] Department für Statistik, Ludwig-Maximilians Universität München, Ludwigstr. 33, 80539, München, Germany
[2] Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Maria-Goeppert-Str 1, 23562, Lübeck, Germany
[3] Mathematical and Statistical Computing Laboratory, Division of Computational Bioscience, Center for Information Technology, National Institutes of Health, 20892, Bethesda, Maryland, USA
[4] Statistical Genetics, Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, OX3 7BN, Oxford, UK
[5] Department of Clinical Pharmacology, University of Oxford, Old Road Campus Research Building, Off Roosevelt Drive, OX3 7DQ, Oxford, UK
[6] Genes, Cognition and Psychosis Program, Intramural Research Program, National Institute of Mental Health, National Institutes of Health, Room 4S-235, 10 Center Drive, 20892, Bethesda, Maryland, USA
Keywords: Linear Regression Model; Random Forest; Correlate Variable; Bivariate Model; Correlate Predictor
DOI: 10.1186/1471-2105-11-110
Received 2009-08-27, accepted 2010-02-27, published 2010
Source: Springer
【 Abstract 】

Background: Random forests (RF) have been increasingly used in applications such as genome-wide association and microarray studies, where predictor correlation is frequently observed. Recent work on the permutation-based variable importance measures (VIMs) used in RF has come to apparently contradictory conclusions. We present an extended simulation study to synthesize results.

Results: When predictor correlation was present and predictors were associated with the outcome (HA), the unconditional RF VIM attributed a higher share of importance to correlated predictors, while under the null hypothesis that no predictors are associated with the outcome (H0) the unconditional RF VIM was unbiased. Conditional VIMs showed lower VIM values for correlated predictors than the unconditional VIMs under HA and were unbiased under H0. Scaled VIMs were clearly biased under both HA and H0.

Conclusions: Unconditional unscaled VIMs are a computationally tractable choice for large datasets and are unbiased under the null hypothesis. Whether the observed increased VIMs for correlated predictors should be considered a "bias" - because they do not directly reflect the coefficients in the generating model - or a beneficial attribute of these VIMs depends on the application. For example, in genetic association studies, where correlation between markers may help to localize the functionally relevant variant, the increased importance of correlated predictors may be an advantage. On the other hand, we show examples where this increased importance may result in spurious signals.

【 License 】

CC BY   
© Nicodemus et al; licensee BioMed Central Ltd. 2010
