BMC Bioinformatics
Intervention in prediction measure: a new approach to assessing variable importance for random forests
Methodology Article
Irene Epifanio [1]
[1] Departament de Matemàtiques and Institut de Matemàtiques i Aplicacions de Castelló, Universitat Jaume I, Campus del Riu Sec, 12071, Castelló, Spain.
Keywords: Random forest; Variable importance measure; Multivariate response; Feature selection; Conditional inference trees
DOI: 10.1186/s12859-017-1650-8
Received: 2017-01-13; accepted: 2017-04-25; published: 2017
Source: Springer
【 Abstract 】

Background: Random forests are popular in many fields because they can be applied successfully to complex data: small sample sizes, complex interactions and correlations, mixed-type predictors, and so on. They also provide variable importance measures that aid qualitative interpretation and the selection of relevant predictors. However, most of these measures depend on the choice of a performance measure, and measures of prediction performance are not unique; in some cases, such as multivariate response random forests, there is not even a clear definition.

Methods: A new alternative importance measure, called the Intervention in Prediction Measure, is investigated. It depends on the structure of the trees rather than on performance measures. It is compared with other well-known variable importance measures in several settings: a classification problem with variables of different types, a classification problem with correlated predictor variables, and problems with multivariate responses and predictors of different types.

Results: Several simulation studies are carried out, and they show the new measure to be very competitive. The measure is also applied to two well-known bioinformatics applications used previously in other papers, where its use also improves performance.

Conclusions: The new measure is expressed as a percentage, which makes it easy to interpret. It can be used with new observations. It can be defined globally, for each class (in a classification problem), and case-wise. It can easily be computed for any kind of response, including multivariate responses. Furthermore, it can be used with any algorithm employed to grow the individual trees. It can be used in place of, or in addition to, other variable importance measures.
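The abstract does not spell out the computation. As a rough illustration only, the sketch below assumes one reading of "depends on the structure of the trees": a case is passed down every tree, the variables that split the nodes on its root-to-leaf path are recorded, each variable's share of that path is taken, and the shares are averaged over trees and expressed as a percentage; a global value then averages the case-wise values over cases. The `Node` class and the functions `path_vars`, `ipm_case` and `ipm_global` are hypothetical names, univariate threshold splits are assumed, and this is not code from the article.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """Hypothetical tree node: an internal node splits on `var` at `threshold`;
    a leaf has var=None."""
    var: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def path_vars(tree: Node, x: Sequence[float]) -> List[int]:
    """Return the splitting variables met on x's route from root to leaf."""
    used = []
    node = tree
    while node.var is not None:
        used.append(node.var)
        # Assumed convention: go left when x[var] <= threshold.
        node = node.left if x[node.var] <= node.threshold else node.right
    return used


def ipm_case(forest: Sequence[Node], x: Sequence[float], n_vars: int) -> List[float]:
    """Case-wise importance (%): per tree, each variable's share of the path
    that x follows, averaged over all trees in the forest."""
    totals = [0.0] * n_vars
    for tree in forest:
        used = path_vars(tree, x)
        if not used:  # a tree that is a single leaf contributes nothing
            continue
        for v in used:
            totals[v] += 1.0 / len(used)
    return [100.0 * t / len(forest) for t in totals]


def ipm_global(forest: Sequence[Node], X: Sequence[Sequence[float]], n_vars: int) -> List[float]:
    """Global importance (%): the case-wise values averaged over the cases in X."""
    per_case = [ipm_case(forest, x, n_vars) for x in X]
    return [sum(col) / len(per_case) for col in zip(*per_case)]


# Toy forest of two hand-built trees over three variables (illustrative only).
tree1 = Node(0, 0.5, left=Node(1, 0.2, Node(), Node()), right=Node())
tree2 = Node(1, 0.3, left=Node(), right=Node(2, 1.0, Node(), Node()))
forest = [tree1, tree2]
print(ipm_case(forest, [0.1, 0.9, 2.0], 3))  # [25.0, 50.0, 25.0]
print(ipm_global(forest, [[0.1, 0.9, 2.0], [0.9, 0.1, 0.0]], 3))  # [37.5, 50.0, 12.5]
```

Under these assumptions only the splitting variables on each path are used, so no response values or performance measure enter the calculation; that is what lets the same computation apply to new observations and to multivariate responses. A per-class value could be obtained the same way by averaging the case-wise values over the cases assigned to each class.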

【 License 】

CC BY
© The Author(s) 2017

【 Figures 】

Fig. 1

Fig. 6
