| BMC Bioinformatics | |
| The parameter sensitivity of random forests | |
| Methodology Article | |
| Barbara F.F. Huang1  Paul C. Boutros2  | |
| [1] Informatics and Bio-computing Program, Ontario Institute for Cancer Research, Toronto, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Canada; Department of Pharmacology and Toxicology, University of Toronto, Toronto, Canada; MaRS Centre, 661 University Avenue, Suite 510, M5G 0A3, Toronto, Ontario, Canada | |
| Keywords: Machine-learning; Random forest; Parameterization; Computational biology; Ensemble methods; Optimization; Microarray; SeqControl | |
| DOI : 10.1186/s12859-016-1228-x | |
| Received 2015-12-15, accepted 2016-08-26, published 2016 | |
| Source: Springer | |
【 Abstract 】
Background: The Random Forest (RF) algorithm for supervised machine learning is an ensemble learning method widely used in science and many other fields. Its popularity has been increasing, but relatively few studies address the parameter selection process, a critical step in model fitting. Because of numerous assertions that the default parameters perform reliably, many RF models are fit using these values. However, there has not yet been a thorough examination of the parameter sensitivity of RFs in computational genomic studies. We address this gap here.
Results: We examined the effects of parameter selection on classification performance using the RF machine learning algorithm on two biological datasets with distinct p/n ratios: sequencing summary statistics (low p/n) and microarray-derived data (high p/n). Here, p refers to the number of variables and n to the number of samples. Our findings demonstrate that parameterization is highly correlated with prediction accuracy and variable importance measures (VIMs). Further, we demonstrate that different parameters are critical in tuning different datasets, and that parameter optimization significantly improves on the default parameters.
Conclusions: Parameter performance demonstrated wide variability on both low and high p/n data. Therefore, there is significant benefit to be gained by tuning RF models away from their default parameter settings.
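As a rough illustration of what a parameter sweep of this kind involves, the sketch below scores every combination in a small grid and compares the best combination with the defaults. It is not the authors' procedure. The parameter names `ntree`, `mtry` and `nodesize`, and the defaults used (500 trees, floor(sqrt(p)) variables per split, node size 1 for classification), are those of the R `randomForest` package and are an assumption here, since the abstract does not name the parameters studied. The value of `p`, the grid values and `toy_score` are hypothetical; `toy_score` is an arbitrary stand-in for a cross-validated accuracy estimate and says nothing about real RF behaviour.

```python
from itertools import product
from math import floor, sqrt


def default_mtry(p):
    """Variables tried per split by default for classification in R's
    randomForest package: floor(sqrt(p)), and at least 1."""
    return max(1, floor(sqrt(p)))


def sweep(grid, score_fn):
    """Score every parameter combination in `grid`; return results best-first.

    grid     : dict mapping parameter name -> list of candidate values
    score_fn : callable taking the parameters as keywords, returning a score
               where higher is better
    """
    names = sorted(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((score_fn(**params), params))
    # Sort on the score only, so the parameter dicts are never compared.
    results.sort(key=lambda item: item[0], reverse=True)
    return results


def toy_score(ntree, mtry, nodesize):
    # Hypothetical placeholder for a cross-validated accuracy; not real data.
    return 0.9 - 0.001 * abs(mtry - 8) - 0.01 * (nodesize - 1) - 5.0 / ntree


p = 100  # hypothetical number of variables
grid = {
    "ntree": [100, 500, 1000],
    "mtry": sorted({default_mtry(p), 5, 20}),
    "nodesize": [1, 5],
}

results = sweep(grid, toy_score)
best_score, best_params = results[0]
default_score = toy_score(ntree=500, mtry=default_mtry(p), nodesize=1)
spread = results[0][0] - results[-1][0]  # best minus worst score in the grid

print("best:", best_params, round(best_score, 4))
print("default score:", round(default_score, 4))
print("spread across grid:", round(spread, 4))
```

In practice `toy_score` would be replaced by a function that fits an RF with the given parameters and returns a held-out or cross-validated performance estimate. The spread between the best and worst grid points is one simple way to summarise how sensitive a dataset is to parameterization.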
【 License 】
CC BY
© The Author(s). 2016