Random forests (RFs) are among the most widely employed machine learning algorithms for general classification tasks due to their speed, ease of use, and excellent empirical performance. Recent large-scale comparisons of classification algorithms have concluded that RFs outperform many other classifiers on a variety of datasets. However, the trees in an RF are constructed via a series of recursive axis-aligned splits, rendering the learning procedure sensitive to the orientation of the data. Several studies have proposed "oblique" decision forest methods to address this limitation, which search for good splits that aren't constrained to be axis-aligned. In this work, we explore how properties of the split selection procedure relate to empirical and theoretical performance. We then establish a generalized decision forest framework called Randomer Forests (RerFs), which encompasses RFs and many previously proposed decision forest algorithms as particular instantiations. With this framework in mind, we propose a default instantiation and provide theoretical and experimental evidence motivating its use. Additionally, we demonstrate how our framework can exploit prior domain knowledge to boost performance. Finally, we use RerF to identify important biomarkers for ovarian cancer classification and learn a classifier with high sensitivity and specificity.
Generalized Linear Splitting Rules in Decision Forests