Tree-structured Classification for Multivariate Binary Responses
hit rate;recursive partitioning;multivariate binary response;QSAR;classification tree
Wang, Jiuzhou ; Marc G. Genton, Committee Co-Chair ; Leonard A. Stefanski, Committee Chair ; S. Stanley Young, Committee Member ; John F. Monahan, Committee Member
In this work, a new tree-structured classification algorithm for multivariate binary responses, the majority-vote method, is proposed. The majority-vote method is a variation of the original work of Breiman et al. (1984) on Classification and Regression Trees (CART). It is similar to CART in that both methods use node impurity as the basis of the splitting rules. It differs from CART in how tree size is determined: the majority-vote method chooses an optimal threshold value so that the cross-validated hit rate is maximized, whereas CART uses cost-complexity pruning. The original motivation of this work is to handle incomplete data (missing and censored values) in a Quantitative Structure-Activity Relationship (QSAR) context, where the responses are continuous measurements of activity levels. We proceed by discretizing the responses into binary variables and using the majority-vote method to analyze the resulting binary responses. The performance of the majority-vote method is compared to that of its continuous-response counterpart, MultiSCAM, a tree-structured algorithm for analyzing multivariate continuous responses. Multivariate analysis of variance (MANOVA) is used to evaluate the relative information loss due to discretization. The predictivity of the majority-vote method is evaluated by hit rate, a commonly used criterion in drug discovery. Simulation studies show that the majority-vote method outperforms MultiSCAM for censored data in that it yields higher hit rates.
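The two data-handling steps named in the abstract, discretizing continuous activity levels and scoring predictions by hit rate, can be illustrated with a minimal sketch. This is not the thesis's implementation. The function names `discretize` and `hit_rate`, the per-response cutoffs, and the example values are hypothetical, and the sketch assumes that hit rate means the fraction of compounds predicted active that are truly active, since the abstract gives no formula.

```python
from typing import List, Sequence


def discretize(responses: Sequence[Sequence[float]],
               cutoffs: Sequence[float]) -> List[List[int]]:
    """Convert continuous activity measurements to binary responses.

    Each row is one compound and each column one response. A value at or
    above its column's cutoff is coded 1 (active), otherwise 0 (inactive).
    The cutoffs are assumed inputs; the abstract does not say how they
    are chosen.
    """
    return [[1 if value >= cut else 0 for value, cut in zip(row, cutoffs)]
            for row in responses]


def hit_rate(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of predicted-active compounds that are truly active.

    Assumed definition. Returns 0.0 when nothing is predicted active.
    """
    selected = [a for p, a in zip(predicted, actual) if p == 1]
    if not selected:
        return 0.0
    return sum(selected) / len(selected)


# Hypothetical activity levels for three compounds on two responses.
activities = [[7.2, 5.1], [4.0, 6.3], [8.5, 8.0]]
binary = discretize(activities, cutoffs=[6.0, 6.0])

# Hypothetical predictions for four compounds on a single response.
rate = hit_rate(predicted=[1, 1, 0, 1], actual=[1, 0, 0, 1])
```

Under this assumed definition, a compound with a censored measurement can still be coded as active or inactive whenever the censoring bound falls on a known side of the cutoff, which is one reason a binary coding can accommodate censored data.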