Journal of Data Science
Sparse Learning with Non-convex Penalty in Multi-classification
article
Nan Li [1], Hao Helen Zhang [2]
[1] Department of Epidemiology and Cancer Control, St. Jude Children’s Research Hospital; [2] Department of Mathematics, University of Arizona
Keywords: logistic regression; SCAD; supnorm; SVM; variable selection
DOI: 10.6339/20-JDS1000
Subject classification: Civil and Structural Engineering
Source: JDS
【 Abstract 】

Multi-classification is commonly encountered in data science practice, and it has broad applications in many areas such as biology, medicine, and engineering. Variable selection in multiclass problems is much more challenging than in binary classification or regression problems. In addition to estimating multiple discriminant functions for separating different classes, we need to decide which variables are important for each individual discriminant function as well as for the whole set of functions. In this paper, we address the multi-classification variable selection problem by proposing a new form of penalty, supSCAD, which first groups together all the coefficients of the same variable across all the discriminant functions and then imposes the SCAD penalty on the supnorm of each group. We apply the new penalty to both soft and hard classification and develop two new procedures: the supSCAD multinomial logistic regression and the supSCAD multi-category support vector machine. Our theoretical results show that, with a proper choice of the tuning parameter, the supSCAD multinomial logistic regression can identify the underlying sparse model consistently and enjoys oracle properties even when the dimension of predictors goes to infinity. Based on local linear and quadratic approximations to the non-concave SCAD penalty and the nonlinear multinomial log-likelihood function, we show that the new procedures can be implemented efficiently by solving a series of linear or quadratic programming problems. The performance of the new methods is illustrated by simulation studies and real data analysis of the Small Round Blue Cell Tumors and the Semeion Handwritten Digit data sets.
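
As a rough illustration of the penalty described in the abstract, the sketch below evaluates a supSCAD-style penalty on a coefficient matrix: it takes the supnorm of each variable's coefficients across the discriminant functions and applies the SCAD penalty to it. The SCAD form and the default a = 3.7 follow the standard definition of Fan and Li (2001); the matrix layout (one row per variable, one column per discriminant function) and the function names `scad`, `scad_deriv`, and `supscad_penalty` are assumptions made for this example and are not taken from the paper. The derivative is included only because local linear approximations of SCAD typically use it as a weight; the loss functions and the linear or quadratic programming steps are not shown.

```python
import numpy as np

def scad(t, lam, a=3.7):
    """Standard SCAD penalty (Fan and Li, 2001), evaluated elementwise at |t|."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t                                           # region |t| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))   # lam < |t| <= a*lam
    const = lam**2 * (a + 1) / 2                               # |t| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))

def scad_deriv(t, lam, a=3.7):
    """Derivative of the SCAD penalty at |t|; usable as a local linear weight."""
    t = np.abs(np.asarray(t, dtype=float))
    tail = np.maximum(a * lam - t, 0.0) / ((a - 1) * lam)
    return lam * np.where(t <= lam, 1.0, tail)

def supscad_penalty(B, lam, a=3.7):
    """Sum of SCAD penalties on the row-wise supnorms of B.

    Assumed layout: B has shape (p, K); row j holds the coefficients of
    variable j in each of the K discriminant functions.
    """
    supnorms = np.max(np.abs(B), axis=1)   # one supnorm per variable (group)
    return float(np.sum(scad(supnorms, lam, a)))

# Hypothetical coefficients: 3 variables, 3 discriminant functions.
B = np.array([[0.0, 0.0, 0.0],
              [0.5, -0.2, 0.1],
              [3.0, -1.0, 2.0]])
print(supscad_penalty(B, lam=1.0))
```

Because the penalty depends on each row only through its largest absolute entry, a variable contributes nothing exactly when all of its coefficients are zero, which is what lets this kind of penalty remove a variable from every discriminant function at once.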

【 License 】

CC BY   
