Journal Article Details
BMC Bioinformatics
Sparse logistic regression with a L1/2 penalty for gene selection in cancer classification
Yong Liang2  Cheng Liu2  Xin-Ze Luan2  Kwong-Sak Leung1  Tak-Ming Chan1  Zong-Ben Xu3  Hai Zhang3 
[1] Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
[2] Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macau, China
[3] Faculty of Science, Xi’an Jiaotong University, Xi’an, China
Keywords: Cancer classification; Sparse logistic regression; Gene selection
DOI  :  10.1186/1471-2105-14-198
Received: 2012-07-04; Accepted: 2013-05-30; Published: 2013
【 Abstract 】

Background

Microarray technology is widely used in cancer diagnosis. Successfully identifying gene biomarkers significantly helps to classify different cancer types and improves prediction accuracy. Regularization is one of the effective approaches to gene selection in microarray data, which typically contain a large number of genes but only a small number of samples. In recent years, various approaches have been developed for gene selection from microarray data; they generally fall into three categories: filter, wrapper, and embedded methods. Regularization methods are an important embedded technique and perform continuous shrinkage and automatic gene selection simultaneously. There has recently been growing interest in applying regularization techniques to gene selection. The most popular such technique is the Lasso (L1 penalty), and many L1-type regularization terms have been proposed in recent years. Theoretically, Lq-type regularization with a smaller value of q leads to sparser solutions. Moreover, L1/2 regularization can be taken as a representative of the Lq (0 < q < 1) regularizations and has been shown to possess many attractive properties.
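Concretely, for binary labels y_i in {0, 1}, the L1/2-penalized logistic regression discussed here minimizes the penalized negative log-likelihood. Written in standard notation (the paper's exact formulation may scale terms differently):

```latex
\min_{\beta_0,\,\boldsymbol\beta}\;
-\frac{1}{n}\sum_{i=1}^{n}\Big[\,y_i\log p_i + (1-y_i)\log(1-p_i)\,\Big]
\;+\;\lambda\sum_{j=1}^{p}|\beta_j|^{1/2},
\qquad
p_i=\frac{1}{1+e^{-(\beta_0+\mathbf{x}_i^{\top}\boldsymbol\beta)}}.
```

Replacing the penalty exponent 1/2 by 1 recovers the Lasso; the smaller exponent penalizes small nonzero coefficients more aggressively, which is what drives the extra sparsity.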

Results

In this work, we investigate sparse logistic regression with the L1/2 penalty for gene selection in cancer classification problems, and propose a coordinate descent algorithm with a new univariate half thresholding operator to solve the L1/2-penalized logistic regression. Experimental results on artificial and microarray data demonstrate the effectiveness of the proposed approach compared with other regularization methods. In particular, on 4 publicly available gene expression datasets, the L1/2 regularization method achieved its classification performance using only about 2 to 14 predictors (genes), compared with about 6 to 38 genes for the ordinary L1 and elastic net regularization approaches.
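A univariate half thresholding operator has a known closed form in the L1/2 thresholding theory of Xu et al. The sketch below implements that published formula as an illustration of the thresholding step only; it is not the paper's exact algorithm, and the threshold constant 54^(1/3)/4 and the arccos form are taken from that theory rather than from this abstract.

```python
import math

def half_threshold(t: float, lam: float) -> float:
    """Univariate half thresholding operator for the L1/2 penalty.

    Returns the global minimizer of (x - t)**2 + lam * |x|**0.5
    in closed form: inputs with |t| below ~0.945 * lam**(2/3) map
    exactly to zero; larger inputs are shrunk toward zero, with a
    jump to (2/3)*t right at the threshold (the operator is
    discontinuous, unlike L1 soft thresholding).
    """
    threshold = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    if abs(t) <= threshold:
        return 0.0
    # phi lies in [0, pi/2] whenever |t| exceeds the threshold,
    # so the arccos argument stays within [0, 1].
    phi = math.acos((lam / 8.0) * (abs(t) / 3.0) ** (-1.5))
    return (2.0 / 3.0) * t * (1.0 + math.cos(2.0 * math.pi / 3.0 - 2.0 * phi / 3.0))
```

In a coordinate descent scheme of the kind the paper describes, an operator like this would be applied coordinate-wise: each coefficient is updated by thresholding its univariate partial-residual solution while all other coefficients are held fixed.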

Conclusions

From our evaluations, it is clear that sparse logistic regression with the L1/2 penalty achieves higher classification accuracy than the ordinary L1 and elastic net regularization approaches, while selecting fewer but informative genes. This is an important consideration for screening and diagnostic applications, where the goal is often to develop an accurate test using as few features as possible in order to control cost. Sparse logistic regression with the L1/2 penalty is therefore an effective technique for gene selection in real classification problems.

【 License 】

   
2013 Liang et al.; licensee BioMed Central Ltd.

【 Preview 】
Attachment list
File  Size  Format
20150117050811907.pdf  855KB  PDF
Figure 3.  54KB  Image
Figure 2.  57KB  Image
Figure 1.  50KB  Image

Article metrics
Downloads: 74   Views: 15