Journal article details
BMC Research Notes
Variable selection methods for developing a biomarker panel for prediction of dengue hemorrhagic fever
Allan R Brasier[1], Hyunsu Ju[1]
[1]Institute for Translational Sciences, UTMB, Galveston, TX, USA
Keywords: Data mining; Bootstrap sampling; Classification; Variable selection
DOI: 10.1186/1756-0500-6-365
Received 2013-03-19, accepted 2013-09-10, published 2013
【 Abstract 】

Background

The choice of variable selection method for binary classification modeling is critical for producing stable, interpretable models that predict accurately with minimal bias. This work is motivated by clinical and laboratory data on severe dengue infections (dengue hemorrhagic fever, DHF) obtained from 51 individuals enrolled in a prospective observational study of acute human dengue infections.

Results

We carried out a comprehensive performance comparison of several classification models for DHF on the dengue data set. We compared variable selection results from Multivariate Adaptive Regression Splines, Learning Ensemble, Random Forest, Bayesian Model Averaging, Stochastic Search Variable Selection, and Generalized Regularized Logistic Regression. Model averaging methods (bagging, boosting and ensemble learners) have higher accuracy, but the generalized regularized regression model has the highest predictive power because the linearity assumptions of the candidate predictors are strongly satisfied, as assessed by deviance chi-square testing procedures. We also applied bootstrapping to evaluate the predictive regression coefficients of the regularized regression model.
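The bootstrap evaluation of regularized regression coefficients mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a plain L2 (ridge) penalty fitted by gradient descent as a stand-in for the generalized regularized regression in the abstract, and it uses synthetic data rather than the dengue data set. All function names, the penalty `lam`, the learning rate, and the synthetic generator are hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_ridge_logistic(X, y, lam=0.1, lr=0.1, n_iter=500):
    """Fit L2-penalized logistic regression by batch gradient descent.

    Returns [intercept, b1, ..., bp]; the intercept is not penalized.
    (Assumption: ridge stands in for the paper's regularized model.)
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            linear = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            err = sigmoid(linear) - yi
            grad[0] += err
            for j in range(p):
                grad[j + 1] += err * xi[j]
        beta[0] -= lr * grad[0] / n
        for j in range(1, p + 1):
            beta[j] -= lr * (grad[j] / n + lam * beta[j])
    return beta


def bootstrap_coefficients(X, y, n_boot=200, seed=0, **fit_kwargs):
    """Refit the model on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        draws.append(fit_ridge_logistic(Xb, yb, **fit_kwargs))
    return draws


def percentile_interval(values, alpha=0.05):
    """Simple percentile interval from a list of bootstrap estimates."""
    s = sorted(values)
    lo = s[int((alpha / 2) * (len(s) - 1))]
    hi = s[int((1 - alpha / 2) * (len(s) - 1))]
    return lo, hi


def make_synthetic(n=51, seed=1):
    """Hypothetical data: feature 0 is informative, feature 1 is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        X.append(x)
        y.append(1 if rng.random() < sigmoid(2.0 * x[0]) else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    draws = bootstrap_coefficients(X, y, n_boot=100)
    for j in (1, 2):
        print("coefficient", j, percentile_interval([d[j] for d in draws]))
```

A coefficient whose bootstrap interval stays clear of zero across resamples is a more stable candidate for a biomarker panel than one whose sign flips. With only 51 subjects, as in the study, such intervals can be wide.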

Conclusions

Feature reduction methods introduce inherent biases and are therefore data-type dependent. We propose that these limitations can be overcome using an exhaustive approach for searching feature space. Using this approach, our results suggest that IL-10, platelet and lymphocyte counts are the major features for predicting DHF on the basis of blood chemistries and cytokine measurements.
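An exhaustive search over feature space can be sketched as follows. This is an illustrative reading of the idea, not the authors' procedure: it assumes that every non-empty feature subset is scored by leave-one-out accuracy of a nearest-centroid classifier, and both that scoring rule and the toy data are hypothetical.

```python
from itertools import combinations


def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier on `cols`."""
    correct = 0
    for i in range(len(X)):
        dist = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroid = [sum(r[c] for r in rows) / len(rows) for c in cols]
            dist[label] = sum((X[i][c] - m) ** 2 for c, m in zip(cols, centroid))
        if min(dist, key=dist.get) == y[i]:
            correct += 1
    return correct / len(X)


def exhaustive_search(X, y, max_size=None):
    """Score every non-empty subset of columns and return the best one.

    Subsets are visited from smallest to largest and only a strictly
    better score replaces the incumbent, so ties favor smaller panels.
    """
    p = len(X[0])
    k_max = p if max_size is None else max_size
    best_cols, best_score = (), float("-inf")
    for k in range(1, k_max + 1):
        for cols in combinations(range(p), k):
            score = loo_accuracy(X, y, cols)
            if score > best_score:
                best_cols, best_score = cols, score
    return best_cols, best_score


# Toy data (hypothetical): column 0 separates the classes, columns 1-2 are noise.
X = [[0.0, 5, 1], [0.1, 1, 4], [0.2, 4, 2],
     [1.0, 2, 5], [1.1, 5, 1], [1.2, 1, 3]]
y = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print(exhaustive_search(X, y))  # -> ((0,), 1.0)
```

The number of subsets grows as 2^p - 1, so this is only practical for a small candidate set or with a cap such as `max_size`.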

【 License 】

   
2013 Ju and Brasier; licensee BioMed Central Ltd.
