Journal Article Details
BMC Bioinformatics
Stability SCAD: a powerful approach to detect interactions in large-scale genomic study
Jianwei Gou [3], Yang Zhao [6], Yongyue Wei [6], Chen Wu [4], Ruyang Zhang [2], Yongyong Qiu [6], Ping Zeng [6], Wen Tan [4], Dianke Yu [4], Tangchun Wu [1], Zhibin Hu [5], Dongxin Lin [4], Hongbing Shen [5], Feng Chen [6]
[1] Institute of Occupational Medicine and Ministry of Education Key Laboratory for Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
[2] Section of Clinical Epidemiology, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Cancer Center, Nanjing Medical University, Nanjing, China
[3] Department of Mathematical and Statistical Sciences, Nanjing Forestry University, Nanjing, China
[4] State Key Laboratory of Molecular Oncology and Department of Etiology and Carcinogenesis, Cancer Institute and Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
[5] State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China
[6] Department of Epidemiology and Biostatistics and Ministry of Education (MOE) Key Lab for Modern Toxicology, School of Public Health, Nanjing Medical University, Nanjing, China
Keywords: Stability selection; Smoothly clipped absolute deviation (SCAD); Penalized logistic regression; Least absolute shrinkage and selection operator (LASSO); Interaction; Genome-wide association study (GWAS)
DOI  :  10.1186/1471-2105-15-62
Received 2013-05-30, accepted 2014-02-18, published 2014
【 Abstract 】

Background

Evidence suggests that common complex diseases may be partially due to SNP-SNP interactions, but detecting such interactions is not yet well established in high-dimensional, small-sample (small-n-large-p) studies. A number of penalized regression techniques are gaining popularity within the statistical community and are now being applied to detect interactions. These techniques tend to over-fit and are prone to false positives. The recently developed stability least absolute shrinkage and selection operator (SLASSO) has been used to control the family-wise error rate, but often at the expense of power (and thus with more false negative results).

Results

Here, we propose an alternative stability selection procedure known as stability smoothly clipped absolute deviation (SSCAD). Briefly, this method applies a smoothly clipped absolute deviation (SCAD) algorithm to multiple sub-samples, and then identifies a cluster ensemble of interactions across the sub-samples. The proposed method was compared with SLASSO and two traditional penalized methods in intensive simulations. The simulations showed higher power and a lower false discovery rate (FDR) with SSCAD. An analysis applying the new method to a previously published GWAS of lung cancer confirmed all significant interactions identified with SLASSO, and identified two additional interactions not reported in the SLASSO analysis.
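
The sub-sampling idea summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names (`scad_threshold`, `interaction_features`, `marginal_scores`, `stability_select`) are hypothetical, and the per-feature selector is a simple case-control mean difference passed through the SCAD thresholding rule of Fan and Li, swapped in for a full SCAD-penalized logistic regression. The values of `lam`, `n_subsamples`, and `threshold` are arbitrary choices for illustration.

```python
import random
from itertools import combinations


def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding rule for one coefficient (a > 2; 3.7 is the usual default)."""
    sign = 1.0 if z >= 0 else -1.0
    az = abs(z)
    if az <= 2 * lam:
        # Soft-thresholding region: small effects are shrunk, possibly to zero.
        return sign * max(az - lam, 0.0)
    if az <= a * lam:
        # Transition region: shrinkage is gradually relaxed.
        return ((a - 1) * z - sign * a * lam) / (a - 2)
    # Large effects are left unshrunk.
    return z


def interaction_features(G):
    """Turn genotype rows into products of every SNP pair (candidate interactions)."""
    pairs = list(combinations(range(len(G[0])), 2))
    return [[row[i] * row[j] for i, j in pairs] for row in G], pairs


def marginal_scores(X, y):
    """Per-feature mean difference between cases (y == 1) and controls (y == 0)."""
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    p = len(X[0])
    if not cases or not controls:
        return [0.0] * p  # degenerate sub-sample: nothing can be selected
    return [
        sum(r[j] for r in cases) / len(cases)
        - sum(r[j] for r in controls) / len(controls)
        for j in range(p)
    ]


def stability_select(X, y, lam, n_subsamples=100, threshold=0.6, seed=0):
    """Select features whose selection frequency across half-samples >= threshold."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # sub-sample without replacement
        scores = marginal_scores([X[i] for i in idx], [y[i] for i in idx])
        for j, score in enumerate(scores):
            if scad_threshold(score, lam) != 0.0:
                counts[j] += 1
    freqs = [c / n_subsamples for c in counts]
    selected = [j for j, f in enumerate(freqs) if f >= threshold]
    return selected, freqs
```

A caller would first build the candidate interaction columns with `interaction_features`, then pass them to `stability_select` and map the selected column indices back to SNP pairs. Ranking by selection frequency rather than by a single penalized fit is what makes the result less sensitive to one particular sample or one particular value of `lam`.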

Conclusions

Based on the results obtained in this study, SSCAD appears to be a powerful procedure for the detection of SNP-SNP interactions in large-scale genomic data.

【 License 】

2014 Gou et al.; licensee BioMed Central Ltd.
