Journal Article Details
BMC Bioinformatics
A computational pipeline for the development of multi-marker bio-signature panels and ensemble classifiers
Oliver P Günther [5], Virginia Chen [5], Gabriela Cohen Freue [3], Robert F Balshaw [3], Scott J Tebbutt [6], Zsuzsanna Hollander [1], Mandeep Takhar [5], W Robert McMaster [4], Bruce M McManus [6], Paul A Keown [2], Raymond T Ng [7]
[1] Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, V6T 2B5, Canada
[2] Department of Medicine, University of British Columbia, Vancouver, BC, V5Z 1M9, Canada
[3] Department of Statistics, University of British Columbia, Vancouver, BC, V6T 1Z2, Canada
[4] Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
[5] NCE CECR Prevention of Organ Failure (PROOF) Centre of Excellence, Vancouver, BC, V6Z 1Y6, Canada
[6] James Hogg Research Centre, St. Paul’s Hospital, University of British Columbia, Vancouver, BC, V6Z 1Y6, Canada
[7] Department of Computer Science, University of British Columbia, Vancouver, BC, V6T 1Z2, Canada
Keywords: Classification; Ensemble; Proteomics; Genomics; Pipeline; Computational; Biomarkers
DOI: 10.1186/1471-2105-13-326
Received 2012-04-03, accepted 2012-12-04, published 2012
【 Abstract 】

Background

Biomarker panels derived separately from genomic and proteomic data, using a variety of computational methods, have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds that of the individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question: can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble?

Results

The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), proceeds through quality control, statistical analysis, and data mining, and ends with various forms of validation. The pipeline ensures that the classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients.

The second part of the paper examines five ensembles ranging in size from two to ten individual classifiers. Ensemble performance is characterized by area under the curve (AUC), sensitivity, and specificity. These measures are derived from the probability of acute rejection reported by each individual classifier in the ensemble, combined with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance: it improved sensitivity and AUC beyond the best values observed for any of its individual classifiers, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all five ensembles, but typically at the cost of decreased specificity.
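The two aggregation methods named above can be illustrated with a minimal Python sketch. The abstract does not give the exact rules, so the details here are assumptions made for illustration: the function names, the 0.5 probability cutoff, and the `min_votes` parameter are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def average_probability(probs, cutoff=0.5):
    """Average the per-classifier probabilities of acute rejection.

    Assumption: the ensemble calls "rejection" when the mean probability
    reaches `cutoff` (0.5 is an illustrative value, not from the paper).
    """
    avg = mean(probs)
    return avg, avg >= cutoff


def vote_threshold(probs, cutoff=0.5, min_votes=1):
    """Count the classifiers that individually call "rejection".

    Assumption: each classifier votes "rejection" when its own probability
    reaches `cutoff`, and the ensemble calls "rejection" when at least
    `min_votes` classifiers do so (both values are illustrative).
    """
    votes = sum(p >= cutoff for p in probs)
    return votes, votes >= min_votes


if __name__ == "__main__":
    # Hypothetical probabilities from four classifiers for one patient.
    probs = [0.9, 0.2, 0.3, 0.4]
    print(average_probability(probs))
    print(vote_threshold(probs, min_votes=1))
```

With a low vote requirement, a single confident classifier is enough to flag a sample that the averaged probability would leave unflagged. Under these assumptions, that is one way such a rule could raise sensitivity while lowering specificity, in line with the trade-off reported above.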

Conclusion

Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway.

【 License 】

   
2012 Günther et al.; licensee BioMed Central Ltd.
