Journal of Clinical Bioinformatics
A novel tree-based procedure for deciphering the genomic spectrum of clinical disease entities
Philippe Broët [1], Wilson Toussile [2], Hervé Perdry [2], Cyprien Mbogning [3]
[1] Assistance Publique – Hôpitaux de Paris, Hôpital Paul Brousse, Villejuif, France; Faculty of Medicine Paris-Sud, 63 rue Gabriel Peri, 94276 Le Kremlin-Bicêtre, France; Inserm U669, 14-16 Avenue Paul-Vaillant-Couturier, 94807 Villejuif, France
Keywords: Genomic; Disease taxonomy; Lung cancer; Tree-based regression; Recursive partitioning
DOI: 10.1186/2043-9113-4-6
Received 2013-12-23; accepted 2014-04-08; published 2014
【 Abstract 】
Background
Dissecting the genomic spectrum of clinical disease entities is a challenging task. Recursive partitioning methods (classification trees) are powerful tools for exploring the complex interplay among genomic factors with respect to a main factor, and can reveal hidden genomic patterns. The partially linear tree-based regression (PLTR) model, which combines regression models with tree-based methodology, was recently proposed to take confounding variables into account. It is, however, computationally burdensome and poorly suited to situations in which a large number of exploratory variables is expected.
Methods
We developed a novel procedure as an alternative to the original PLTR procedure and considered different selection criteria. A simulation study covering different scenarios was performed to compare the performance of the proposed procedure with that of the original PLTR strategy.
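As a rough sketch only, and not the authors' exact formulation, a partially linear tree-based model and the usual information criteria can be written as below. The symbols are assumptions introduced here for illustration: $Y$ is the main factor, $X$ the confounding variables, $G$ the genomic variables, $g$ a link function, $F$ a piecewise-constant function defined by the leaves of a tree, $\hat{\ell}$ the maximized log-likelihood, $k$ the number of free parameters, and $n$ the sample size.

```latex
% Partially linear tree-based model (illustrative notation, not taken from the paper):
% a parametric part for confounders plus a tree-structured part for genomic variables.
g\bigl(\mathbb{E}[Y \mid X, G]\bigr) = X^{\top}\theta + F(G)

% Standard definitions of the criteria that can be used to choose the tree size:
\mathrm{AIC} = -2\,\hat{\ell} + 2k,
\qquad
\mathrm{BIC} = -2\,\hat{\ell} + k \log n
```

Under these definitions, a candidate tree with more leaves increases $k$, so the BIC penalizes it more heavily than the AIC whenever $\log n > 2$.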
Results
The proposed procedure with the Bayesian Information Criterion (BIC) performed well in detecting the hidden structure compared with the original procedure. The novel procedure was used to analyze patterns of copy-number alterations in lung adenocarcinomas with respect to Kirsten Rat Sarcoma Viral Oncogene Homolog (KRAS) gene mutation status, while controlling for a cohort effect. The results highlight two subgroups of pure or nearly pure wild-type KRAS tumors with particular copy-number alteration patterns.
Conclusions
The proposed procedure with the BIC is a powerful and practical alternative to the original procedure. It performs well in a general framework and is simple to implement.
【 License 】
2014 Mbogning et al.; licensee BioMed Central Ltd.