Journal Article Details
BMC Systems Biology
Decision tree-based method for integrating gene expression, demographic, and clinical data to determine disease endotypes
Stephen W Edwards [3], Jane E Gallagher [4], Edward E Hudgens [4], Pierre R Bushel [6], Elaine Cohen Hubal [5], David M Reif [2], ClarLynda R Williams-DeVane [1]
[1] Present address: Julius L. Chambers Biomedical/Biotechnology Research Institute, Biology Department, North Carolina Central University, Durham, NC 27707, USA
[2] Present address: Biological Sciences Department, North Carolina State University, Raleigh, NC 27695, USA
[3] National Health and Environmental Effects Research Laboratory, Integrated Systems Toxicology Division, U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC 27711, USA
[4] National Health and Environmental Effects Research Laboratory, Environmental Public Health Division, U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC 27711, USA
[5] National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC 27711, USA
[6] Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC 27709, USA
Keywords: Integrated analysis; Gene expression; Endotypes; Asthma
DOI: 10.1186/1752-0509-7-119
Received 2012-09-18; accepted 2013-10-18; published 2013
【 Abstract 】

Background

Complex diseases are often difficult to diagnose, treat, and study because of the multi-factorial nature of their underlying etiology. Large data sets are now widely available that can be used to define novel, mechanistically distinct disease subtypes (endotypes) in a completely data-driven manner. However, when the goal is to maximize the discovery potential of these data sets, significant challenges remain in segregating individuals into suitable disease subtypes and in understanding the distinct biological mechanisms of each.

Results

A multi-step decision tree-based method is described for defining endotypes based on gene expression, clinical covariates, and disease indicators, using childhood asthma as a case study. We also tried alternative approaches, including Student's t-test, single data domain clustering, and the Modk-prototypes algorithm, which incorporates multiple data domains into a single analysis; none performed as well as the novel multi-step decision tree method. The new method gave the best segregation of asthmatics and non-asthmatics, and it provides easy access to all genes and clinical covariates that distinguish the groups.

Conclusions

The multi-step decision tree method described here will lead to better understanding of complex disease in general by allowing purely data-driven disease endotypes to facilitate the discovery of new mechanisms underlying these diseases. This application should be considered a complement to ongoing efforts to better define and diagnose known endotypes. When coupled with existing methods developed to determine the genetics of gene expression, these methods provide a mechanism for linking genetics and exposomics data and thereby accounting for both major determinants of disease.

【 License 】

   
2013 Williams-DeVane et al.; licensee BioMed Central Ltd.
