Journal article details
BMC Nephrology
Using a multi-staged strategy based on machine learning and mathematical modeling to predict genotype-phenotype risk patterns in diabetic kidney disease: a prospective case–control cohort analysis
Juliana CN Chan [3], Stephen KW Tsui [1], Wing Yee So [4], Maggie Ng [4], Vincent Lam [4], Andrea OY Luk [4], Ronald CW Ma [4], Ying Wang [4], Ross KK Leung [2]
[1] School of Biomedical Sciences, Hong Kong, China
[2] Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Hong Kong SAR, China
[3] Hong Kong Institute of Diabetes and Obesity, The Chinese University of Hong Kong, The Prince of Wales Hospital, Shatin, New Territories, Hong Kong SAR, China
[4] Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Hong Kong SAR, China
Keywords: Support vector machine; Random forest; Machine learning; Phenotypes; Genotypes; Diabetic kidney disease; Prediction
DOI: 10.1186/1471-2369-14-162
Received 2012-04-24; accepted 2013-07-18; published 2013
【 Abstract 】

Background

Multi-causality and heterogeneity of phenotypes and genotypes characterize complex diseases. In a database with a comprehensive collection of phenotypes and genotypes, we compared the performance of common machine learning methods in generating mathematical models to predict diabetic kidney disease (DKD).

Methods

In a prospective cohort of type 2 diabetic patients, we selected 119 subjects with DKD and 554 without DKD at enrolment and after a median follow-up period of 7.8 years for model training, testing and validation using seven machine learning methods (partial least squares regression, the classification and regression tree, the C5.0 decision tree, random forest, naïve Bayes classification, neural network and support vector machine). We used 17 clinical attributes and 70 single nucleotide polymorphisms (SNPs) of 54 candidate genes to build different models. The top attributes selected by the best-performing models were then used to build models with performance comparable to those using the entire dataset.
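
The cohort sizes above imply a class imbalance that is worth keeping in mind when reading accuracy figures. The short Python sketch below only does arithmetic on the numbers reported in this abstract; the variable names are illustrative, and the abstract itself does not say how the imbalance was handled.

```python
# Cohort sizes reported in the abstract.
n_dkd = 119
n_no_dkd = 554
n_total = n_dkd + n_no_dkd

# Fraction of subjects with DKD (the minority class).
prevalence = n_dkd / n_total

# Accuracy of a trivial rule that labels every subject "no DKD".
majority_baseline = n_no_dkd / n_total

# Attribute counts reported in the abstract.
n_clinical = 17
n_snps = 70
n_features = n_clinical + n_snps

print(f"subjects: {n_total}, features: {n_features}")
print(f"DKD prevalence: {prevalence:.3f}")
print(f"majority-class accuracy: {majority_baseline:.3f}")
```

A model would need to exceed the majority-class figure before its accuracy says anything about its ability to identify DKD cases.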

Results

Age, age of diagnosis, systolic blood pressure and genetic polymorphisms of uteroglobin and lipid metabolism were selected by most methods. Models generated by support vector machine (svmRadial) and random forest (cforest) had the best prediction accuracy, whereas models derived from the naïve Bayes classifier and partial least squares regression had the poorest performance. Using 10 clinical attributes (systolic and diastolic blood pressure, age, age of diagnosis, triglyceride, white blood cell count, total cholesterol, waist-to-hip ratio, LDL cholesterol, and alcohol intake) and 5 genetic attributes (UGB G38A, LIPC -514C>T, APOB Thr71Ile, APOC3 3206T>G and APOC3 1100C>T), selected most often by SVM and cforest, we were able to build high-performance models.
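
One simple way to pick attributes "selected most often" across models is a vote count, sketched below in Python. This is an assumption for illustration only: the abstract does not describe the exact procedure, and the function name `most_selected`, the model names and the attribute names are all hypothetical placeholders rather than data from the study.

```python
from collections import Counter

def most_selected(selections, k):
    """Return the k attributes chosen by the most models.

    selections maps a model name to the attributes that model ranked highly.
    Ties are broken alphabetically so the result is deterministic.
    """
    votes = Counter()
    for attributes in selections.values():
        votes.update(set(attributes))  # one vote per model per attribute
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]

# Hypothetical top-attribute lists; placeholders, not results from the study.
toy_selections = {
    "model_1": ["attr_a", "attr_b", "attr_c"],
    "model_2": ["attr_a", "attr_c", "attr_d"],
    "model_3": ["attr_a", "attr_b", "attr_e"],
}

top = most_selected(toy_selections, k=3)
print(top)
```

The reduced attribute list returned this way could then be used to refit each model, which is the step the abstract describes as yielding performance comparable to the full dataset.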

Conclusions

Amongst different machine learning methods, svmRadial and cforest had the best performance. Genetic polymorphisms related to inflammation and lipid metabolism warrant further investigation for their associations with DKD.

【 License 】

   
2013 Leung et al.; licensee BioMed Central Ltd.
