Journal Article Details
Frontiers in Genetics
Using Machine Learning to Predict Obesity Based on Genome-Wide and Epigenome-Wide Gene–Gene and Gene–Diet Interactions
José M. Ordovás [2]; Jacob J. Christensen [3]; Nicola M. McKeown [4]; Caren E. Smith [5]; Jonathan Shao [7]; Yu-Chi Lee [8]; Laurence D. Parnell [8]; Chao-Qiang Lai [8]
[1] CEI UAM + CSIC, IMDEA Food Institute, Madrid, Spain
[2] Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
[3] Department of Nutrition, Norwegian National Advisory Unit on FH, Oslo University Hospital, University of Oslo, Oslo, Norway
[4] Friedman School of Nutrition Science and Policy, Tufts University, Boston, MA, United States
[5] Nutrition and Genomics Laboratory, JM-USDA Human Nutrition Research Center on Aging at Tufts University, Boston, MA, United States
[6] Nutritional Epidemiology Laboratory, JM-USDA Human Nutrition Research Center on Aging at Tufts University, Boston, MA, United States
[7] Statistical and Bioinformatics Group, Northeast Area, USDA ARS, Beltsville, MD, United States
[8] USDA ARS, Nutrition and Genomics Laboratory, JM-USDA Human Nutrition Research Center on Aging at Tufts University, Boston, MA, United States
Keywords: obesity; machine learning; genomics; DNA methylation; diet; GxE interaction
DOI: 10.3389/fgene.2021.783845
Source: DOAJ
[Abstract]

Obesity is associated with many chronic diseases that impair healthy aging and is governed by genetic, epigenetic, and environmental factors and their complex interactions. This study aimed to develop a model that predicts an individual’s risk of obesity by better characterizing these complex relations and interactions, with a focus on dietary factors. For this purpose, we conducted a combined genome-wide and epigenome-wide scan for body mass index (BMI) and up to three-way interactions among 402,793 single nucleotide polymorphisms (SNPs), 415,202 DNA methylation sites (DMSs), and 397 dietary and lifestyle factors using the generalized multifactor dimensionality reduction (GMDR) method. The training set consisted of 1,573 participants in exam 8 of the Framingham Offspring Study (FOS) cohort. After identifying genetic, epigenetic, and dietary factors that reached statistical significance, we applied machine learning (ML) algorithms to predict participants’ obesity status in the test set, taken as a subset of independent samples (n = 394) from the same cohort. The quality and accuracy of prediction models were evaluated using the area under the receiver operating characteristic curve (ROC-AUC). GMDR identified 213 SNPs, 530 DMSs, and 49 dietary and lifestyle factors as significant predictors of obesity. Comparing several ML algorithms, we found that the stochastic gradient boosting model provided the best prediction of obesity, with an overall accuracy of 70% and an ROC-AUC of 0.72 in test set samples. Top predictors of the best-fit model were 21 SNPs, 230 DMSs in genes such as CPT1A, ABCG1, SLC7A11, RNF145, and SREBF1, and 26 dietary factors, including processed meat, diet soda, French fries, high-fat dairy, artificial sweeteners, alcohol intake, and specific nutrients and food components, such as calcium and flavonols. In conclusion, we developed an integrated approach with ML to predict obesity using omics and dietary data. This extends our knowledge of the drivers of obesity, which can inform precision nutrition strategies for the prevention and treatment of obesity.
Clinical Trial Registration: [www.ClinicalTrials.gov], the Framingham Heart Study (FHS), [NCT00005121].
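
The abstract reports two evaluation metrics for the test set: overall accuracy and ROC-AUC. As a minimal sketch of how such metrics can be computed, the Python below uses only the standard library. The function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions and are not taken from the study. ROC-AUC is computed here in its rank-based form, as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of cases classified correctly when a score at or above the
    threshold is read as a positive (here: obese) prediction.
    The 0.5 default is an assumption, not a value from the study."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predicted) if y == p)
    return correct / len(labels)


# Hypothetical toy data: 1 = obese, 0 = not obese; scores are model outputs.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.5]

print(roc_auc(labels, scores))   # 13 of 16 pairs ranked correctly: 0.8125
print(accuracy(labels, scores))  # 5 of 8 cases correct: 0.625
```

The pairwise loop is quadratic in sample size, which is fine for a test set of a few hundred participants. Accuracy depends on the chosen threshold, whereas ROC-AUC summarizes ranking quality across all thresholds, which is one reason the two are often reported together.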

[License]

Unknown   
