Journal article details
BMC Bioinformatics
Integration of multi-omics data for prediction of phenotypic traits using random forest
Research
Chris Maliepaard1  Richard G. F. Visser1  Bjorn Kloosterman2  Animesh Acharjee3 
[1]Wageningen UR Plant Breeding, Wageningen University & Research Centre, PO Box 6700 AJ, Wageningen, The Netherlands
[2]Wageningen UR Plant Breeding, Wageningen University & Research Centre, PO Box 6700 AJ, Wageningen, The Netherlands
[3]Keygene NV, PO Box 216, 6700 AE, Wageningen, The Netherlands
[4]Wageningen UR Plant Breeding, Wageningen University & Research Centre, PO Box 6700 AJ, Wageningen, The Netherlands
[5]MRC Human Nutrition Research, 120 Fulbourn Road, CB1 9NL, Cambridge, UK
Keywords: Data integration; Genetical genomics; Networks; Random forest
DOI: 10.1186/s12859-016-1043-4
Source: Springer
【 Abstract 】
Background: To find genetic and metabolic pathways related to phenotypic traits of interest, we analyzed gene expression data, metabolite data obtained with GC-MS and LC-MS, proteomics data and a selected set of tuber quality phenotypic data from a diploid segregating mapping population of potato. In this study we present an approach to integrate these omics data sets for the purpose of predicting phenotypic traits. This gives us networks of relatively small sets of interrelated omics variables that can predict, with higher accuracy, a quality trait of interest.
Results: We used Random Forest regression to integrate multiple omics data sets for prediction of four quality traits of potato: tuber flesh colour, DSC onset, tuber shape and enzymatic discoloration. For tuber flesh colour, beta-carotene hydroxylase and zeaxanthin epoxidase were ranked first and forty-fourth, respectively; both have previously been associated with flesh colour in potato tubers. Combining all the significant genes, LC-peaks, GC-peaks and proteins, the variation explained was 75 %, only slightly more than what gene expression or LC-MS data explain by themselves, which indicates that there are correlations among the variables across data sets. When tuber shape was regressed on the gene expression, LC-MS, GC-MS and proteomics data sets separately, only the gene expression data were found to explain significant variation. For DSC onset, we found 12 significant gene expression variables, 5 metabolite levels (GC-MS) and 2 proteins associated with the trait. Using those 19 significant variables, the variation explained was 45 %. Expression QTL (eQTL) analyses showed many associations with genomic regions on chromosome 2, which also had the highest explained variation compared to other chromosomes. Transcriptomics and metabolomics analysis of enzymatic discoloration after 5 min resulted in 420 significant genes and 8 significant LC metabolites, two of which were putatively identified as caffeoylquinic acid methyl ester and tyrosine.
Conclusions: In this study, we developed a strategy for selecting and integrating multiple omics data sets using the Random Forest method, and selected representative individual peaks for networks based on eQTL, mQTL or pQTL information. Network analysis was done to interpret how a particular trait is associated with gene expression, metabolite and protein data.
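The integration idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the block sizes and the trait model are invented for illustration, scikit-learn's `RandomForestRegressor` is assumed as the Random Forest implementation, and the step that selects statistically significant variables is not reproduced. The sketch only shows omics blocks being concatenated into one predictor matrix, the out-of-bag R² being read as an estimate of the variation explained, and variables being ranked by importance.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_samples = 100  # hypothetical number of genotypes

# Synthetic stand-ins for the four omics blocks (column counts are arbitrary).
blocks = {
    "gene": rng.normal(size=(n_samples, 20)),
    "lc": rng.normal(size=(n_samples, 10)),
    "gc": rng.normal(size=(n_samples, 10)),
    "protein": rng.normal(size=(n_samples, 5)),
}

# Hypothetical trait driven by one transcript and one LC-MS peak, plus noise.
trait = (
    2.0 * blocks["gene"][:, 0]
    + blocks["lc"][:, 1]
    + rng.normal(scale=0.5, size=n_samples)
)

# Integrate by concatenating the blocks column-wise, remembering each
# variable's block of origin in its name.
X = np.hstack(list(blocks.values()))
names = [f"{block}_{j}" for block, mat in blocks.items() for j in range(mat.shape[1])]

# Fit one forest on the combined matrix; oob_score=True requests the
# out-of-bag R^2, used here as the estimate of variation explained.
forest = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
forest.fit(X, trait)
explained = forest.oob_score_

# Rank variables from all blocks together by impurity-based importance.
ranking = sorted(
    zip(names, forest.feature_importances_), key=lambda pair: pair[1], reverse=True
)

print(f"Out-of-bag R^2: {explained:.2f}")
for name, importance in ranking[:5]:
    print(f"{name}: {importance:.3f}")
```

Fitting the same forest on each block separately and comparing the out-of-bag R² values with the combined fit is one way to see the effect described in the Results: when variables are correlated across data sets, the combined model explains only slightly more than the best single block.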
【 License 】

CC BY   
© Acharjee et al. 2016
