Journal Article Details
BMC Bioinformatics
Integration of multi-omics data for prediction of phenotypic traits using random forest
Research
Chris Maliepaard1, Richard G. F. Visser1, Bjorn Kloosterman2, Animesh Acharjee3
[1]Wageningen UR Plant Breeding, Wageningen University & Research Centre, PO Box 6700 AJ, Wageningen, The Netherlands
[2]Wageningen UR Plant Breeding, Wageningen University & Research Centre, PO Box 6700 AJ, Wageningen, The Netherlands
[3]Keygene NV, PO Box 216, 6700 AE, Wageningen, The Netherlands
[4]Wageningen UR Plant Breeding, Wageningen University & Research Centre, PO Box 6700 AJ, Wageningen, The Netherlands
[5]MRC Human Nutrition Research, 120 Fulbourn Road, CB1 9NL, Cambridge, UK
Keywords: Data integration; Genetical genomics; Networks; Random forest
DOI: 10.1186/s12859-016-1043-4
Source: Springer
Abstract

Background: In order to find genetic and metabolic pathways related to phenotypic traits of interest, we analyzed gene expression data, metabolite data obtained with GC-MS and LC-MS, proteomics data and a selected set of tuber quality phenotypic data from a diploid segregating mapping population of potato. In this study we present an approach to integrate these ~omics data sets for the purpose of predicting phenotypic traits. This yields networks of relatively small sets of interrelated ~omics variables that can predict a quality trait of interest with higher accuracy.

Results: We used Random Forest regression to integrate multiple ~omics data sets for the prediction of four potato quality traits: tuber flesh colour, DSC onset, tuber shape and enzymatic discoloration. For tuber flesh colour, beta-carotene hydroxylase and zeaxanthin epoxidase were ranked first and forty-fourth, respectively; both have previously been associated with flesh colour in potato tubers. Combining all the significant genes, LC-peaks, GC-peaks and proteins, the variation explained was 75%, only slightly more than what gene expression or LC-MS data explain by themselves, which indicates that there are correlations among the variables across data sets. When tuber shape was regressed on the gene expression, LC-MS, GC-MS and proteomics data sets separately, only the gene expression data explained significant variation. For DSC onset, we found 12 significant gene expression levels, 5 metabolite levels (GC) and 2 proteins associated with the trait. Using those 19 significant variables, the variation explained was 45%. Expression QTL (eQTL) analyses showed many associations with genomic regions on chromosome 2, which also showed the highest explained variation compared with the other chromosomes. Transcriptomic and metabolomic analyses of enzymatic discoloration after 5 min yielded 420 significant genes and 8 significant LC metabolites, two of which were putatively identified as caffeoylquinic acid methyl ester and tyrosine.

Conclusions: In this study, we developed a strategy for selecting and integrating multiple ~omics data sets using the random forest method, and selected representative individual peaks for networks based on eQTL, mQTL or pQTL information. Network analysis was performed to interpret how a particular trait is associated with gene expression, metabolite and protein data.

License: CC BY
© Acharjee et al. 2016
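The abstract reports prediction quality as "variation explained" (75% for tuber flesh colour, 45% for DSC onset) but does not state the formula. A common definition for random forest regression is the percentage 100 × (1 − MSE / Var(y)), computed from out-of-bag predictions, i.e. predictions for each sample made only by trees that did not see it during training. The minimal sketch below assumes that definition; it is not taken from the paper. The function name `percent_variation_explained` is hypothetical, and the use of the population variance (denominator n) is likewise an assumption.

```python
from statistics import fmean


def percent_variation_explained(observed, predicted):
    """Hypothetical helper: percent of trait variance explained by predictions.

    Assumes the common definition 100 * (1 - MSE / Var(y)), where the
    predictions are out-of-bag (or otherwise held-out) and Var(y) is the
    population variance of the observed trait values (denominator n).
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_y = fmean(observed)
    # Population variance of the observed trait values.
    var_y = fmean((y - mean_y) ** 2 for y in observed)
    if var_y == 0:
        raise ValueError("observed values have zero variance")
    # Mean squared error between observed and predicted values.
    mse = fmean((y - p) ** 2 for y, p in zip(observed, predicted))
    return 100.0 * (1.0 - mse / var_y)
```

For observed values [1, 2, 3, 4] and predictions [1.5, 2, 2.5, 4], the function returns approximately 90. Predicting the trait mean for every sample gives 0, and a model that does worse than the mean gives a negative value, so under this definition a data set with no predictive signal can score at or below zero.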