Journal article

【Abstract】

Background: To find genetic and metabolic pathways related to phenotypic traits of interest, we analyzed gene expression data, metabolite data obtained with GC-MS and LC-MS, proteomics data and a selected set of tuber quality phenotypic data from a diploid segregating mapping population of potato. In this study we present an approach to integrate these ~omics data sets for the purpose of predicting phenotypic traits. This gives us networks of relatively small sets of interrelated ~omics variables that can predict, with higher accuracy, a quality trait of interest.

Results: We used Random Forest regression to integrate multiple ~omics data sets for prediction of four quality traits of potato: tuber flesh colour, DSC onset, tuber shape and enzymatic discoloration. For tuber flesh colour, beta-carotene hydroxylase and zeaxanthin epoxidase were ranked first and forty-fourth, respectively; both have previously been associated with flesh colour in potato tubers. Combining all the significant genes, LC-peaks, GC-peaks and proteins, the variation explained was 75 %, only slightly more than what gene expression or LC-MS data explain by themselves, which indicates that there are correlations among the variables across data sets. When tuber shape was regressed on the gene expression, LC-MS, GC-MS and proteomics data sets separately, only the gene expression data were found to explain significant variation. For DSC onset, we found 12 significant genes, 5 metabolite levels (GC) and 2 proteins associated with the trait. Using those 19 significant variables, the variation explained was 45 %. Expression QTL (eQTL) analyses showed many associations with genomic regions on chromosome 2, which also had the highest explained variation compared to other chromosomes. Transcriptomics and metabolomics analysis of enzymatic discoloration after 5 min resulted in 420 significant genes and 8 significant LC metabolites, two of which were putatively identified as caffeoylquinic acid methyl ester and tyrosine.

Conclusions: In this study, we developed a strategy for selecting and integrating multiple ~omics data sets using the random forest method, and selected representative individual peaks for networks based on eQTL, mQTL or pQTL information. Network analysis was used to interpret how a particular trait is associated with gene expression, metabolite and protein data.
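
As a rough illustration of the idea in the abstract, the sketch below concatenates several ~omics blocks, fits a random forest regression to a trait, and reads off the out-of-bag explained variation and a variable ranking. It is not the authors' pipeline: the data are simulated, the block names (`gene`, `lcms`, `gcms`, `protein`) and sizes are hypothetical, scikit-learn's `RandomForestRegressor` is assumed as the implementation, and no significance testing of the ranked variables is performed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_samples = 100  # hypothetical number of genotypes

# Simulated ~omics blocks (rows = genotypes, columns = variables).
# Block names and sizes are assumptions made for this sketch.
blocks = {
    "gene": rng.normal(size=(n_samples, 30)),
    "lcms": rng.normal(size=(n_samples, 20)),
    "gcms": rng.normal(size=(n_samples, 15)),
    "protein": rng.normal(size=(n_samples, 10)),
}

# Simulated trait driven by one transcript and one LC-MS peak, plus noise.
trait = (
    2.0 * blocks["gene"][:, 0]
    + 1.5 * blocks["lcms"][:, 3]
    + rng.normal(scale=0.5, size=n_samples)
)

# Integrate the blocks by column-wise concatenation, keeping variable names.
X = np.hstack(list(blocks.values()))
names = [f"{block}_{j}" for block, data in blocks.items() for j in range(data.shape[1])]

# Fit the forest; oob_score=True gives an out-of-bag R^2 estimate.
forest = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
forest.fit(X, trait)

explained = forest.oob_score_  # out-of-bag fraction of variation explained
ranking = sorted(
    zip(names, forest.feature_importances_), key=lambda pair: pair[1], reverse=True
)
top_two = {name for name, _ in ranking[:2]}

print(f"OOB variation explained: {explained:.2f}")
for name, importance in ranking[:5]:
    print(f"{name}: {importance:.3f}")
```

Fitting each block separately in the same way and comparing its out-of-bag value with that of the combined matrix gives a simple check of how much the data sets overlap in the variation they explain.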

【License】

CC BY
© Acharjee et al. 2016

BMC Bioinformatics
Integration of multi-omics data for prediction of phenotypic traits using random forest
Research
Chris Maliepaard¹ Richard G. F. Visser¹ Bjorn Kloosterman² Animesh Acharjee³
[1]Wageningen UR Plant Breeding, Wageningen University & Research Centre, PO Box 6700 AJ, Wageningen, The Netherlands
[2]Wageningen UR Plant Breeding, Wageningen University & Research Centre, PO Box 6700 AJ, Wageningen, The Netherlands
[3]Keygene NV, PO Box 216, 6700 AE, Wageningen, The Netherlands
[4]Wageningen UR Plant Breeding, Wageningen University & Research Centre, PO Box 6700 AJ, Wageningen, The Netherlands
[5]MRC Human Nutrition Research, 120 Fulbourn Road, CB1 9NL, Cambridge, UK
Keywords: Data integration; Genetical genomics; Networks; Random forest
DOI: 10.1186/s12859-016-1043-4
Source: Springer