Journal article details
BMC Bioinformatics
A multivariate approach to the integration of multi-omics datasets
Chen Meng [1], Bernhard Kuster [3], Aedín C Culhane [2], Amin Moghaddas Gholami [1]
[1] Chair of Proteomics and Bioanalytics, Technische Universität München, Freising, Germany
[2] Department of Biostatistics, Harvard School of Public Health, Boston, MA 02215, USA
[3] Center for Integrated Protein Science Munich, Freising, Germany
Keywords: Visualization; Omic data; Data integration; Multiple co-inertia; Multivariate analysis
DOI  :  10.1186/1471-2105-15-162
Received 2014-01-22, accepted 2014-05-14, published 2014
【 Abstract 】

Background

To leverage the potential of multi-omics studies, exploratory data analysis methods that systematically integrate and compare multiple layers of omics information are required. We describe multiple co-inertia analysis (MCIA), an exploratory data analysis method that identifies co-relationships between multiple high-dimensional datasets. Based on a covariance optimization criterion, MCIA simultaneously projects several datasets into the same dimensional space, transforming diverse sets of features onto the same scale, to extract the most variant features from each dataset and facilitate biological interpretation and pathway analysis.
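The covariance criterion described above can be illustrated with a minimal sketch. This is not the omicade4 implementation: it assumes samples are rows, it computes only the first axis, and the function names (`center_and_scale`, `consensus_axis`) and the toy tables are hypothetical. Each table is centered and divided by the square root of its total inertia so that all tables are on the same scale; power iteration then finds the unit-norm sample score vector t that maximizes the sum over tables of the squared covariance between t and the best projection of each table.

```python
import math


def center_and_scale(table):
    """Center each feature (column) and divide the whole table by the
    square root of its total inertia, putting all tables on one scale."""
    n = len(table)
    p = len(table[0])
    means = [sum(row[j] for row in table) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in table]
    inertia = sum(v * v for row in centered for v in row) / n
    scale = math.sqrt(inertia) if inertia > 0 else 1.0
    return [[v / scale for v in row] for row in centered]


def consensus_axis(tables, iters=500):
    """Power iteration for the unit-norm score vector t that maximizes
    sum_k ||X_k^T t||^2 over the scaled tables X_k (first axis only)."""
    blocks = [center_and_scale(tb) for tb in tables]
    n = len(blocks[0])
    t = [float(i + 1) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iters):
        new_t = [0.0] * n
        for X in blocks:
            p = len(X[0])
            # Feature loadings of this table for the current scores.
            load = [sum(X[i][j] * t[i] for i in range(n)) for j in range(p)]
            # Project the table back onto the samples and accumulate.
            for i in range(n):
                new_t[i] += sum(X[i][j] * load[j] for j in range(p))
        norm = math.sqrt(sum(v * v for v in new_t))
        t = [v / norm for v in new_t]
    return t


# Toy data (made up): 4 samples measured in two "omics" tables with
# different features; samples 0-1 and 2-3 form two groups in both tables.
mrna = [[5.0, 1.0, 0.2], [4.8, 1.2, 0.1], [1.0, 4.9, 0.3], [1.1, 5.2, 0.2]]
protein = [[10.0, 2.0], [9.5, 2.5], [3.0, 8.0], [2.5, 8.5]]
scores = consensus_axis([mrna, protein])
```

In this toy case the two sample groups receive scores of opposite sign on the shared axis, even though the two tables have no features in common.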

Results

We demonstrate integration of multiple layers of information using MCIA, applied to two typical “omics” research scenarios. First, the integration of transcriptome and proteome profiles of cells in the NCI-60 cancer cell line panel revealed distinct, complementary features, which together increased the coverage and power of pathway analysis. Our analysis highlighted the importance of the leukemia extravasation signaling pathway in leukemia, which was not highly ranked in the analysis of any individual dataset. Second, we compared transcriptome profiles of high-grade serous ovarian tumors obtained on two different microarray platforms and by next-generation RNA-sequencing, to identify the most informative platform and extract robust biomarkers of molecular subtypes. We found that RNA-sequencing data processed using RPKM had greater variance than data processed with MapSplice and RSEM. We provide novel markers, combined from four data platforms, that are highly associated with tumor molecular subtype. MCIA is implemented and available in the R/Bioconductor “omicade4” package.

Conclusion

We believe MCIA is an attractive method for data integration and visualization of several datasets of multi-omics features observed on the same set of individuals. The method does not depend on feature annotation, and thus it can extract important features even when they are not present across all datasets. MCIA provides simple graphical representations for the identification of relationships between large datasets.

【 License 】

   
2014 Meng et al.; licensee BioMed Central Ltd.
