BMC Bioinformatics
A simplicial complex-based approach to unmixing tumor progression data
Theodore Roman [4], Amir Nayyeri [2], Brittany Terese Fasy [3], Russell Schwartz [1]
[1] Department of Biological Sciences, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, USA
[2] Computer Science Department, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, USA
[3] Department of Computer Science, Tulane University, 6834 St. Charles St., New Orleans, USA
[4] Computational Biology Department, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, USA
Keywords: Genomics; Computational geometry; Mixture modeling; Tumor phylogeny; Cancer
Others: 1230246; DOI: 10.1186/s12859-015-0694-x
Received 2014-12-22; accepted 2015-08-03; published 2015
【 Abstract 】
Background
Tumorigenesis is an evolutionary process by which tumor cells acquire mutations through successive diversification and differentiation. There is much interest in reconstructing this evolutionary process because of its relevance to identifying drivers of mutation and predicting prognosis and drug response. Such efforts are challenged by high tumor heterogeneity, both within and among patients. In prior work, we showed that this heterogeneity could be turned into an advantage by computationally reconstructing models of cell populations mixed to different degrees in distinct tumors. Such mixed membership model approaches, however, are still limited in their ability to dissect more than a few well-conserved cell populations across a tumor data set.
Results
We present a method to improve on current mixed membership model approaches by better accounting for conserved progression pathways between subsets of cancers, which imply a structure to the data that has not previously been exploited. Our prior methods interpret the mixture problem as that of reconstructing simple geometric objects called simplices. We extend them to search instead for structured unions of simplices, called simplicial complexes, that one would expect to emerge from mixture processes describing branches along an evolutionary tree. We further improve on the prior work with a novel objective function to better identify mixtures corresponding to parsimonious evolutionary tree models. We demonstrate that this approach improves our ability to resolve mixtures accurately on simulated data sets, and we demonstrate its practical applicability on a large RNA-Seq tumor data set.
Conclusions
Better exploiting the expected geometric structure for mixed membership models produced from common evolutionary trees allows us to quickly and accurately reconstruct models of cell populations sampled from those trees. In the process, we hope to develop a better understanding of tumor evolution as well as other biological problems that involve interpreting genomic data gathered from heterogeneous populations of cells.
【 License 】
2015 Roman et al.