Journal Article Details
BMC Systems Biology
Pegasus: a comprehensive annotation and prediction tool for detection of driver gene fusions in cancer
Raul Rabadan [2], Giorgio Inghirami [5], Antonio Iavarone [4], Anna Lasorella [4], Veronique Frattini [4], Chris H Wiggins [1], Andrea Acquaviva [3], Elisa Ficarra [3], Sakellarios Zairis [2], Francesco Abate [3]
[1] Institute for Data Sciences and Engineering, Columbia University, 500 W. 120th Street, Mudd 524, New York 10027, New York, USA
[2] Center for Computational Biology and Bioinformatics, Columbia University, 1130 St. Nicholas Ave, New York 10032, NY, USA
[3] Department of Control and Computer Engineering, Politecnico di Torino, Torino 10129, Italy
[4] Institute for Cancer Genetics, Columbia University Medical Center, New York, New York, USA
[5] Department of Pathology, Center for Experimental Research and Medical Studies, Laboratory of Functional Genomics, University of Torino, Torino, Italy
Keywords: Machine learning; Next-generation sequencing; Gene fusion
DOI  :  10.1186/s12918-014-0097-z
Received 2014-02-06, accepted 2014-08-05, published 2014
【 Abstract 】

Background

The extraordinary success of imatinib in the treatment of BCR-ABL1 associated cancers underscores the need to identify novel functional gene fusions in cancer. RNA sequencing offers a genome-wide view of expressed transcripts, uncovering biologically functional gene fusions. Although several bioinformatics tools are already available for the detection of putative fusion transcripts, candidate event lists are plagued with non-functional read-through events, reverse transcriptase template switching events, incorrect mapping, and other systematic errors. Such lists lack any indication of oncogenic relevance, and they are too large for exhaustive experimental validation.

Results

We have designed and implemented a pipeline, Pegasus, for the annotation and prediction of biologically functional gene fusion candidates. Pegasus provides a common interface for various gene fusion detection tools, reconstruction of novel fusion proteins, reading-frame-aware annotation of preserved/lost functional domains, and data-driven classification of oncogenic potential. Pegasus dramatically streamlines the search for oncogenic gene fusions, bridging the gap between raw RNA-Seq data and a final, tractable list of candidates for experimental validation.
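
As an illustration of what reading-frame-aware annotation involves, the following minimal Python sketch checks whether a fusion junction keeps the 3' partner in its original reading frame and which of its annotated domains survive. It is not taken from Pegasus; the function names, the coordinate convention (counts of coding bases, zero-based codon indices), and the example domains are all assumptions made for this sketch.

```python
from typing import List, Tuple

# Hypothetical domain record: (name, first_codon, last_codon), zero-based, inclusive.
Domain = Tuple[str, int, int]


def is_in_frame(five_prime_cds_bases: int, three_prime_skipped_cds_bases: int) -> bool:
    """Return True if the 3' partner is translated in its original frame.

    five_prime_cds_bases: coding bases contributed by the 5' partner
        up to the breakpoint.
    three_prime_skipped_cds_bases: coding bases of the 3' partner that
        lie upstream of the breakpoint and are therefore lost.
    The frame is preserved when both counts leave the same remainder mod 3.
    """
    return five_prime_cds_bases % 3 == three_prime_skipped_cds_bases % 3


def retained_three_prime_domains(
    domains: List[Domain], three_prime_skipped_cds_bases: int
) -> List[str]:
    """List 3' partner domains that lie entirely downstream of the breakpoint."""
    # First codon of the 3' partner that is fully present in the fusion
    # (ceiling division of skipped bases by 3).
    first_full_codon = -(-three_prime_skipped_cds_bases // 3)
    return [name for name, start, _end in domains if start >= first_full_codon]


# Made-up example: a kinase domain downstream of the breakpoint is kept,
# while an upstream domain is lost.
example_domains = [("upstream_domain", 10, 60), ("kinase_domain", 100, 350)]
print(is_in_frame(300, 150))
print(retained_three_prime_domains(example_domains, 150))
```

A frame-shifted junction would normally disqualify the downstream domains regardless of their position, so in practice the two checks would be combined before any classification step.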

Conclusion

We show the effectiveness of Pegasus in predicting new driver fusions in 176 RNA-Seq samples of glioblastoma multiforme (GBM) and 23 cases of anaplastic large cell lymphoma (ALCL). Contact: fa2306@columbia.edu.

【 License 】

   
2014 Abate et al.; licensee BioMed Central Ltd.
