Journal article details
BMC Bioinformatics
A universal genomic coordinate translator for comparative genomics
Neda Zamani [2], Görel Sundström [2], Jennifer RS Meadows [2], Marc P Höppner [2], Jacques Dainat [2], Henrik Lantz [2], Brian J Haas [1], Manfred G Grabherr [1]
[1] Broad Institute of MIT and Harvard, Cambridge, MA, USA
[2] Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
Keywords: Cross-species gene expression analysis; Genomic duplication; Genomic coordinate translation; Comparative genomics
DOI: 10.1186/1471-2105-15-227
Received 2014-03-10; accepted 2014-06-18; published 2014
【 Abstract 】

Background

Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. The orthology relationship between copies across multiple genomes can be resolved unambiguously by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, whose number grows as N^2 with the number of available genomes, N.
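
As a rough illustration of this scaling, assume each unordered pair of genomes is aligned once (an assumption; the abstract states only the quadratic growth). The number of all-to-all alignments A(N), evaluated for N = 12 genomes, would then be:

$$
A(N) = \binom{N}{2} = \frac{N(N-1)}{2}, \qquad A(12) = \frac{12 \cdot 11}{2} = 66
$$

By comparison, any connected graph over N genomes needs at least N - 1 pairwise alignments, which is 11 in this example. How many alignments Kraken actually uses is not stated in this abstract.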

Results

Here, we introduce Kraken, software that avoids the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows large numbers of genomes to be included in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in sensitivity but reduces overall computational runtime by an order of magnitude. We then applied the method to three well-annotated mammalian genomes (human, mouse, and rat) and show that up to 93% of protein-coding transcripts have unambiguous pairwise orthologous relationships across the genomes. At the nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% match on at least one junction. Finally, we applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members (i.e. one-to-many or many-to-many homologs) are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. Not limited to protein-coding genes, Kraken also identifies thousands of novel transcribed loci, likely non-coding RNAs, that are consistently transcribed in human, chimpanzee and gorilla and maintain significant correlation of expression levels across species.
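
The idea of translating coordinates indirectly through a graph of pairwise maps can be sketched in a few lines of Python. This is a toy illustration, not Kraken's implementation: the `Block` type, the `PAIRWISE` table, its coordinates, and all function names are hypothetical, blocks are assumed to be ungapped and forward-strand, edges are one-directional, and the path is found by a plain breadth-first search rather than Kraken's recursive traversal with re-computed orthology.

```python
from collections import deque
from typing import NamedTuple


class Block(NamedTuple):
    """One ungapped, forward-strand syntenic block (hypothetical format)."""
    src_chrom: str
    src_start: int
    tgt_chrom: str
    tgt_start: int
    length: int


# Made-up pairwise synteny maps, keyed by (source genome, target genome).
# Note that no direct human-to-rat map is present.
PAIRWISE = {
    ("human", "mouse"): [Block("chr1", 1000, "chr4", 5000, 500)],
    ("mouse", "rat"): [Block("chr4", 5100, "chr5", 200, 300)],
}


def map_position(blocks, chrom, pos):
    """Shift one position through the block that contains it, if any."""
    for b in blocks:
        if b.src_chrom == chrom and b.src_start <= pos < b.src_start + b.length:
            return b.tgt_chrom, b.tgt_start + (pos - b.src_start)
    return None


def find_path(source, target):
    """Breadth-first search for a chain of genomes linking source to target."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for a, b in PAIRWISE:
            if a == path[-1] and b not in seen:
                seen.add(b)
                queue.append(path + [b])
    return None


def translate(source, target, chrom, start, end):
    """Translate the half-open interval [start, end) along the found path.

    Returns (chrom, start, end) in the target genome, or None if no path
    exists or either endpoint falls outside every block at some step.
    """
    path = find_path(source, target)
    if path is None:
        return None
    for a, b in zip(path, path[1:]):
        blocks = PAIRWISE[(a, b)]
        lo = map_position(blocks, chrom, start)
        hi = map_position(blocks, chrom, end - 1)
        if lo is None or hi is None or lo[0] != hi[0]:
            return None
        chrom, start, end = lo[0], lo[1], hi[1] + 1
    return chrom, start, end


print(translate("human", "rat", "chr1", 1150, 1200))
```

With these made-up maps, the human interval reaches rat through mouse even though no human-to-rat alignment exists. An interval that falls outside the mouse-to-rat block, such as human chr1:1000-1050, yields `None`.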

Conclusions

Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole genome mappings. Kraken is freely available under the LGPL from http://github.com/nedaz/kraken.

【 License 】

   
2014 Zamani et al.; licensee BioMed Central Ltd.
