BMC Bioinformatics
Odintifier - A computational method for identifying insertions of organellar origin from modern and ancient high-throughput sequencing data based on haplotype phasing
M Thomas P. Gilbert1  Nathan Wales1  Ross Barnett1  Marie Lisandra Zepeda Mendoza1  Jose Alfredo Samaniego Castruita1 
[1]Centre for GeoGenetics, The Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, Copenhagen DK-1350, Denmark
Keywords: Phasing; Odin; Nupt; Numt; Mitochondrial assembly; High-throughput sequencing; Ancient DNA
DOI: 10.1186/s12859-015-0682-1
Received 2015-03-25, accepted 2015-07-21, published 2015
【 Abstract 】

Background

Cellular organelles with genomes of their own (e.g. plastids and mitochondria) can pass genetic sequences to other organellar genomes within the cell in many species across the eukaryote phylogeny. The extent of the occurrence of these organellar-derived inserted sequences (odins) is still unknown, but if not accounted for in genomic and phylogenetic studies, they can be a source of error. However, if correctly identified, these inserted sequences can be used for evolutionary and comparative genomic studies. Although such insertions can be detected using various laboratory and bioinformatic strategies, there is currently no straightforward way to apply these strategies as part of a standard organellar genome assembly from next-generation sequencing data. Furthermore, most current methods for identifying such insertions are unsuitable for use on non-model organisms or ancient DNA datasets.

Results

We present a bioinformatic method that uses phasing algorithms to reconstruct both source and inserted organelle sequences. The method was tested on shotgun and organellar-enriched DNA high-throughput sequencing (HTS) datasets from ancient and modern samples. Specifically, we used datasets from lions (Panthera leo ssp. and Panthera leo leo) to characterize insertions of mitochondrial origin, and from common grapevine (Vitis vinifera) and bugle (Ajuga reptans) to characterize insertions derived from plastid genomes. Comparison of the results against other available organelle genome assembly methods demonstrated that our new method provides an improvement in the sequence assembly.
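
To make the phasing idea concrete, here is a minimal Python sketch. It is not the Odintifier implementation, and every function name and the toy data are hypothetical. It assumes reads have already been mapped to an organellar reference and reduced to the bases seen at variant sites. It also assumes that the haplotype supported by more reads is the true organellar sequence and the minor one is the odin. A greedy two-way split stands in for a real phasing algorithm.

```python
from collections import Counter, defaultdict


def phase_reads(reads):
    """Greedily split reads into two groups by agreement at variant sites.

    Each read is a dict mapping a variant-site position to the observed base.
    Returns (groups, counts), where counts[g][pos] is a Counter of bases.
    """
    groups = [[], []]
    counts = [defaultdict(Counter), defaultdict(Counter)]

    def score(read, g):
        # Matches minus mismatches against the current majority base of group g.
        total = 0
        for pos, base in read.items():
            seen = counts[g].get(pos)
            if seen:
                majority = seen.most_common(1)[0][0]
                total += 1 if base == majority else -1
        return total

    for read in reads:
        s0, s1 = score(read, 0), score(read, 1)
        if not groups[1]:
            # Open the second group only when a read contradicts the first.
            g = 1 if (groups[0] and s0 < 0) else 0
        else:
            g = 0 if s0 >= s1 else 1
        groups[g].append(read)
        for pos, base in read.items():
            counts[g][pos][base] += 1
    return groups, counts


def consensus(site_counts):
    """Majority base at every site covered by one group of reads."""
    return {pos: c.most_common(1)[0][0] for pos, c in sorted(site_counts.items())}


def label_haplotypes(reads):
    """Label the better-supported haplotype 'organelle' and the other 'odin'.

    Assumption (not from the source): organellar copies outnumber inserted
    copies, so the group with more reads is treated as the source sequence.
    """
    groups, counts = phase_reads(reads)
    major, minor = (0, 1) if len(groups[0]) >= len(groups[1]) else (1, 0)
    return {
        "organelle": consensus(counts[major]),
        "odin": consensus(counts[minor]),
        "support": (len(groups[major]), len(groups[minor])),
    }


# Toy data: three variant sites, five organellar reads and two odin reads.
example_reads = [
    {100: "A", 250: "C"},
    {250: "C", 400: "G"},
    {100: "A", 250: "C", 400: "G"},
    {100: "G", 250: "T"},
    {250: "T", 400: "A"},
    {100: "A", 400: "G"},
    {250: "C"},
]
result = label_haplotypes(example_reads)
```

The greedy split depends on read order and cannot link variant sites that no read spans together, so it only illustrates how reads carrying linked variants can be separated into a source haplotype and an inserted haplotype.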

Conclusion

Using datasets from a wide range of species and of different levels of complexity, we showed that our novel bioinformatic method based on phasing algorithms can be used to achieve the following two goals: i) reference-guided assembly of chloroplast/mitochondrial genomes from HTS data and ii) identification and simultaneous assembly of odins. This method represents the first application of haplotype phasing for automatic detection of odins and reference-based organellar genome assembly.

【 License 】

2015 Samaniego Castruita et al.
