Journal article details
BMC Evolutionary Biology
Evolution of protein indels in plants, animals and fungi
Sandra L Baldauf [1], Pravech Ajawatanawong [1]
[1] Department of Systematic Biology, Evolutionary Biology Centre (EBC), Uppsala University, Uppsala 75236, Sweden
Keywords: Indel profiles; Eukaryote evolution; Multiple sequence alignment; Insertion/deletion; Phylogeny; Rare genomic changes; Indels
DOI  :  10.1186/1471-2148-13-140
Received 2013-02-25, accepted 2013-06-24, published 2013
【 Abstract 】

Background

Insertions/deletions (indels) in protein sequences are useful as drug targets, protein structure predictors, species diagnostics and evolutionary markers. However, there is limited understanding of indel evolutionary patterns. We sought to characterize indel patterns, focusing first on the major groups of multicellular eukaryotes.

Results

Comparisons of complete proteomes from a taxonomically broad set of primarily Metazoa, Fungi and Viridiplantae yielded 299 substantial (>250 aa) universal, single-copy (in-paralog only) proteins, from which 901 simple (present/absent) and 3,806 complex (multistate) indels were extracted. Simple indels are mostly small (1-7 aa), with a most frequent size class of 1 aa. However, even these simple-looking indels show a surprisingly high level of hidden homoplasy (multiple independent origins). Among the apparently homoplasy-free simple indels, we identify 69 potential clade-defining indels (CDIs) that may warrant closer examination. CDIs show a very uneven taxonomic distribution among Viridiplantae (13 CDIs), Fungi (40 CDIs), and Metazoa (0 CDIs). An examination of singleton indels shows an excess of insertions over deletions in nearly all examined taxa. This excess averages 2.31-fold overall, with a maximum observed value of 7.5-fold.
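The distinction between simple (present/absent) and complex (multistate) indels can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy alignment, and the rule used here (a gapped region is "simple" only when every sequence is either fully gapped or fully ungapped across it) are assumptions made for illustration.

```python
def gap_regions(alignment):
    """Yield (start, end) half-open column ranges in which at least one sequence has a gap."""
    ncols = len(alignment[0])
    start = None
    for col in range(ncols):
        has_gap = any(seq[col] == "-" for seq in alignment)
        if has_gap and start is None:
            start = col
        elif not has_gap and start is not None:
            yield (start, col)
            start = None
    if start is not None:
        yield (start, ncols)


def classify_region(alignment, start, end):
    """Label a gapped region 'simple' or 'complex' (assumed rule, see lead-in)."""
    # Per-sequence gap pattern across the region: True where the character is a gap.
    patterns = {tuple(ch == "-" for ch in seq[start:end]) for seq in alignment}
    width = end - start
    all_gap = (True,) * width
    no_gap = (False,) * width
    # Simple: exactly two states, residues throughout or gaps throughout.
    return "simple" if patterns == {all_gap, no_gap} else "complex"


def classify_indels(alignment):
    """Return a list of (start, end, label) for every gapped region."""
    return [(s, e, classify_region(alignment, s, e)) for s, e in gap_regions(alignment)]


# Hypothetical toy alignment; equal-length rows, '-' marks a gap.
toy = [
    "MKT--LVAAGH",
    "MKTPALVA-GH",
    "MKT--LV--GH",
    "MKTPALVAAGH",
]

for start, end, label in classify_indels(toy):
    print(start, end, label)
```

In the toy alignment, columns 3-4 form a two-state region, while columns 7-8 contain three different gap lengths and are therefore labelled complex.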

Conclusions

We find considerable potential for identifying taxon-marker indels using an automated pipeline. However, it appears that simple indels in universal proteins are too rare and homoplasy-rich to be used for pure indel-based phylogeny. The excess of insertions over deletions seen in nearly every genome and major group examined may be useful in defining more realistic gap penalties for sequence alignment. This bias also suggests that insertions in highly conserved proteins experience less purifying selection than do deletions.
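As a rough illustration of how an insertion-to-deletion excess could be tallied from singleton indels, the sketch below assigns polarity by a simple majority rule: a segment present in exactly one taxon is counted as an insertion, and a segment missing from exactly one taxon as a deletion. This rule, the function names, and the toy data are assumptions for illustration; the source does not describe the authors' procedure at this level of detail.

```python
def singleton_polarity(presence):
    """Classify one simple indel from a taxon -> bool map (True = residues present).

    Returns 'insertion', 'deletion', or None when the indel is not a singleton.
    """
    if len(presence) < 3:
        return None  # polarity is ambiguous with fewer than three taxa
    n_present = sum(presence.values())
    if n_present == 1:
        return "insertion"  # only one taxon carries the extra residues
    if n_present == len(presence) - 1:
        return "deletion"   # only one taxon lacks the residues
    return None


def insertion_deletion_ratio(indels):
    """Return insertions / deletions over singleton indels, or None if no deletions."""
    counts = {"insertion": 0, "deletion": 0}
    for presence in indels:
        kind = singleton_polarity(presence)
        if kind is not None:
            counts[kind] += 1
    if counts["deletion"] == 0:
        return None
    return counts["insertion"] / counts["deletion"]


# Hypothetical presence/absence data for four simple indels.
toy_indels = [
    {"plant": True, "fungus": False, "animal": False, "protist": False},  # insertion
    {"plant": False, "fungus": True, "animal": False, "protist": False},  # insertion
    {"plant": True, "fungus": True, "animal": False, "protist": True},    # deletion
    {"plant": True, "fungus": True, "animal": False, "protist": False},   # not a singleton
]

print(insertion_deletion_ratio(toy_indels))
```

A ratio estimated this way is one possible input when choosing asymmetric gap costs, although the mapping from such a ratio to alignment parameters is left open here.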

【 License 】

   
2013 Ajawatanawong and Baldauf; licensee BioMed Central Ltd.
