Journal article details
BMC Evolutionary Biology
Emergence of novel domains in proteins
M Mar Albà [1], Macarena Toll-Riera [2]
[1] Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain; Current address: Department of Zoology, University of Oxford, Oxford, UK
Keywords: Domain age; Gene age; Novel domain; Evolutionary rate; Lineage-specific domain; Protein domain
DOI: 10.1186/1471-2148-13-47
Received 2012-11-28, accepted 2013-01-31, published 2013
【 Abstract 】

Background

Proteins are composed of a combination of discrete, well-defined, sequence domains, associated with specific functions that have arisen at different times during evolutionary history. The emergence of novel domains is related to protein functional diversification and adaptation. But currently little is known about how novel domains arise and how they subsequently evolve.

Results

To gain insights into the impact of recently emerged domains in protein evolution we have identified all human young protein domains that have emerged in approximately the past 550 million years. We have classified them into vertebrate-specific and mammalian-specific groups, and compared them to older domains. We have found 426 different annotated young domains, totalling 995 domain occurrences, which represent about 12.3% of all human domains. We have observed that 61.3% of them arose in newly formed genes, while the remaining 38.7% are found combined with older domains, and have very likely emerged in the context of a previously existing protein. Young domains are preferentially located at the N-terminus of the protein, indicating that, at least in vertebrates, novel functional sequences often emerge there. Furthermore, young domains show significantly higher non-synonymous to synonymous substitution rates than older domains using human and mouse orthologous sequence comparisons. This is also true when we compare young and old domains located in the same protein, suggesting that recently arisen domains tend to evolve in a less constrained manner than older domains.
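
The same-protein comparison described above can be illustrated with a paired test. The following is a minimal sketch, not the authors' method: it assumes a one-sided sign test (a swapped-in, deliberately simple technique), and the protein names and dN/dS values are made-up placeholders rather than data from the study.

```python
from math import comb

# Hypothetical (made-up) records: (protein, dN/dS of young domain, dN/dS of old domain).
# These values are placeholders for illustration only, not results from the paper.
paired_dnds = [
    ("protein_A", 0.42, 0.15),
    ("protein_B", 0.31, 0.12),
    ("protein_C", 0.18, 0.22),
    ("protein_D", 0.55, 0.20),
    ("protein_E", 0.27, 0.09),
    ("protein_F", 0.36, 0.30),
]


def sign_test(pairs):
    """One-sided sign test for 'young domain dN/dS > old domain dN/dS'.

    Returns (k, n, p): k pairs where young > old, n informative (non-tied)
    pairs, and p = P(X >= k) for X ~ Binomial(n, 0.5).
    """
    informative = [(young, old) for _, young, old in pairs if young != old]
    n = len(informative)
    k = sum(1 for young, old in informative if young > old)
    p = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, p


k, n, p_value = sign_test(paired_dnds)
print(f"{k} of {n} proteins have a faster-evolving young domain; one-sided p = {p_value:.4f}")
```

Pairing within a protein keeps gene-level factors, such as expression level, the same for both domains, so any difference is more plausibly tied to domain age.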

Conclusions

We conclude that proteins tend to gain domains over time, becoming progressively longer. We show that many proteins are made of domains of different age, and that the fastest evolving parts correspond to the domains that have been acquired more recently.

【 License 】

   
2013 Toll-Riera and Alba; licensee BioMed Central Ltd.
