BMC Evolutionary Biology
Dissecting the role of low-complexity regions in the evolution of vertebrate proteins
M Mar Albà1, Núria Radó-Trilla2
[1] Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain; Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB) - IMIM (Hospital del Mar Research Institute), Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
Keywords: Slippage; Vertebrate protein; Amino acid tandem repeat; Simple sequence; Low-complexity region
DOI: 10.1186/1471-2148-12-155
Received 2012-04-17, accepted 2012-07-30, published 2012
【 Abstract 】
Background
Low-complexity regions (LCRs) in proteins are tracts that are highly enriched in one or a few amino acids. Given their high abundance and their capacity to expand in relatively short periods of time through replication slippage, they can contribute greatly to increasing protein sequence space and generating novel protein functions. However, little is known about the global impact of LCRs on protein evolution.
Results
We have traced back the evolutionary history of 2,802 LCRs from a large set of homologous protein families from H. sapiens, M. musculus, G. gallus, D. rerio and C. intestinalis. Transcription factors and other regulatory functions are overrepresented among proteins containing LCRs. We have found that the gain of novel LCRs is frequently associated with repeat expansion, whereas the loss of LCRs is more often due to the accumulation of amino acid substitutions than to deletions. This dichotomy results in net protein sequence gain over time. We have detected a significant increase in the rate of accumulation of novel LCRs in the ancestral Amniota and mammalian branches, and a reduction in the chicken branch. Alanine- and/or glycine-rich LCRs are overrepresented in recently emerged LCR sets from all branches, suggesting that their expansion is better tolerated than that of other LCR types. LCRs enriched in positively charged amino acids show the opposite pattern, indicating an important effect of purifying selection in their maintenance.
Conclusion
We have performed the first large-scale study on the evolutionary dynamics of LCRs in protein families. The study has shown that the composition of an LCR is an important determinant of its evolutionary pattern.
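The definition of an LCR given in the Background (a tract highly enriched in one or a few amino acids) can be illustrated with a sliding-window Shannon entropy scan. The sketch below is only an illustration of that definition and is not the detection method used in the study; the function names, the window length of 12 and the entropy cutoff of 1.5 bits are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a string."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def find_low_complexity(seq, window=12, max_entropy=1.5):
    """Return merged half-open (start, end) intervals whose windows have
    entropy <= max_entropy. Both parameters are illustrative assumptions."""
    regions = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= max_entropy:
            start, end = i, i + window
            if regions and start <= regions[-1][1]:
                # Overlapping or adjacent to the previous hit: extend it.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Hypothetical sequence: a 15-residue alanine tract between mixed flanks.
    example = "MKTLVWDERFGHIKL" + "A" * 15 + "MNPQRSTVWYCDEFG"
    for start, end in find_low_complexity(example):
        print(start, end, example[start:end])
```

A lower cutoff restricts hits to near-homopolymeric tracts, while a higher one also admits regions dominated by two or three residue types, such as the alanine- and/or glycine-rich LCRs discussed in the Results.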
【 License 】
2012 Radó-Trilla and Albà; licensee BioMed Central Ltd.