BMC Genomics
Identification, characterization, and utilization of single copy genes in 29 angiosperm genomes
Peigen Xiao [1], Lijia Xu [1], Yong Peng [1], Fengming Han [1]
[1] Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Beijing 100193, PR China
Keywords: Phylogeny; Alternative splicing; Ka/Ks; Gene expression; GC3; Codon usage; Gene Ontology; Duplication; Single copy gene
DOI: 10.1186/1471-2164-15-504
Received 2013-11-19; accepted 2014-06-17; published 2014
【 Abstract 】
Background
Single copy genes are common across angiosperm genomes. Now that sequenced genomes of sufficiently high quality are available, single copy genes can be identified on a large scale across multiple species. Although some characteristics of single copy genes have already been reported, our study provides novel insights into them.
Results
We identified single copy genes across 29 angiosperm genomes. A significant negative correlation was found between the number of duplicate blocks and the number of single copy genes. We found that a considerable number of single copy genes are located in organelles and that they show a preference for binding and catalytic activity. Analysis of the effective number of codons (Nc) shows that single copy genes have a stronger codon bias than non-single copy genes in eudicots. The relatively high expression level of single copy genes was partially confirmed by the RNA-seq data, but not by the Codon Adaptation Index (CAI). Unlike in most other species, a strong negative correlation occurs between Nc and GC3 among single copy genes in grass genomes. Compared with all non-single copy genes, single copy genes are more conserved (as indicated by Ka and Ks values). However, our alternative splicing (AS) results reveal that selective constraints are weaker in single copy genes than in low copy family genes (1–10 in-paralogs) and stronger than in high copy family genes (>10 in-paralogs). Using concatenated shared single copy genes, we obtained a well-resolved phylogenetic tree. With the addition of intron sequences, branch support is improved, but striking incongruences are also evident. Inclusion of intron sequences therefore seems more appropriate for phylogenetic reconstruction at lower taxonomic levels.
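The two codon statistics named above, GC3 and Nc, can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the standard genetic code, defines GC3 as the fraction of codons with G or C at the third position (stop codons included), and computes Nc with the formulation of Wright (1990), including his fallback of averaging the two-fold and four-fold homozygosities when isoleucine is too rare. The function names are hypothetical.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> amino acid ('*' marks stop codons).
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
# Number of synonymous codons per amino acid (its degeneracy class).
DEGENERACY = Counter(aa for aa in CODE.values() if aa != "*")


def codons(cds):
    """Split a coding sequence into complete, unambiguous codons."""
    cds = cds.upper().replace("U", "T")
    triplets = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    return [c for c in triplets if c in CODE]


def gc3(cds):
    """Fraction of codons whose third position is G or C."""
    cs = codons(cds)
    if not cs:
        raise ValueError("sequence contains no complete codons")
    return sum(c[2] in "GC" for c in cs) / len(cs)


def effective_number_of_codons(cds):
    """Wright's Nc: 20 = one codon per amino acid, 61 = no bias.

    Returns None when a degeneracy class cannot be estimated.
    """
    counts = defaultdict(Counter)
    for c in codons(cds):
        aa = CODE[c]
        if aa != "*" and DEGENERACY[aa] > 1:
            counts[aa][c] += 1

    # Codon homozygosity F per amino acid, grouped by degeneracy class.
    f_by_class = defaultdict(list)
    for aa, codon_counts in counts.items():
        n = sum(codon_counts.values())
        if n < 2:
            continue  # F is undefined for a single observation
        sum_p2 = sum((v / n) ** 2 for v in codon_counts.values())
        f = (n * sum_p2 - 1) / (n - 1)
        if f > 0:
            f_by_class[DEGENERACY[aa]].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    # Fallback for isoleucine (the only three-fold amino acid).
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if any(k not in mean_f for k in (2, 3, 4, 6)):
        return None
    nc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(nc, 61.0)
```

Lower Nc values mean stronger codon bias. Applying both functions per gene and correlating the two columns (for example with a rank correlation) is one way to examine an Nc versus GC3 relationship of the kind reported for grass genomes.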
Conclusions
Our analysis provides insight into the evolutionary characteristics of single copy genes across 29 angiosperm genomes. The results suggest that there are key differences in evolutionary constraints between single copy genes and non-single copy genes. To some extent, these evolutionary constraints differ between species, especially between eudicots and monocots. Our preliminary evidence also suggests that concatenated shared single copy genes are well suited to resolving phylogenetic relationships.
【 License 】
2014 Han et al.; licensee BioMed Central Ltd.