BMC Evolutionary Biology
Superiority of a mechanistic codon substitution model even for protein sequences in phylogenetic analysis
Sanzo Miyazawa [1]
[1] 6-5-607 Miyanodai, Sakura, Chiba, 285-0857, Japan
Keywords: Multiple nucleotide change; Variable mutation rate across sites; Variable selective constraint across sites; Selective constraints; Functional constraints; Structural constraints; Mechanistic codon substitution model; Empirical amino acid substitution rate matrix; Amino acid substitution model
Others: 1084722
DOI: 10.1186/1471-2148-13-257
Received 2013-09-21; accepted 2013-11-14; published 2013
【 Abstract 】

Background

Nucleotide and amino acid substitution tendencies are characteristic of each species, organelle, and protein family. Hence, various empirical amino acid substitution rate matrices have had to be estimated for phylogenetic analysis: JTT, WAG, and LG for nuclear proteins, mtREV for mitochondrial proteins, cpREV10 and cpREV64 for chloroplast-encoded proteins, and FLU for influenza proteins. In contrast, in a mechanistic codon substitution model, in which each codon substitution rate is proportional to the product of a codon mutation rate and the ratio of fixation that depends on the type of amino acid replacement, mutation rates and the strength of selective constraint on amino acids can be tailored to each protein family with 11 additional parameters. As a result, in the evolutionary analysis of codon sequences, it outperforms codon substitution models that are equivalent to empirical amino acid substitution matrices. Is it superior even for amino acid sequences, in which synonymous substitutions cannot be identified?
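The rate structure described above (codon substitution rate = codon mutation rate × a fixation factor that depends only on the amino acid pair) can be sketched in Python as follows. This is a minimal illustration under assumptions that are ours, not the paper's: a Kimura-style transition/transversion mutation model with parameter `kappa`, hypothetical scaling factors `beta2` and `beta3` for codons differing at two or three positions, equal codon usage, and a user-supplied `fixation(a, b)` function. The `toy_fixation` function and its amino acid classes are purely illustrative; the paper's actual parameterization of mutation rates and selective constraints differs.

```python
import itertools

# NCBI standard genetic code (translation table 1). Codons are enumerated with
# the first base varying slowest and bases ordered T, C, A, G.
BASES = "TCAG"
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AA_TABLE)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]  # the 61 sense codons


def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (x in "AG") == (y in "AG")


def mutation_rate(c1, c2, kappa, beta2, beta3):
    """Hypothetical codon mutation rate: each changed position contributes a
    factor kappa (transition) or 1 (transversion); codons differing at 2 or 3
    positions are further scaled by beta2 or beta3 (0 disables them)."""
    diffs = [(x, y) for x, y in zip(c1, c2) if x != y]
    rate = 1.0
    for x, y in diffs:
        rate *= kappa if is_transition(x, y) else 1.0
    return rate * {1: 1.0, 2: beta2, 3: beta3}[len(diffs)]


def codon_rate_matrix(fixation, kappa=2.0, beta2=0.0, beta3=0.0):
    """61x61 rate matrix Q with q_ij = mutation(i, j) * fixation(aa_i, aa_j),
    fixation = 1 for synonymous changes, equal codon usage assumed, and the
    whole matrix scaled so that the mean substitution rate is 1."""
    n = len(SENSE)
    q = [[0.0] * n for _ in range(n)]
    for i, ci in enumerate(SENSE):
        for j, cj in enumerate(SENSE):
            if i != j:
                a, b = CODE[ci], CODE[cj]
                fix = 1.0 if a == b else fixation(a, b)
                q[i][j] = mutation_rate(ci, cj, kappa, beta2, beta3) * fix
        q[i][i] = -sum(q[i])  # diagonal is still 0 here, so this sums off-diagonals
    # With pi_i = 1/61 for every codon, the mean rate is -sum_i q_ii / 61.
    mean_rate = -sum(q[i][i] for i in range(n)) / n
    return [[x / mean_rate for x in row] for row in q]


# Toy fixation factor (purely illustrative): replacements within a crude
# physicochemical class are fixed more often than replacements across classes.
CLASSES = ["AVLIMC", "FWY", "STNQ", "KRH", "DE", "GP"]


def toy_fixation(a, b):
    return 0.5 if any(a in g and b in g for g in CLASSES) else 0.1
```

Under these toy settings, the synonymous transition TTT to TTC (Phe to Phe) has 20 times the rate of the transversion TTT to TTA (Phe to Leu), because 2 × 1 / (1 × 0.1) = 20. With the default `beta2 = beta3 = 0`, only single-nucleotide changes have nonzero rates; setting them positive mimics, very roughly, the multiple nucleotide changes in infinitesimal time that the paper allows.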

Results

Nucleotide mutations are assumed to occur independently of codon position, but multiple nucleotide changes within an infinitesimal time interval are allowed. Selective constraints on the respective types of amino acid replacements are tailored to each gene as a linear function of a given estimate of selective constraints; these estimates were obtained by maximizing the likelihood of each of the empirical amino acid or codon substitution frequency matrices JTT, WAG, LG, and KHG. It is shown that the mechanistic codon substitution model, under the assumption of equal codon usage, yields better values of the Akaike and Bayesian information criteria for all three phylogenetic trees, of mitochondrial, chloroplast, and influenza A hemagglutinin proteins, than the empirical amino acid substitution models mtREV, cpREV64, and FLU, respectively, each of which was designed specifically for the corresponding protein family. Allowing selective constraint to vary across sites fits the datasets significantly better than allowing codon mutation rates to vary, confirming that the substitution rate variation across sites detected by amino acid substitution models is caused primarily by variation in selective constraint against amino acid substitutions rather than by variation in codon mutation rate.
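The model comparison above rests on the Akaike information criterion, AIC = 2k - 2 ln L, and the Bayesian information criterion, BIC = k ln n - 2 ln L, where lower values are better. The Python sketch below is our own illustration of what these criteria imply. A model with Δk extra free parameters (for example, the 11 additional parameters mentioned in the Background) is preferred by AIC only if its log-likelihood exceeds that of the simpler model by more than Δk, and by BIC only if the gain exceeds (Δk / 2) ln n. Taking n as the number of alignment sites is a common convention and an assumption here; the log-likelihood values in the usage lines are arbitrary placeholders, not results from the paper.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion, 2k - 2 ln L; lower is better."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian (Schwarz) information criterion, k ln n - 2 ln L; lower is better."""
    return k * math.log(n) - 2 * log_lik


def required_loglik_gain(extra_params, n=None):
    """Smallest ln L improvement for which a model with `extra_params` more
    free parameters has the lower criterion: AIC if n is None, else BIC with
    sample size n."""
    if n is None:
        return float(extra_params)
    return 0.5 * extra_params * math.log(n)


# Arbitrary placeholder values (not results from the paper): a base model with
# k = 10 parameters and a richer model with 11 more parameters.
base = aic(-1000.0, 10)
print(required_loglik_gain(11))  # 11.0
print(aic(-988.0, 21) < base)    # True: a gain of 12 exceeds 11
print(aic(-990.0, 21) < base)    # False: a gain of 10 does not
```

Because the BIC penalty grows with ln n, it demands a larger likelihood gain than AIC whenever n exceeds e² (about 7.4), so a richer model that wins under BIC on long alignments also wins under AIC.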

Conclusions

The mechanistic codon substitution model is superior to amino acid substitution models even in the evolutionary analysis of protein sequences.

【 License 】

© 2013 Miyazawa; licensee BioMed Central Ltd.
