Journal Article Details
BMC Bioinformatics
Parameterizing sequence alignment with an explicit evolutionary model
Research Article
Elena Rivas1  Sean R. Eddy2 
[1] Department of Molecular and Cellular Biology, Harvard University, 02138, Cambridge, MA, USA; Howard Hughes Medical Institute, 4000 Jones Bridge Rd, 20815, Chevy Chase, MD, USA; John A. Paulson School of Engineering and Applied Sciences, 16 Divinity Avenue, 02138, Cambridge, MA, USA; FAS Center for Systems Biology, Harvard University, 16 Divinity Avenue, 02138, Cambridge, MA, USA
Keywords: Evolutionary models; Hidden Markov models; Insertions and deletions
DOI: 10.1186/s12859-015-0832-5
Received 2015-05-27, accepted 2015-11-20, published 2015
Source: Springer
【 Abstract 】

Background: Inference of sequence homology is inherently an evolutionary question, dependent upon evolutionary divergence. However, the insertion and deletion penalties in the most widely used methods for inferring homology by sequence alignment, including BLAST and profile hidden Markov models (profile HMMs), are not based on any explicitly time-dependent evolutionary model. Using one fixed score system (BLOSUM62 with some gap open/extend costs, for example) corresponds to making an unrealistic assumption that all sequence relationships have diverged by the same time. Adoption of explicit time-dependent evolutionary models for scoring insertions and deletions in sequence alignments has been hindered by algorithmic complexity and technical difficulty.

Results: We identify and implement several probabilistic evolutionary models compatible with the affine-cost insertion/deletion model used in standard pairwise sequence alignment. Assuming an affine gap cost imposes important restrictions on the realism of the evolutionary models compatible with it, as single insertion events with geometrically distributed lengths do not result in geometrically distributed insert lengths at finite times. Nevertheless, we identify one evolutionary model compatible with symmetric pair HMMs that are the basis for Smith-Waterman pairwise alignment, and two evolutionary models compatible with standard profile-based alignment.

We test different aspects of the performance of these “optimized branch length” models, including alignment accuracy and homology coverage (discrimination of residues in a homologous region from nonhomologous flanking residues). We test on benchmarks of both global homologies (full length sequence homologs) and local homologies (homologous subsequences embedded in nonhomologous sequence).

Conclusions: Contrary to our expectations, we find that for global homologies a single long branch parameterization suffices both for distant and close homologous relationships. In contrast, we do see an advantage in using explicit evolutionary models for local homologies. Optimal branch parameterization reduces a known artifact called “homologous overextension”, in which local alignments erroneously extend through flanking nonhomologous residues.
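
The link the abstract draws between affine gap costs and geometrically distributed insert lengths can be illustrated with a small sketch. It is not taken from the paper: the function names, the parameters `delta` and `epsilon`, and the numeric values are all hypothetical, and the "two adjacent events" mixture is only a toy stand-in for what might happen at finite divergence times, not one of the authors' evolutionary models. The first part shows that the log-probability of a geometric gap length is exactly an affine function of the length; the second shows that mixing one-event and two-event inserts breaks the constant ratio between successive length probabilities that defines a geometric distribution.

```python
import math


def affine_gap_score(length, gap_open, gap_extend):
    """Affine score: an opening term plus one extension term per extra residue."""
    if length < 1:
        raise ValueError("gap length must be >= 1")
    return gap_open + (length - 1) * gap_extend


def geometric_gap_log_prob(length, delta, epsilon):
    """Log-probability of a gap of `length` residues under a geometric model.

    delta   -- probability of opening a gap (hypothetical parameter)
    epsilon -- probability of extending an open gap by one residue (hypothetical)
    """
    if length < 1:
        raise ValueError("gap length must be >= 1")
    return (math.log(delta)
            + (length - 1) * math.log(epsilon)
            + math.log(1.0 - epsilon))


def affine_costs_from_geometric(delta, epsilon):
    """Return the (gap_open, gap_extend) pair implied by the geometric model."""
    gap_open = math.log(delta) + math.log(1.0 - epsilon)
    gap_extend = math.log(epsilon)
    return gap_open, gap_extend


def toy_mixture_length_prob(length, epsilon, p_two_events):
    """Toy insert-length distribution (an assumption, not the paper's model).

    With probability 1 - p_two_events the insert comes from one event with a
    geometric length; with probability p_two_events it is the concatenation of
    two independent such events, whose total length is negative binomial.
    """
    if length < 1:
        raise ValueError("insert length must be >= 1")
    one_event = (1.0 - epsilon) * epsilon ** (length - 1)
    if length >= 2:
        two_events = (length - 1) * (1.0 - epsilon) ** 2 * epsilon ** (length - 2)
    else:
        two_events = 0.0
    return (1.0 - p_two_events) * one_event + p_two_events * two_events


def successive_ratios(probs):
    """Ratios P(L+1)/P(L); these are constant only for a geometric distribution."""
    return [later / earlier for earlier, later in zip(probs, probs[1:])]


if __name__ == "__main__":
    delta, epsilon = 0.05, 0.7  # illustrative values only
    gap_open, gap_extend = affine_costs_from_geometric(delta, epsilon)
    for n in (1, 2, 5):
        print(n, affine_gap_score(n, gap_open, gap_extend),
              geometric_gap_log_prob(n, delta, epsilon))

    mixture = [toy_mixture_length_prob(n, epsilon, 0.2) for n in range(1, 8)]
    print(successive_ratios(mixture))
```

Setting `p_two_events` to 0 recovers the purely geometric case, in which every ratio equals `epsilon` and a single affine gap cost describes the length distribution exactly.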

【 License 】

CC BY
© Rivas and Eddy. 2015
