Journal article details
BMC Bioinformatics
ReformAlign: improved multiple sequence alignments using a profile-based meta-alignment approach
Dimitrios P Lyras [1], Dirk Metzler [1]
[1] Faculty of Biology, Department II, Ludwig-Maximilians Universität München, Planegg-Martinsried 82152, Germany
Keywords: Dynamic programming; Iterative refinement; Multiple sequence alignment
DOI: 10.1186/1471-2105-15-265
Received 2014-03-24, accepted 2014-07-29, published 2014
【 Abstract 】

Background

Obtaining an accurate sequence alignment is fundamental for consistently analyzing biological data. Although the problem can be solved efficiently when only two sequences are considered, exact inference of the optimal alignment quickly becomes computationally intractable in the multiple sequence alignment case. To cope with the high computational cost, approximate heuristic methods have been proposed that address the problem indirectly by progressively aligning the sequences in pairs according to their relatedness. These methods, however, cannot revise the alignment of an already aligned group of sequences in the light of new data, which compromises the quality of the resulting alignment. In this paper we present ReformAlign, a novel meta-alignment approach that may significantly improve the quality of the alignments produced by popular aligners. We call ReformAlign a meta-aligner because it requires an initial alignment, for which a variety of alignment programs can be used. The main idea behind ReformAlign is straightforward: first, an existing alignment is used to construct a standard profile that summarizes the initial alignment; then all sequences are individually re-aligned against this profile. The alignment of each sequence against the profile is recorded, and the final alignment is inferred indirectly by merging all the individual sub-alignments into a unified set. Applying ReformAlign may often yield alignments that are significantly more accurate than the starting alignments.
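
The three steps described above (build a profile, re-align each sequence against it, merge the sub-alignments) can be illustrated with a minimal Python sketch. This is not the ReformAlign implementation: the abstract does not specify the scoring scheme or the gap model, so the match/mismatch scores, the linear gap penalty, and all function names (`build_profile`, `align_to_profile`, `merge`, `reform`) are assumptions made purely for illustration.

```python
from collections import Counter

GAP = "-"

def build_profile(alignment):
    """Column-wise residue frequencies of an existing alignment (gaps ignored)."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != GAP)
        total = sum(counts.values()) or 1
        profile.append({c: n / total for c, n in counts.items()})
    return profile

def column_score(residue, column, match=2.0, mismatch=-1.0):
    """Expected match/mismatch score of a residue against one profile column.
    The score values are illustrative assumptions, not taken from the paper."""
    return sum(f * (match if c == residue else mismatch) for c, f in column.items())

def align_to_profile(seq, profile, gap=-2.0):
    """Global dynamic programming (linear gap penalty, an assumption) of one
    sequence against the profile. Returns (residue, column) pairs; the column
    is None when the residue is an insertion relative to the profile."""
    n, m = len(seq), len(profile)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(seq[i - 1], profile[j - 1]),
                score[i - 1][j] + gap,  # residue inserted relative to the profile
                score[i][j - 1] + gap,  # profile column skipped by the sequence
            )
    path, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j]
                == score[i - 1][j - 1] + column_score(seq[i - 1], profile[j - 1])):
            path.append((seq[i - 1], j - 1))
            i, j = i - 1, j - 1
        elif j == 0 or (i > 0 and score[i][j] == score[i - 1][j] + gap):
            path.append((seq[i - 1], None))
            i -= 1
        else:
            j -= 1
    return path[::-1]

def merge(paths, n_columns):
    """Merge the per-sequence alignments into one multiple alignment."""
    # inserts[k][s]: residues of sequence s inserted just before profile column k
    inserts = [["" for _ in paths] for _ in range(n_columns + 1)]
    matched = [[GAP] * n_columns for _ in paths]
    for s, path in enumerate(paths):
        slot = 0
        for residue, col in path:
            if col is None:
                inserts[slot][s] += residue
            else:
                matched[s][col] = residue
                slot = col + 1
    rows = ["" for _ in paths]
    for k in range(n_columns + 1):
        width = max(len(x) for x in inserts[k])
        for s in range(len(paths)):
            rows[s] += inserts[k][s].ljust(width, GAP)
            if k < n_columns:
                rows[s] += matched[s][k]
    return rows

def reform(alignment):
    """Profile -> individual re-alignment -> merge, then drop all-gap columns."""
    profile = build_profile(alignment)
    seqs = [row.replace(GAP, "") for row in alignment]
    paths = [align_to_profile(s, profile) for s in seqs]
    rows = merge(paths, len(profile))
    keep = [k for k, col in enumerate(zip(*rows)) if any(c != GAP for c in col)]
    return ["".join(row[k] for k in keep) for row in rows]

if __name__ == "__main__":
    for row in reform(["ACGT-A", "AC-TTA", "ACGTTA"]):
        print(row)
```

In this sketch every sequence is aligned independently against the same profile, so the merge step only has to reconcile insertions that fall between profile columns; here they are simply left-justified within a shared block. A realistic implementation would likely use a substitution matrix and affine gap penalties instead of the toy scores used above.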

Results

We evaluated the effect of ReformAlign on the alignments generated by ten leading alignment methods, using real data of variable size and sequence identity. The experimental results suggest that the proposed meta-aligner approach may often lead to statistically significantly more accurate alignments. Furthermore, we show that ReformAlign yields more substantial improvements when the starting alignment is of relatively inferior quality or when the input sequences are harder to align.

Conclusions

The proposed profile-based meta-alignment approach seems to be a promising and computationally efficient method that can be combined with practically all popular alignment methods and may lead to significant improvements in the generated alignments.

【 License 】

   
2014 Lyras and Metzler; licensee BioMed Central Ltd.
