Journal article details
Algorithms for Molecular Biology
Probabilistic approaches to alignment with tandem repeats
Michal Nánási [1], Tomáš Vinař [2], Broňa Brejová [1]
[1] Department of Computer Science, Faculty of Mathematics, Physics, and Informatics, Comenius University, Mlynská dolina, 842 48 Bratislava, Slovakia
[2] Department of Applied Informatics, Faculty of Mathematics, Physics, and Informatics, Comenius University, Mlynská dolina, 842 48 Bratislava, Slovakia
Keywords: Tandem repeat; Hidden Markov model; Sequence alignment
DOI: 10.1186/1748-7188-9-3
Received 2013-12-09; accepted 2014-02-24; published 2014
【 Abstract 】

Background

Short tandem repeats are ubiquitous in genomic sequences and, because of their complex evolutionary history, pose a challenge for sequence alignment tools.

Results

To better account for the presence of tandem repeats in pairwise sequence alignments, we propose a simple tractable pair hidden Markov model that explicitly models their presence. Using the framework of gain functions, we design several optimization criteria for decoding this model and describe resulting decoding algorithms, ranging from the traditional Viterbi and posterior decoding to block-based decoding algorithms tailored to our model. We compare the accuracy of individual decoding algorithms on simulated and real data and find that our approach is superior to the classical three-state pair HMM.
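
As a point of reference, the classical three-state pair HMM used as the baseline above can be decoded with the Viterbi algorithm roughly as follows. This is a minimal sketch and not the authors' tandem-repeat model or code: the function name `viterbi_pair_hmm`, the parameters `delta` (gap opening), `epsilon` (gap extension), and `p_match`, and their default values are all assumptions chosen for illustration, and the end state is omitted for brevity.

```python
import math

NEG_INF = float("-inf")

def viterbi_pair_hmm(x, y, delta=0.1, epsilon=0.3, p_match=0.9):
    """Viterbi alignment under a three-state pair HMM (M, X, Y).

    All parameter values are illustrative assumptions, not taken from the paper.
    Returns (aligned_x, aligned_y, log_probability).
    """
    n, m = len(x), len(y)
    states = ("M", "X", "Y")
    # Log transition probabilities; direct X<->Y transitions are not allowed.
    log_t = {
        ("M", "M"): math.log(1 - 2 * delta),
        ("M", "X"): math.log(delta),
        ("M", "Y"): math.log(delta),
        ("X", "X"): math.log(epsilon),
        ("X", "M"): math.log(1 - epsilon),
        ("Y", "Y"): math.log(epsilon),
        ("Y", "M"): math.log(1 - epsilon),
    }
    # Joint emission in M: 4 identical pairs and 12 mismatching pairs of bases.
    log_same = math.log(p_match / 4)
    log_diff = math.log((1 - p_match) / 12)
    log_gap = math.log(0.25)  # uniform emission of one base in X or Y
    # Number of symbols of x and y consumed by each state.
    step = {"M": (1, 1), "X": (1, 0), "Y": (0, 1)}

    V = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in states}
    ptr = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in states}
    V["M"][0][0] = 0.0  # the start state is treated as M

    for i in range(n + 1):
        for j in range(m + 1):
            for s in states:
                di, dj = step[s]
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue
                if s == "M":
                    emit = log_same if x[i - 1] == y[j - 1] else log_diff
                else:
                    emit = log_gap
                for prev in states:
                    if (prev, s) not in log_t:
                        continue
                    cand = V[prev][pi][pj] + log_t[(prev, s)] + emit
                    if cand > V[s][i][j]:
                        V[s][i][j] = cand
                        ptr[s][i][j] = prev

    # Trace back from the best final state.
    s = max(states, key=lambda st: V[st][n][m])
    score = V[s][n][m]
    i, j = n, m
    ax, ay = [], []
    while i > 0 or j > 0:
        prev = ptr[s][i][j]
        di, dj = step[s]
        ax.append(x[i - 1] if di else "-")
        ay.append(y[j - 1] if dj else "-")
        i, j = i - di, j - dj
        s = prev
    return "".join(reversed(ax)), "".join(reversed(ay)), score
```

Calling `viterbi_pair_hmm("ACGTACGT", "ACGTCGT")` returns the two gapped rows of the alignment together with its log probability. A model of this kind has no notion of repeats, which is the limitation the paper's repeat-aware pair HMM is designed to address.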

Conclusions

Our study illustrates the versatility of pair hidden Markov models, coupled with appropriate decoding criteria, as a modeling tool for capturing complex sequence features.

【 License 】

   
2014 Nánási et al.; licensee BioMed Central Ltd.
