期刊论文详细信息
BMC Research Notes
Efficient computation of spaced seeds
Silvana Ilie1 
[1] Department of Mathematics, Ryerson University, Toronto, ON M5B 2K3, Canada
关键词: Sensitivity;    Heuristic algorithm;    Spaced seed;    Local alignment;    Similarity search;   
Others  :  1166621
DOI  :  10.1186/1756-0500-5-123
 received in 2011-11-10, accepted in 2012-02-28,  发布年份 2012
PDF
【 摘 要 】

Background

The most frequently used tools in bioinformatics are those searching for similarities, or local alignments, between biological sequences. Since the exact dynamic programming algorithm is quadratic, linear-time heuristics such as BLAST are used. Spaced seeds are much more sensitive than the consecutive seed of BLAST and using several seeds represents the current state of the art in approximate search for biological sequences. The most important aspect is computing highly sensitive seeds. Since the problem seems hard, heuristic algorithms are used. The leading software in the common Bernoulli model is the SpEED program.

Findings

SpEED uses a hill climbing method based on the overlap complexity heuristic. We propose a new algorithm for this heuristic that improves its speed by over one order of magnitude. We use the new implementation to compute improved seeds for several software programs. We compute as well multiple seeds of the same weight as MegaBLAST, that greatly improve its sensitivity.

Conclusion

Multiple spaced seeds are being successfully used in bioinformatics software programs. Enabling researchers to compute very fast high quality seeds will help expanding the range of their applications.

【 授权许可】

   
2011 Ilie et al; licensee BioMed Central Ltd.

【 预 览 】
附件列表
Files Size Format View
20150416051316596.pdf 288KB PDF download
Figure 7. 87KB Image download
Figure 6. 61KB Image download
Figure 5. 38KB Image download
Figure 4. 96KB Image download
Figure 3. 20KB Image download
Figure 2. 64KB Image download
Figure 1. 23KB Image download
【 图 表 】

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Figure 7.

【 参考文献 】
  • [1]Lipman D, Pearson W: Rapid and sensitive protein similarity searches. Science 1985, 227(4693):1435-1441.
  • [2]Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215(3):403-410.
  • [3]Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389-3402.
  • [4]Califano A, Rigoutsos I: FLASH: a fast look-up algorithm for string homology. Computer Vision and Pattern Recognition 1993 Proceedings CVPR'93 1993 IEEE Computer Society Conference on 1993, 353-359.
  • [5]Buhler J: Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics 2001, 17(5):419-428.
  • [6]Ma B, Tromp J, Li M: PatternHunter: faster and more sensitive homology search. Bioinformatics 2002, 18(3):440-445.
  • [7]Burkhardt S, Kärkkäinen J: Better Filtering with Gapped q-Grams. Fundam Inform 2003, 56(1-2):51-70.
  • [8]Brown DG: A survey of seeding for sequence alignments. In In Bioinformatics Algorithms: Techniques and Applications. Edited by Mandoiu I, Zelikovsky A. Hoboken: J. Wiley and Sons Inc; 2007:117-142.
  • [9]Li M, Ma B, Kisman D, Tromp J: PatternHunterII: Highly Sensitive and Fast Homology Search. J Bioinformatics and Computational Biology 2004, 2(3):417-440.
  • [10]Noé L, Kucherov G: YASS: enhancing the sensitivity of DNA similarity search. Nucleic Acids Res 2005, 33(suppl 2):W540-W543.
  • [11]Homer N, Merriman B, Nelson SF: BFAST: An Alignment Tool for Large Scale Genome Resequencing. PLoS One 2009, 4(11):e7767.
  • [12]Rumble SM, Lacroute P, Dalca AV, Fiume M, Sidow A, Brudno M: SHRiMP: Accurate Mapping of Short Color-space Reads. PLoS Comput Biol 2009, 5(5):e1000386.
  • [13]Feng S, Tillier ER: A fast and flexible approach to oligonucleotide probe design for genomes and gene families. Bioinformatics 2007, 23(10):1195-1202.
  • [14]Ma B, Li M: On the complexity of the spaced seeds. J Comput Syst Sci 2007, 73(7):1024-1034.
  • [15]Ma B, Yao H: Seed Optimization Is No Easier than Optimal Golomb Ruler Design. APBC 2008, 133-144.
  • [16]Buhler J, Keich U, Sun Y: Designing seeds for similarity search in genomic DNA In Proceedings of RECOMB'03. New York: ACM; 2003:67-75.
  • [17]Kucherov G, Noé L, Roytberg MA: A Unifying Framework for Seed Sensitivity and its Application to Subset Seeds. J Bioinformatics and Computational Biology 2006, 4(2):553-570.
  • [18]Ilie L, Ilie S, Mansouri Bigvand A: SpEED: fast computation of sensitive spaced seeds. Bioinformatics 2011, 27(17):2433-2434.
  • [19]Ilie L, Ilie S: Multiple spaced seeds for homology search. Bioinformatics 2007, 23(22):2969-2977.
  文献评价指标  
  下载次数:51次 浏览次数:6次