期刊论文

【摘要】

BackgroundWe consider the problem of identifying motifs, recurring or conserved patterns, in the biological sequence data sets. To solve this task, we present a new deterministic algorithm for finding patterns that are embedded as exact or inexact instances in all or most of the input strings.ResultsThe proposed algorithm (1) improves search efficiency compared to existing algorithms, and (2) scales well with the size of alphabet. On a synthetic planted DNA motif finding problem our algorithm is over 10× more efficient than MITRA, PMSPrune, and RISOTTO for long motifs. Improvements are orders of magnitude higher in the same setting with large alphabets. On benchmark TF-binding site problems (FNP, CRP, LexA) we observed reduction in running time of over 12×, with high detection accuracy. The algorithm was also successful in rapidly identifying protein motifs in Lipocalin, Zinc metallopeptidase, and supersecondary structure motifs for Cadherin and Immunoglobin families.ConclusionsOur algorithm reduces computational complexity of the current motif finding algorithms and demonstrate strong running time improvements over existing exact algorithms, especially in important and difficult cases of large-alphabet sequences.

【授权许可】

CC BY
© Pavlovic and Kuksa; licensee BioMed Central Ltd. 2010

【预览】

附件列表
Files	Size	Format	View
RO202311099039298ZK.pdf	1384KB	PDF	download

【参考文献】

[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]

BMC Bioinformatics
Efficient motif finding algorithms for large-alphabet inputs
Research
Vladimir Pavlovic¹ Pavel P Kuksa¹
[1] Department of Computer Science, Rutgers University, 08854, Piscataway, NJ, USA;
关键词: Input String; Alphabet Size; Motif Finding Algorithm; Large Alphabet; Mismatch Region;
DOI : 10.1186/1471-2105-11-S8-S1
来源: Springer
PDF


	文献评价指标
	下载次数：6次	浏览次数：4次

【 摘 要 】

【 授权许可】

【 预 览 】

【 参考文献 】

【摘要】

【授权许可】

【预览】

【参考文献】