BMC Bioinformatics | |
Efficient motif finding algorithms for large-alphabet inputs | |
Research | |
Vladimir Pavlovic1  Pavel P Kuksa1  | |
[1] Department of Computer Science, Rutgers University, 08854, Piscataway, NJ, USA; | |
关键词: Input String; Alphabet Size; Motif Finding Algorithm; Large Alphabet; Mismatch Region; | |
DOI : 10.1186/1471-2105-11-S8-S1 | |
来源: Springer | |
![]() |
【 摘 要 】
BackgroundWe consider the problem of identifying motifs, recurring or conserved patterns, in the biological sequence data sets. To solve this task, we present a new deterministic algorithm for finding patterns that are embedded as exact or inexact instances in all or most of the input strings.ResultsThe proposed algorithm (1) improves search efficiency compared to existing algorithms, and (2) scales well with the size of alphabet. On a synthetic planted DNA motif finding problem our algorithm is over 10× more efficient than MITRA, PMSPrune, and RISOTTO for long motifs. Improvements are orders of magnitude higher in the same setting with large alphabets. On benchmark TF-binding site problems (FNP, CRP, LexA) we observed reduction in running time of over 12×, with high detection accuracy. The algorithm was also successful in rapidly identifying protein motifs in Lipocalin, Zinc metallopeptidase, and supersecondary structure motifs for Cadherin and Immunoglobin families.ConclusionsOur algorithm reduces computational complexity of the current motif finding algorithms and demonstrate strong running time improvements over existing exact algorithms, especially in important and difficult cases of large-alphabet sequences.
【 授权许可】
CC BY
© Pavlovic and Kuksa; licensee BioMed Central Ltd. 2010
【 预 览 】
Files | Size | Format | View |
---|---|---|---|
RO202311099039298ZK.pdf | 1384KB | ![]() |
【 参考文献 】
- [1]
- [2]
- [3]
- [4]
- [5]
- [6]
- [7]
- [8]
- [9]
- [10]
- [11]
- [12]
- [13]
- [14]
- [15]
- [16]
- [17]
- [18]
- [19]
- [20]