BMC Genomics | |
Efficient sequential and parallel algorithms for finding edit distance based motifs | |
Research | |
Peng Xiao1  Sanguthevar Rajasekaran1  Soumitra Pal1  | |
[1] Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road,, 06269, Storrs, CT, USA; | |
关键词: Motif; Edit distance; Trie; Radix sort; | |
DOI : 10.1186/s12864-016-2789-9 | |
来源: Springer | |
【 摘 要 】
BackgroundMotif search is an important step in extracting meaningful patterns from biological data. The general problem of motif search is intractable and there is a pressing need to develop efficient, exact and approximation algorithms to solve this problem. In this paper, we present several novel, exact, sequential and parallel algorithms for solving the (l,d) Edit-distance-based Motif Search (EMS) problem: given two integers l,d and n biological strings, find all strings of length l that appear in each input string with atmost d errors of types substitution, insertion and deletion.MethodsOne popular technique to solve the problem is to explore for each input string the set of all possible l-mers that belong to the d-neighborhood of any substring of the input string and output those which are common for all input strings. We introduce a novel and provably efficient neighborhood exploration technique. We show that it is enough to consider the candidates in neighborhood which are at a distance exactly d. We compactly represent these candidate motifs using wildcard characters and efficiently explore them with very few repetitions. Our sequential algorithm uses a trie based data structure to efficiently store and sort the candidate motifs. Our parallel algorithm in a multi-core shared memory setting uses arrays for storing and a novel modification of radix-sort for sorting the candidate motifs.ResultsThe algorithms for EMS are customarily evaluated on several challenging instances such as (8,1), (12,2), (16,3), (20,4), and so on. The best previously known algorithm, EMS1, is sequential and in estimated 3 days solves up to instance (16,3). Our sequential algorithms are more than 20 times faster on (16,3). On other hard instances such as (9,2), (11,3), (13,4), our algorithms are much faster. Our parallel algorithm has more than 600 % scaling performance while using 16 threads.ConclusionsOur algorithms have pushed up the state-of-the-art of EMS solvers and we believe that the techniques introduced in this paper are also applicable to other motif search problems such as Planted Motif Search (PMS) and Simple Motif Search (SMS).
【 授权许可】
CC BY
© The Author(s) 2016
【 预 览 】
Files | Size | Format | View |
---|---|---|---|
RO202311104944460ZK.pdf | 1116KB | download | |
12951_2015_155_Article_IEq23.gif | 1KB | Image | download |
Fig. 1 | 238KB | Image | download |
Fig. 1 | 1909KB | Image | download |
Fig. 4 | 161KB | Image | download |
MediaObjects/13046_2023_2846_MOESM8_ESM.pdf | 161KB | download | |
Fig. 6 | 83KB | Image | download |
12951_2015_111_Article_IEq1.gif | 1KB | Image | download |
Fig. 1 | 2753KB | Image | download |
Table 2 | 61KB | Table | download |
MediaObjects/13046_2023_2846_MOESM10_ESM.pdf | 123KB | download | |
Fig. 3 | 393KB | Image | download |
12951_2017_270_Article_IEq5.gif | 1KB | Image | download |
Fig. 2 | 804KB | Image | download |
MediaObjects/12936_2023_4728_MOESM6_ESM.docx | 78KB | Other | download |
12951_2015_155_Article_IEq24.gif | 1KB | Image | download |
12951_2015_155_Article_IEq25.gif | 1KB | Image | download |
13100_2023_302_Article_IEq7.gif | 1KB | Image | download |
Fig. 1 | 192KB | Image | download |
12951_2015_111_Article_IEq4.gif | 1KB | Image | download |
Fig. 1 | 50KB | Image | download |
MediaObjects/42004_2023_1020_MOESM3_ESM.mov | 3708KB | Other | download |
12951_2017_270_Article_IEq8.gif | 1KB | Image | download |
Fig. 2 | 123KB | Image | download |
12951_2017_270_Article_IEq9.gif | 1KB | Image | download |
MediaObjects/13690_2023_1195_MOESM1_ESM.docx | 19KB | Other | download |
MediaObjects/13690_2023_1195_MOESM2_ESM.docx | 146KB | Other | download |
MediaObjects/12888_2023_5201_MOESM1_ESM.docx | 13KB | Other | download |
Fig. 6 | 196KB | Image | download |
MediaObjects/12888_2023_5201_MOESM2_ESM.pdf | 264KB | download | |
Fig. 5 | 50KB | Image | download |
12951_2015_155_Article_IEq30.gif | 1KB | Image | download |
MediaObjects/12888_2023_5201_MOESM3_ESM.pdf | 221KB | download | |
Fig. 1 | 132KB | Image | download |
12951_2015_111_Article_IEq5.gif | 1KB | Image | download |
Fig. 7 | 198KB | Image | download |
Fig. 3 | 3571KB | Image | download |
Fig. 3 | 3612KB | Image | download |
Fig. 8 | 1473KB | Image | download |
12951_2017_270_Article_IEq10.gif | 1KB | Image | download |
Fig. 1 | 96KB | Image | download |
Fig. 2 | 582KB | Image | download |
Fig. 2 | 4215KB | Image | download |
Fig. 2 | 172KB | Image | download |
Fig. 1 | 104KB | Image | download |
12864_2016_2889_Article_IEq3.gif | 1KB | Image | download |
Fig. 1 | 181KB | Image | download |
12951_2015_155_Article_IEq32.gif | 1KB | Image | download |
12951_2015_155_Article_IEq33.gif | 1KB | Image | download |
12951_2015_155_Article_IEq34.gif | 1KB | Image | download |
Fig. 1 | 134KB | Image | download |
MediaObjects/12888_2023_5242_MOESM1_ESM.docx | 20KB | Other | download |
Fig. 9 | 7247KB | Image | download |
【 图 表 】
Fig. 9
Fig. 1
12951_2015_155_Article_IEq34.gif
12951_2015_155_Article_IEq33.gif
12951_2015_155_Article_IEq32.gif
Fig. 1
12864_2016_2889_Article_IEq3.gif
Fig. 1
Fig. 2
Fig. 2
Fig. 2
Fig. 1
12951_2017_270_Article_IEq10.gif
Fig. 8
Fig. 3
Fig. 3
Fig. 7
12951_2015_111_Article_IEq5.gif
Fig. 1
12951_2015_155_Article_IEq30.gif
Fig. 5
Fig. 6
12951_2017_270_Article_IEq9.gif
Fig. 2
12951_2017_270_Article_IEq8.gif
Fig. 1
12951_2015_111_Article_IEq4.gif
Fig. 1
13100_2023_302_Article_IEq7.gif
12951_2015_155_Article_IEq25.gif
12951_2015_155_Article_IEq24.gif
Fig. 2
12951_2017_270_Article_IEq5.gif
Fig. 3
Fig. 1
12951_2015_111_Article_IEq1.gif
Fig. 6
Fig. 4
Fig. 1
Fig. 1
12951_2015_155_Article_IEq23.gif
【 参考文献 】
- [1]
- [2]
- [3]
- [4]
- [5]
- [6]
- [7]
- [8]
- [9]
- [10]
- [11]
- [12]
- [13]
- [14]