BMC Genomics | |
Efficient sequential and parallel algorithms for finding edit distance based motifs | |
Research | |
Peng Xiao1  Sanguthevar Rajasekaran1  Soumitra Pal1  | |
[1] Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road,, 06269, Storrs, CT, USA; | |
关键词: Motif; Edit distance; Trie; Radix sort; | |
DOI : 10.1186/s12864-016-2789-9 | |
来源: Springer | |
【 摘 要 】
BackgroundMotif search is an important step in extracting meaningful patterns from biological data. The general problem of motif search is intractable and there is a pressing need to develop efficient, exact and approximation algorithms to solve this problem. In this paper, we present several novel, exact, sequential and parallel algorithms for solving the (l,d) Edit-distance-based Motif Search (EMS) problem: given two integers l,d and n biological strings, find all strings of length l that appear in each input string with atmost d errors of types substitution, insertion and deletion.MethodsOne popular technique to solve the problem is to explore for each input string the set of all possible l-mers that belong to the d-neighborhood of any substring of the input string and output those which are common for all input strings. We introduce a novel and provably efficient neighborhood exploration technique. We show that it is enough to consider the candidates in neighborhood which are at a distance exactly d. We compactly represent these candidate motifs using wildcard characters and efficiently explore them with very few repetitions. Our sequential algorithm uses a trie based data structure to efficiently store and sort the candidate motifs. Our parallel algorithm in a multi-core shared memory setting uses arrays for storing and a novel modification of radix-sort for sorting the candidate motifs.ResultsThe algorithms for EMS are customarily evaluated on several challenging instances such as (8,1), (12,2), (16,3), (20,4), and so on. The best previously known algorithm, EMS1, is sequential and in estimated 3 days solves up to instance (16,3). Our sequential algorithms are more than 20 times faster on (16,3). On other hard instances such as (9,2), (11,3), (13,4), our algorithms are much faster. Our parallel algorithm has more than 600 % scaling performance while using 16 threads.ConclusionsOur algorithms have pushed up the state-of-the-art of EMS solvers and we believe that the techniques introduced in this paper are also applicable to other motif search problems such as Planted Motif Search (PMS) and Simple Motif Search (SMS).
【 授权许可】
CC BY
© The Author(s) 2016
【 预 览 】
Files | Size | Format | View |
---|---|---|---|
RO202311095253054ZK.pdf | 1116KB | download | |
12864_2016_3098_Article_IEq6.gif | 1KB | Image | download |
12864_2016_3425_Article_IEq2.gif | 1KB | Image | download |
12864_2015_2055_Article_IEq59.gif | 1KB | Image | download |
12864_2017_4186_Article_IEq8.gif | 1KB | Image | download |
12864_2017_4179_Article_IEq8.gif | 1KB | Image | download |
12864_2017_4179_Article_IEq9.gif | 1KB | Image | download |
12864_2017_4179_Article_IEq10.gif | 1KB | Image | download |
12864_2017_4179_Article_IEq11.gif | 2KB | Image | download |
12864_2017_4179_Article_IEq12.gif | 1KB | Image | download |
12864_2017_4179_Article_IEq13.gif | 1KB | Image | download |
12864_2015_2055_Article_IEq8.gif | 1KB | Image | download |
12864_2017_4132_Article_IEq1.gif | 1KB | Image | download |
12864_2015_2055_Article_IEq9.gif | 1KB | Image | download |
12888_2016_877_Article_IEq1.gif | 1KB | Image | download |
12888_2016_877_Article_IEq2.gif | 1KB | Image | download |
12864_2016_2789_Article_IEq16.gif | 1KB | Image | download |
12888_2016_877_Article_IEq16.gif | 1KB | Image | download |
12864_2017_3670_Article_IEq12.gif | 1KB | Image | download |
12864_2017_3610_Article_IEq3.gif | 1KB | Image | download |
12864_2016_2789_Article_IEq20.gif | 1KB | Image | download |
12864_2017_4132_Article_IEq27.gif | 1KB | Image | download |
12864_2017_4132_Article_IEq30.gif | 1KB | Image | download |
12864_2017_3990_Article_IEq10.gif | 1KB | Image | download |
12888_2017_1528_Article_IEq1.gif | 1KB | Image | download |
12864_2017_4132_Article_IEq31.gif | 1KB | Image | download |
12864_2017_4133_Article_IEq19.gif | 1KB | Image | download |
12864_2017_4133_Article_IEq20.gif | 1KB | Image | download |
12864_2017_3798_Article_IEq1.gif | 1KB | Image | download |
12864_2017_3487_Article_IEq42.gif | 1KB | Image | download |
12864_2017_3605_Article_IEq6.gif | 1KB | Image | download |
12888_2017_1365_Article_IEq2.gif | 1KB | Image | download |
12888_2017_1284_Article_IEq1.gif | 1KB | Image | download |
12864_2017_3487_Article_IEq51.gif | 1KB | Image | download |
12864_2017_4133_Article_IEq30.gif | 1KB | Image | download |
12864_2017_3676_Article_IEq5.gif | 1KB | Image | download |
12864_2017_4190_Article_IEq6.gif | 1KB | Image | download |
12864_2017_3676_Article_IEq7.gif | 1KB | Image | download |
12864_2017_3492_Article_IEq14.gif | 1KB | Image | download |
12864_2017_3492_Article_IEq16.gif | 1KB | Image | download |
12864_2017_3733_Article_IEq60.gif | 1KB | Image | download |
12864_2016_2789_Article_IEq41.gif | 1KB | Image | download |
12864_2016_2789_Article_IEq42.gif | 1KB | Image | download |
12864_2017_3492_Article_IEq19.gif | 1KB | Image | download |
12864_2016_2789_Article_IEq44.gif | 1KB | Image | download |
12864_2017_3655_Article_IEq7.gif | 1KB | Image | download |
12864_2016_2682_Article_IEq29.gif | 1KB | Image | download |
12888_2017_1557_Article_IEq8.gif | 1KB | Image | download |
12888_2017_1557_Article_IEq9.gif | 1KB | Image | download |
12864_2016_2789_Article_IEq49.gif | 1KB | Image | download |
12864_2017_3783_Article_IEq4.gif | 1KB | Image | download |
12864_2016_2789_Article_IEq51.gif | 1KB | Image | download |
12864_2016_2789_Article_IEq52.gif | 1KB | Image | download |
【 图 表 】
12864_2016_2789_Article_IEq52.gif
12864_2016_2789_Article_IEq51.gif
12864_2017_3783_Article_IEq4.gif
12864_2016_2789_Article_IEq49.gif
12888_2017_1557_Article_IEq9.gif
12888_2017_1557_Article_IEq8.gif
12864_2016_2682_Article_IEq29.gif
12864_2017_3655_Article_IEq7.gif
12864_2016_2789_Article_IEq44.gif
12864_2017_3492_Article_IEq19.gif
12864_2016_2789_Article_IEq42.gif
12864_2016_2789_Article_IEq41.gif
12864_2017_3733_Article_IEq60.gif
12864_2017_3492_Article_IEq16.gif
12864_2017_3492_Article_IEq14.gif
12864_2017_3676_Article_IEq7.gif
12864_2017_4190_Article_IEq6.gif
12864_2017_3676_Article_IEq5.gif
12864_2017_4133_Article_IEq30.gif
12864_2017_3487_Article_IEq51.gif
12888_2017_1284_Article_IEq1.gif
12888_2017_1365_Article_IEq2.gif
12864_2017_3605_Article_IEq6.gif
12864_2017_3487_Article_IEq42.gif
12864_2017_3798_Article_IEq1.gif
12864_2017_4133_Article_IEq20.gif
12864_2017_4133_Article_IEq19.gif
12864_2017_4132_Article_IEq31.gif
12888_2017_1528_Article_IEq1.gif
12864_2017_3990_Article_IEq10.gif
12864_2017_4132_Article_IEq30.gif
12864_2017_4132_Article_IEq27.gif
12864_2016_2789_Article_IEq20.gif
12864_2017_3610_Article_IEq3.gif
12864_2017_3670_Article_IEq12.gif
12888_2016_877_Article_IEq16.gif
12864_2016_2789_Article_IEq16.gif
12888_2016_877_Article_IEq2.gif
12888_2016_877_Article_IEq1.gif
12864_2015_2055_Article_IEq9.gif
12864_2017_4132_Article_IEq1.gif
12864_2015_2055_Article_IEq8.gif
12864_2017_4179_Article_IEq13.gif
12864_2017_4179_Article_IEq12.gif
12864_2017_4179_Article_IEq11.gif
12864_2017_4179_Article_IEq10.gif
12864_2017_4179_Article_IEq9.gif
12864_2017_4179_Article_IEq8.gif
12864_2017_4186_Article_IEq8.gif
12864_2015_2055_Article_IEq59.gif
12864_2016_3425_Article_IEq2.gif
12864_2016_3098_Article_IEq6.gif
【 参考文献 】
- [1]
- [2]
- [3]
- [4]
- [5]
- [6]
- [7]
- [8]
- [9]
- [10]
- [11]
- [12]
- [13]
- [14]