期刊论文详细信息
BMC Genomics
Efficient sequential and parallel algorithms for finding edit distance based motifs
Research
Peng Xiao1  Sanguthevar Rajasekaran1  Soumitra Pal1 
[1] Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road,, 06269, Storrs, CT, USA;
关键词: Motif;    Edit distance;    Trie;    Radix sort;   
DOI  :  10.1186/s12864-016-2789-9
来源: Springer
PDF
【 摘 要 】

BackgroundMotif search is an important step in extracting meaningful patterns from biological data. The general problem of motif search is intractable and there is a pressing need to develop efficient, exact and approximation algorithms to solve this problem. In this paper, we present several novel, exact, sequential and parallel algorithms for solving the (l,d) Edit-distance-based Motif Search (EMS) problem: given two integers l,d and n biological strings, find all strings of length l that appear in each input string with atmost d errors of types substitution, insertion and deletion.MethodsOne popular technique to solve the problem is to explore for each input string the set of all possible l-mers that belong to the d-neighborhood of any substring of the input string and output those which are common for all input strings. We introduce a novel and provably efficient neighborhood exploration technique. We show that it is enough to consider the candidates in neighborhood which are at a distance exactly d. We compactly represent these candidate motifs using wildcard characters and efficiently explore them with very few repetitions. Our sequential algorithm uses a trie based data structure to efficiently store and sort the candidate motifs. Our parallel algorithm in a multi-core shared memory setting uses arrays for storing and a novel modification of radix-sort for sorting the candidate motifs.ResultsThe algorithms for EMS are customarily evaluated on several challenging instances such as (8,1), (12,2), (16,3), (20,4), and so on. The best previously known algorithm, EMS1, is sequential and in estimated 3 days solves up to instance (16,3). Our sequential algorithms are more than 20 times faster on (16,3). On other hard instances such as (9,2), (11,3), (13,4), our algorithms are much faster. Our parallel algorithm has more than 600 % scaling performance while using 16 threads.ConclusionsOur algorithms have pushed up the state-of-the-art of EMS solvers and we believe that the techniques introduced in this paper are also applicable to other motif search problems such as Planted Motif Search (PMS) and Simple Motif Search (SMS).

【 授权许可】

CC BY   
© The Author(s) 2016

【 预 览 】
附件列表
Files Size Format View
RO202311104944460ZK.pdf 1116KB PDF download
12951_2015_155_Article_IEq23.gif 1KB Image download
Fig. 1 238KB Image download
Fig. 1 1909KB Image download
Fig. 4 161KB Image download
MediaObjects/13046_2023_2846_MOESM8_ESM.pdf 161KB PDF download
Fig. 6 83KB Image download
12951_2015_111_Article_IEq1.gif 1KB Image download
Fig. 1 2753KB Image download
Table 2 61KB Table download
MediaObjects/13046_2023_2846_MOESM10_ESM.pdf 123KB PDF download
Fig. 3 393KB Image download
12951_2017_270_Article_IEq5.gif 1KB Image download
Fig. 2 804KB Image download
MediaObjects/12936_2023_4728_MOESM6_ESM.docx 78KB Other download
12951_2015_155_Article_IEq24.gif 1KB Image download
12951_2015_155_Article_IEq25.gif 1KB Image download
13100_2023_302_Article_IEq7.gif 1KB Image download
Fig. 1 192KB Image download
12951_2015_111_Article_IEq4.gif 1KB Image download
Fig. 1 50KB Image download
MediaObjects/42004_2023_1020_MOESM3_ESM.mov 3708KB Other download
12951_2017_270_Article_IEq8.gif 1KB Image download
Fig. 2 123KB Image download
12951_2017_270_Article_IEq9.gif 1KB Image download
MediaObjects/13690_2023_1195_MOESM1_ESM.docx 19KB Other download
MediaObjects/13690_2023_1195_MOESM2_ESM.docx 146KB Other download
MediaObjects/12888_2023_5201_MOESM1_ESM.docx 13KB Other download
Fig. 6 196KB Image download
MediaObjects/12888_2023_5201_MOESM2_ESM.pdf 264KB PDF download
Fig. 5 50KB Image download
12951_2015_155_Article_IEq30.gif 1KB Image download
MediaObjects/12888_2023_5201_MOESM3_ESM.pdf 221KB PDF download
Fig. 1 132KB Image download
12951_2015_111_Article_IEq5.gif 1KB Image download
Fig. 7 198KB Image download
Fig. 3 3571KB Image download
Fig. 3 3612KB Image download
Fig. 8 1473KB Image download
12951_2017_270_Article_IEq10.gif 1KB Image download
Fig. 1 96KB Image download
Fig. 2 582KB Image download
Fig. 2 4215KB Image download
Fig. 2 172KB Image download
Fig. 1 104KB Image download
12864_2016_2889_Article_IEq3.gif 1KB Image download
Fig. 1 181KB Image download
12951_2015_155_Article_IEq32.gif 1KB Image download
12951_2015_155_Article_IEq33.gif 1KB Image download
12951_2015_155_Article_IEq34.gif 1KB Image download
Fig. 1 134KB Image download
MediaObjects/12888_2023_5242_MOESM1_ESM.docx 20KB Other download
Fig. 9 7247KB Image download
【 图 表 】

Fig. 9

Fig. 1

12951_2015_155_Article_IEq34.gif

12951_2015_155_Article_IEq33.gif

12951_2015_155_Article_IEq32.gif

Fig. 1

12864_2016_2889_Article_IEq3.gif

Fig. 1

Fig. 2

Fig. 2

Fig. 2

Fig. 1

12951_2017_270_Article_IEq10.gif

Fig. 8

Fig. 3

Fig. 3

Fig. 7

12951_2015_111_Article_IEq5.gif

Fig. 1

12951_2015_155_Article_IEq30.gif

Fig. 5

Fig. 6

12951_2017_270_Article_IEq9.gif

Fig. 2

12951_2017_270_Article_IEq8.gif

Fig. 1

12951_2015_111_Article_IEq4.gif

Fig. 1

13100_2023_302_Article_IEq7.gif

12951_2015_155_Article_IEq25.gif

12951_2015_155_Article_IEq24.gif

Fig. 2

12951_2017_270_Article_IEq5.gif

Fig. 3

Fig. 1

12951_2015_111_Article_IEq1.gif

Fig. 6

Fig. 4

Fig. 1

Fig. 1

12951_2015_155_Article_IEq23.gif

【 参考文献 】
  • [1]
  • [2]
  • [3]
  • [4]
  • [5]
  • [6]
  • [7]
  • [8]
  • [9]
  • [10]
  • [11]
  • [12]
  • [13]
  • [14]
  文献评价指标  
  下载次数:1次 浏览次数:2次