期刊论文详细信息
BMC Genomics
Efficient sequential and parallel algorithms for finding edit distance based motifs
Research
Peng Xiao1  Sanguthevar Rajasekaran1  Soumitra Pal1 
[1] Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road,, 06269, Storrs, CT, USA;
关键词: Motif;    Edit distance;    Trie;    Radix sort;   
DOI  :  10.1186/s12864-016-2789-9
来源: Springer
PDF
【 摘 要 】

BackgroundMotif search is an important step in extracting meaningful patterns from biological data. The general problem of motif search is intractable and there is a pressing need to develop efficient, exact and approximation algorithms to solve this problem. In this paper, we present several novel, exact, sequential and parallel algorithms for solving the (l,d) Edit-distance-based Motif Search (EMS) problem: given two integers l,d and n biological strings, find all strings of length l that appear in each input string with atmost d errors of types substitution, insertion and deletion.MethodsOne popular technique to solve the problem is to explore for each input string the set of all possible l-mers that belong to the d-neighborhood of any substring of the input string and output those which are common for all input strings. We introduce a novel and provably efficient neighborhood exploration technique. We show that it is enough to consider the candidates in neighborhood which are at a distance exactly d. We compactly represent these candidate motifs using wildcard characters and efficiently explore them with very few repetitions. Our sequential algorithm uses a trie based data structure to efficiently store and sort the candidate motifs. Our parallel algorithm in a multi-core shared memory setting uses arrays for storing and a novel modification of radix-sort for sorting the candidate motifs.ResultsThe algorithms for EMS are customarily evaluated on several challenging instances such as (8,1), (12,2), (16,3), (20,4), and so on. The best previously known algorithm, EMS1, is sequential and in estimated 3 days solves up to instance (16,3). Our sequential algorithms are more than 20 times faster on (16,3). On other hard instances such as (9,2), (11,3), (13,4), our algorithms are much faster. Our parallel algorithm has more than 600 % scaling performance while using 16 threads.ConclusionsOur algorithms have pushed up the state-of-the-art of EMS solvers and we believe that the techniques introduced in this paper are also applicable to other motif search problems such as Planted Motif Search (PMS) and Simple Motif Search (SMS).

【 授权许可】

CC BY   
© The Author(s) 2016

【 预 览 】
附件列表
Files Size Format View
RO202311095253054ZK.pdf 1116KB PDF download
12864_2016_3098_Article_IEq6.gif 1KB Image download
12864_2016_3425_Article_IEq2.gif 1KB Image download
12864_2015_2055_Article_IEq59.gif 1KB Image download
12864_2017_4186_Article_IEq8.gif 1KB Image download
12864_2017_4179_Article_IEq8.gif 1KB Image download
12864_2017_4179_Article_IEq9.gif 1KB Image download
12864_2017_4179_Article_IEq10.gif 1KB Image download
12864_2017_4179_Article_IEq11.gif 2KB Image download
12864_2017_4179_Article_IEq12.gif 1KB Image download
12864_2017_4179_Article_IEq13.gif 1KB Image download
12864_2015_2055_Article_IEq8.gif 1KB Image download
12864_2017_4132_Article_IEq1.gif 1KB Image download
12864_2015_2055_Article_IEq9.gif 1KB Image download
12888_2016_877_Article_IEq1.gif 1KB Image download
12888_2016_877_Article_IEq2.gif 1KB Image download
12864_2016_2789_Article_IEq16.gif 1KB Image download
12888_2016_877_Article_IEq16.gif 1KB Image download
12864_2017_3670_Article_IEq12.gif 1KB Image download
12864_2017_3610_Article_IEq3.gif 1KB Image download
12864_2016_2789_Article_IEq20.gif 1KB Image download
12864_2017_4132_Article_IEq27.gif 1KB Image download
12864_2017_4132_Article_IEq30.gif 1KB Image download
12864_2017_3990_Article_IEq10.gif 1KB Image download
12888_2017_1528_Article_IEq1.gif 1KB Image download
12864_2017_4132_Article_IEq31.gif 1KB Image download
12864_2017_4133_Article_IEq19.gif 1KB Image download
12864_2017_4133_Article_IEq20.gif 1KB Image download
12864_2017_3798_Article_IEq1.gif 1KB Image download
12864_2017_3487_Article_IEq42.gif 1KB Image download
12864_2017_3605_Article_IEq6.gif 1KB Image download
12888_2017_1365_Article_IEq2.gif 1KB Image download
12888_2017_1284_Article_IEq1.gif 1KB Image download
12864_2017_3487_Article_IEq51.gif 1KB Image download
12864_2017_4133_Article_IEq30.gif 1KB Image download
12864_2017_3676_Article_IEq5.gif 1KB Image download
12864_2017_4190_Article_IEq6.gif 1KB Image download
12864_2017_3676_Article_IEq7.gif 1KB Image download
12864_2017_3492_Article_IEq14.gif 1KB Image download
12864_2017_3492_Article_IEq16.gif 1KB Image download
12864_2017_3733_Article_IEq60.gif 1KB Image download
12864_2016_2789_Article_IEq41.gif 1KB Image download
12864_2016_2789_Article_IEq42.gif 1KB Image download
12864_2017_3492_Article_IEq19.gif 1KB Image download
12864_2016_2789_Article_IEq44.gif 1KB Image download
12864_2017_3655_Article_IEq7.gif 1KB Image download
12864_2016_2682_Article_IEq29.gif 1KB Image download
12888_2017_1557_Article_IEq8.gif 1KB Image download
12888_2017_1557_Article_IEq9.gif 1KB Image download
12864_2016_2789_Article_IEq49.gif 1KB Image download
12864_2017_3783_Article_IEq4.gif 1KB Image download
12864_2016_2789_Article_IEq51.gif 1KB Image download
12864_2016_2789_Article_IEq52.gif 1KB Image download
【 图 表 】

12864_2016_2789_Article_IEq52.gif

12864_2016_2789_Article_IEq51.gif

12864_2017_3783_Article_IEq4.gif

12864_2016_2789_Article_IEq49.gif

12888_2017_1557_Article_IEq9.gif

12888_2017_1557_Article_IEq8.gif

12864_2016_2682_Article_IEq29.gif

12864_2017_3655_Article_IEq7.gif

12864_2016_2789_Article_IEq44.gif

12864_2017_3492_Article_IEq19.gif

12864_2016_2789_Article_IEq42.gif

12864_2016_2789_Article_IEq41.gif

12864_2017_3733_Article_IEq60.gif

12864_2017_3492_Article_IEq16.gif

12864_2017_3492_Article_IEq14.gif

12864_2017_3676_Article_IEq7.gif

12864_2017_4190_Article_IEq6.gif

12864_2017_3676_Article_IEq5.gif

12864_2017_4133_Article_IEq30.gif

12864_2017_3487_Article_IEq51.gif

12888_2017_1284_Article_IEq1.gif

12888_2017_1365_Article_IEq2.gif

12864_2017_3605_Article_IEq6.gif

12864_2017_3487_Article_IEq42.gif

12864_2017_3798_Article_IEq1.gif

12864_2017_4133_Article_IEq20.gif

12864_2017_4133_Article_IEq19.gif

12864_2017_4132_Article_IEq31.gif

12888_2017_1528_Article_IEq1.gif

12864_2017_3990_Article_IEq10.gif

12864_2017_4132_Article_IEq30.gif

12864_2017_4132_Article_IEq27.gif

12864_2016_2789_Article_IEq20.gif

12864_2017_3610_Article_IEq3.gif

12864_2017_3670_Article_IEq12.gif

12888_2016_877_Article_IEq16.gif

12864_2016_2789_Article_IEq16.gif

12888_2016_877_Article_IEq2.gif

12888_2016_877_Article_IEq1.gif

12864_2015_2055_Article_IEq9.gif

12864_2017_4132_Article_IEq1.gif

12864_2015_2055_Article_IEq8.gif

12864_2017_4179_Article_IEq13.gif

12864_2017_4179_Article_IEq12.gif

12864_2017_4179_Article_IEq11.gif

12864_2017_4179_Article_IEq10.gif

12864_2017_4179_Article_IEq9.gif

12864_2017_4179_Article_IEq8.gif

12864_2017_4186_Article_IEq8.gif

12864_2015_2055_Article_IEq59.gif

12864_2016_3425_Article_IEq2.gif

12864_2016_3098_Article_IEq6.gif

【 参考文献 】
  • [1]
  • [2]
  • [3]
  • [4]
  • [5]
  • [6]
  • [7]
  • [8]
  • [9]
  • [10]
  • [11]
  • [12]
  • [13]
  • [14]
  文献评价指标  
  下载次数:59次 浏览次数:0次