期刊论文详细信息
BMC Genomics
A new algorithm for “the LCS problem” with application in compressing genome resequencing data
Research
Tazin Afrin1  Donald Adjeroh1  Aliya Farheen1  Richard Beal1 
[1] Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA;
关键词: Longest common subsequence;    LCS;    Longest previous factor;    LPF;    Compression;    Biology;    Genome resequencing;   
DOI  :  10.1186/s12864-016-2793-0
来源: Springer
PDF
【 摘 要 】

BackgroundThe longest common subsequence (LCS) problem is a classical problem in computer science, and forms the basis of the current best-performing reference-based compression schemes for genome resequencing data.MethodsFirst, we present a new algorithm for the LCS problem. Using the generalized suffix tree, we identify the common substrings shared between the two input sequences. Using the maximal common substrings, we construct a directed acyclic graph (DAG), based on which we determine the LCS as the longest path in the DAG. Then, we introduce an LCS-motivated reference-based compression scheme using the components of the LCS, rather than the LCS itself.ResultsOur basic scheme compressed the Homo sapiens genome (with an original size of 3,080,436,051 bytes) to 15,460,478 bytes. An improvement on the basic method further reduced this to 8,556,708 bytes, or an overall compression ratio of 360. This can be compared to the previous state-of-the-art compression ratios of 157 (Wang and Zhang, 2011) and 171 (Pinho, Pratas, and Garcia, 2011).ConclusionWe propose a new algorithm to address the longest common subsequence problem. Motivated by our LCS algorithm, we introduce a new reference-based compression scheme for genome resequencing data. Comparative results against state-of-the-art reference-based compression algorithms demonstrate the performance of the proposed method.

【 授权许可】

CC BY   
© The Author(s) 2016

【 预 览 】
附件列表
Files Size Format View
RO202311099179772ZK.pdf 1753KB PDF download
12864_2017_4025_Article_IEq10.gif 1KB Image download
12888_2017_1504_Article_IEq3.gif 1KB Image download
12888_2017_1504_Article_IEq4.gif 1KB Image download
12864_2016_2793_Article_IEq4.gif 1KB Image download
12864_2017_3771_Article_IEq2.gif 1KB Image download
12864_2016_2793_Article_IEq6.gif 1KB Image download
12864_2017_4004_Article_IEq11.gif 1KB Image download
12864_2015_1945_Article_IEq4.gif 1KB Image download
12864_2017_3521_Article_IEq1.gif 1KB Image download
12864_2017_4030_Article_IEq26.gif 1KB Image download
12864_2017_4004_Article_IEq15.gif 1KB Image download
12864_2017_4030_Article_IEq28.gif 1KB Image download
12864_2016_2793_Article_IEq13.gif 1KB Image download
12864_2017_4226_Article_IEq2.gif 1KB Image download
12864_2017_4132_Article_IEq10.gif 1KB Image download
12864_2017_4228_Article_IEq2.gif 1KB Image download
12864_2017_4132_Article_IEq11.gif 1KB Image download
12864_2017_4132_Article_IEq13.gif 1KB Image download
12864_2017_4132_Article_IEq14.gif 1KB Image download
12864_2017_3916_Article_IEq1.gif 1KB Image download
12864_2016_2793_Article_IEq21.gif 1KB Image download
12864_2015_2273_Article_IEq15.gif 1KB Image download
12864_2016_2793_Article_IEq23.gif 1KB Image download
12864_2015_2273_Article_IEq16.gif 1KB Image download
12864_2017_4179_Article_IEq29.gif 1KB Image download
12864_2017_3670_Article_IEq10.gif 1KB Image download
12864_2017_3670_Article_IEq11.gif 1KB Image download
12894_2015_Article_85_TeX2GIF_IEq2.gif 1KB Image download
12864_2017_3670_Article_IEq12.gif 1KB Image download
12864_2017_4179_Article_IEq34.gif 1KB Image download
12864_2016_2897_Article_IEq12.gif 1KB Image download
12864_2017_3990_Article_IEq8.gif 1KB Image download
12864_2015_2192_Article_IEq25.gif 1KB Image download
12864_2017_3661_Article_IEq2.gif 1KB Image download
12864_2016_2793_Article_IEq35.gif 1KB Image download
12864_2016_2793_Article_IEq36.gif 1KB Image download
12864_2017_3605_Article_IEq2.gif 1KB Image download
12864_2015_1944_Article_IEq4.gif 1KB Image download
12864_2017_3683_Article_IEq2.gif 1KB Image download
12864_2016_2793_Article_IEq40.gif 1KB Image download
12864_2015_1944_Article_IEq8.gif 1KB Image download
12864_2017_4020_Article_IEq2.gif 1KB Image download
12864_2017_3920_Article_IEq4.gif 1KB Image download
12864_2017_3676_Article_IEq1.gif 1KB Image download
12864_2017_4020_Article_IEq4.gif 1KB Image download
12864_2017_3733_Article_IEq41.gif 1KB Image download
12864_2017_4020_Article_IEq5.gif 1KB Image download
12864_2017_3777_Article_IEq4.gif 1KB Image download
12864_2017_4020_Article_IEq8.gif 1KB Image download
12864_2017_4271_Article_IEq1.gif 1KB Image download
12864_2017_3781_Article_IEq3.gif 1KB Image download
12864_2016_2793_Article_IEq52.gif 1KB Image download
12864_2017_3990_Article_IEq15.gif 1KB Image download
12864_2017_3487_Article_IEq45.gif 1KB Image download
12888_2017_1365_Article_IEq2.gif 1KB Image download
12888_2017_1365_Article_IEq3.gif 1KB Image download
【 图 表 】

12888_2017_1365_Article_IEq3.gif

12888_2017_1365_Article_IEq2.gif

12864_2017_3487_Article_IEq45.gif

12864_2017_3990_Article_IEq15.gif

12864_2016_2793_Article_IEq52.gif

12864_2017_3781_Article_IEq3.gif

12864_2017_4271_Article_IEq1.gif

12864_2017_4020_Article_IEq8.gif

12864_2017_3777_Article_IEq4.gif

12864_2017_4020_Article_IEq5.gif

12864_2017_3733_Article_IEq41.gif

12864_2017_4020_Article_IEq4.gif

12864_2017_3676_Article_IEq1.gif

12864_2017_3920_Article_IEq4.gif

12864_2017_4020_Article_IEq2.gif

12864_2015_1944_Article_IEq8.gif

12864_2016_2793_Article_IEq40.gif

12864_2017_3683_Article_IEq2.gif

12864_2015_1944_Article_IEq4.gif

12864_2017_3605_Article_IEq2.gif

12864_2016_2793_Article_IEq36.gif

12864_2016_2793_Article_IEq35.gif

12864_2017_3661_Article_IEq2.gif

12864_2015_2192_Article_IEq25.gif

12864_2017_3990_Article_IEq8.gif

12864_2016_2897_Article_IEq12.gif

12864_2017_4179_Article_IEq34.gif

12864_2017_3670_Article_IEq12.gif

12894_2015_Article_85_TeX2GIF_IEq2.gif

12864_2017_3670_Article_IEq11.gif

12864_2017_3670_Article_IEq10.gif

12864_2017_4179_Article_IEq29.gif

12864_2015_2273_Article_IEq16.gif

12864_2016_2793_Article_IEq23.gif

12864_2015_2273_Article_IEq15.gif

12864_2016_2793_Article_IEq21.gif

12864_2017_3916_Article_IEq1.gif

12864_2017_4132_Article_IEq14.gif

12864_2017_4132_Article_IEq13.gif

12864_2017_4132_Article_IEq11.gif

12864_2017_4228_Article_IEq2.gif

12864_2017_4132_Article_IEq10.gif

12864_2017_4226_Article_IEq2.gif

12864_2016_2793_Article_IEq13.gif

12864_2017_4030_Article_IEq28.gif

12864_2017_4004_Article_IEq15.gif

12864_2017_4030_Article_IEq26.gif

12864_2017_3521_Article_IEq1.gif

12864_2015_1945_Article_IEq4.gif

12864_2017_4004_Article_IEq11.gif

12864_2016_2793_Article_IEq6.gif

12864_2017_3771_Article_IEq2.gif

12864_2016_2793_Article_IEq4.gif

12888_2017_1504_Article_IEq4.gif

12888_2017_1504_Article_IEq3.gif

12864_2017_4025_Article_IEq10.gif

【 参考文献 】
  • [1]
  • [2]
  • [3]
  • [4]
  • [5]
  • [6]
  • [7]
  • [8]
  • [9]
  • [10]
  • [11]
  • [12]
  • [13]
  • [14]
  • [15]
  • [16]
  • [17]
  • [18]
  • [19]
  • [20]
  • [21]
  • [22]
  • [23]
  • [24]
  • [25]
  • [26]
  • [27]
  • [28]
  • [29]
  • [30]
  • [31]
  • [32]
  • [33]
  文献评价指标  
  下载次数:240次 浏览次数:5次