期刊论文详细信息
BMC Bioinformatics
Read mapping on de Bruijn graphs
Research Article
Pierre Peterlongo1  Antoine Limasset1  Eric Rivals2  Bastien Cazaux2 
[1] IRISA Inria Rennes Bretagne Atlantique, GenScale team, Campus de Beaulieu, 35042, Rennes, France;L.I.R.M.M., UMR 5506, Université de Montpellier et CNRS, 860 rue de St Priest, F-34392, Montpellier Cedex 5, France;Institut Biologie Computationnelle, Université de Montpellier, F-34392, Montpellier, France;
关键词: Read mapping;    De Bruijn graph;    NGS;    Sequence graph;    path;    Hamiltonian path;    Genomics;    Assembly;    NP-complete;   
DOI  :  10.1186/s12859-016-1103-9
 received in 2015-12-09, accepted in 2016-05-26,  发布年份 2016
来源: Springer
PDF
【 摘 要 】

BackgroundNext Generation Sequencing (NGS) has dramatically enhanced our ability to sequence genomes, but not to assemble them. In practice, many published genome sequences remain in the state of a large set of contigs. Each contig describes the sequence found along some path of the assembly graph, however, the set of contigs does not record all the sequence information contained in that graph. Although many subsequent analyses can be performed with the set of contigs, one may ask whether mapping reads on the contigs is as informative as mapping them on the paths of the assembly graph. Currently, one lacks practical tools to perform mapping on such graphs.ResultsHere, we propose a formal definition of mapping on a de Bruijn graph, analyse the problem complexity which turns out to be NP-complete, and provide a practical solution. We propose a pipeline called GGMAP (Greedy Graph MAPping). Its novelty is a procedure to map reads on branching paths of the graph, for which we designed a heuristic algorithm called BGREAT (de Bruijn Graph REAd mapping Tool). For the sake of efficiency, BGREAT rewrites a read sequence as a succession of unitigs sequences. GGMAP can map millions of reads per CPU hour on a de Bruijn graph built from a large set of human genomic reads. Surprisingly, results show that up to 22 % more reads can be mapped on the graph but not on the contig set.ConclusionsAlthough mapping reads on a de Bruijn graph is complex task, our proposal offers a practical solution combining efficiency with an improved mapping capacity compared to assembly-based mapping even for complex eukaryotic data.

【 授权许可】

CC BY   
© Limasset et al. 2016

【 预 览 】
附件列表
Files Size Format View
RO202311107116900ZK.pdf 1189KB PDF download
Fig. 2 3290KB Image download
Fig. 3 356KB Image download
MediaObjects/12888_2023_5198_MOESM1_ESM.docx 17KB Other download
Fig. 1 806KB Image download
Fig. 3 447KB Image download
MediaObjects/13046_2023_2865_MOESM4_ESM.tif 8864KB Other download
Fig. 5 471KB Image download
Fig. 1 2453KB Image download
12944_2023_1932_Article_IEq4.gif 2KB Image download
Fig. 4 557KB Image download
MediaObjects/12974_2023_2918_MOESM1_ESM.jpg 395KB Other download
Fig. 3 825KB Image download
12936_2016_1315_Article_IEq9.gif 1KB Image download
Fig. 5 466KB Image download
MediaObjects/12936_2023_4475_MOESM1_ESM.doc 84KB Other download
MediaObjects/12974_2023_2918_MOESM2_ESM.jpg 726KB Other download
12936_2015_966_Article_IEq12.gif 1KB Image download
Fig. 3 3082KB Image download
Fig. 3 606KB Image download
12936_2017_2014_Article_IEq61.gif 1KB Image download
Fig. 5 121KB Image download
MediaObjects/13046_2022_2359_MOESM2_ESM.docx 15KB Other download
Fig. 12 729KB Image download
Fig. 1 247KB Image download
MediaObjects/12951_2023_2121_MOESM1_ESM.docx 9323KB Other download
【 图 表 】

Fig. 1

Fig. 12

Fig. 5

12936_2017_2014_Article_IEq61.gif

Fig. 3

Fig. 3

12936_2015_966_Article_IEq12.gif

Fig. 5

12936_2016_1315_Article_IEq9.gif

Fig. 3

Fig. 4

12944_2023_1932_Article_IEq4.gif

Fig. 1

Fig. 5

Fig. 3

Fig. 1

Fig. 3

Fig. 2

【 参考文献 】
  • [1]
  • [2]
  • [3]
  • [4]
  • [5]
  • [6]
  • [7]
  • [8]
  • [9]
  • [10]
  • [11]
  • [12]
  • [13]
  • [14]
  • [15]
  • [16]
  • [17]
  • [18]
  • [19]
  • [20]
  • [21]
  • [22]
  • [23]
  • [24]
  • [25]
  • [26]
  文献评价指标  
  下载次数:5次 浏览次数:1次