Journal article details
BMC Bioinformatics
RegCloser: a robust regression approach to closing genome gaps
Research
Mengtian Li [1], Shenghao Cao [1], Lei M. Li [1]
[1] National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, 100190, Beijing, China; University of Chinese Academy of Sciences, 100049, Beijing, China
Keywords: Genome assembly; Closing gaps; Robust regression; Tandem repeat
DOI: 10.1186/s12859-023-05367-0
Received 2023-03-01, accepted 2023-05-27, published 2023
Source: Springer
【 Abstract 】

Background: Closing gaps in draft genomes yields more complete and contiguous genome assemblies. Ubiquitous genomic repeats challenge existing gap-closing methods, which rely on either the k-mer representation of the de Bruijn graph or the overlap-layout-consensus paradigm. In addition, chimeric reads cause erroneous k-mers in the former and false read overlaps in the latter.

Results: We propose a novel local assembly approach to gap closing, called RegCloser. It represents read coordinates as the parameters, and read overlaps as the observations, of a linear regression model. The optimal overlap is searched only within the restricted range consistent with insert sizes. Under this linear regression framework, local DNA assembly becomes a robust parameter estimation problem. We solve the problem with a customized robust regression procedure that resists the influence of false overlaps by optimizing a convex global Huber loss function. The global optimum is obtained by iteratively solving a sparse system of linear equations. On both simulated and real datasets, RegCloser outperformed other popular methods in accurately resolving the copy number of tandem repeats, and achieved superior completeness and contiguity. Applying RegCloser to a plateau zokor draft genome that had already been improved with long reads further increased the contig N50 threefold. We also tested the robust regression approach on layout generation for long reads.

Conclusions: RegCloser is a competitive gap-closing tool. The software is available at https://github.com/csh3/RegCloser. The robust regression approach could potentially be incorporated into the layout module of long-read assemblers.
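
The regression idea in the abstract can be illustrated with a small sketch. It is not the RegCloser implementation: the function names `solve_linear` and `huber_layout`, the threshold `delta`, the choice to fix the first read at coordinate 0, and the toy data are all assumptions made for illustration. For brevity the linear system is solved by dense Gaussian elimination, whereas the abstract describes a sparse system. Each overlap (i, j, d) is treated as an observation x_j - x_i ≈ d, and the Huber loss is minimized by iteratively reweighted least squares.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def huber_layout(n_reads, overlaps, delta=5.0, n_iter=50):
    """Estimate read start coordinates, with read 0 fixed at 0.

    overlaps: list of (i, j, d), meaning read j starts d bases after read i.
    delta:    Huber threshold (hypothetical default); residuals larger than
              this are down-weighted instead of squared.
    n_iter=1 gives the ordinary least-squares layout.
    """
    m = n_reads - 1                      # unknowns x_1 .. x_{n-1}
    weights = [1.0] * len(overlaps)
    x = [0.0] * n_reads
    for _ in range(n_iter):
        # Weighted normal equations for sum_k w_k * (x_j - x_i - d)^2.
        A = [[0.0] * m for _ in range(m)]
        b = [0.0] * m
        for w, (i, j, d) in zip(weights, overlaps):
            for p, sp in ((i, -1.0), (j, 1.0)):
                if p == 0:
                    continue             # read 0 is the fixed anchor
                b[p - 1] += w * sp * d
                for q, sq in ((i, -1.0), (j, 1.0)):
                    if q != 0:
                        A[p - 1][q - 1] += w * sp * sq
        x = [0.0] + solve_linear(A, b)
        # Huber weights: 1 inside the threshold, delta/|r| outside it.
        residuals = [x[j] - x[i] - d for (i, j, d) in overlaps]
        weights = [1.0 if abs(r) <= delta else delta / abs(r)
                   for r in residuals]
    return x


# Toy data: four reads truly located at 0, 100, 200, 300, plus one false
# overlap (0, 3, 100) of the kind a tandem repeat might produce.
toy_overlaps = [(0, 1, 100), (1, 2, 100), (2, 3, 100),
                (0, 2, 200), (1, 3, 200), (0, 3, 100)]
least_squares = huber_layout(4, toy_overlaps, n_iter=1)
robust = huber_layout(4, toy_overlaps)
print("least squares:", [round(v, 1) for v in least_squares])
print("Huber (IRLS): ", [round(v, 1) for v in robust])
```

On this toy input, ordinary least squares is pulled by the false overlap and places read 3 at 200, while the Huber fit keeps it within a few bases of the true coordinate 300, because the false overlap contributes only a bounded pull.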

【 License 】

CC BY   
© The Author(s) 2023
