| BMC Bioinformatics | |
| RegCloser: a robust regression approach to closing genome gaps | |
| Research | |
| Mengtian Li [1], Shenghao Cao [1], Lei M. Li [1] | |
| [1] National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, 100190, Beijing, China; University of Chinese Academy of Sciences, 100049, Beijing, China | |
| Keywords: Genome assembly; Closing gaps; Robust regression; Tandem repeat | |
| DOI : 10.1186/s12859-023-05367-0 | |
| Received 2023-03-01, accepted 2023-05-27, published 2023 | |
| Source: Springer | |
【 Abstract 】
Background: Closing gaps in draft genomes leads to more complete and continuous genome assemblies. Ubiquitous genomic repeats challenge existing gap-closing methods, which are based either on the k-mer representation of the de Bruijn graph or on the overlap-layout-consensus paradigm. In addition, chimeric reads cause erroneous k-mers in the former and false read overlaps in the latter.

Results: We propose a novel local assembly approach to gap closing, called RegCloser. It represents read coordinates as parameters and read overlaps as observations in a linear regression model. The optimal overlap is searched only within the restricted range consistent with insert sizes. Under this linear regression framework, local DNA assembly becomes a robust parameter estimation problem. We solved the problem with a customized robust regression procedure that resists the influence of false overlaps by optimizing a convex global Huber loss function. The global optimum is obtained by iteratively solving a sparse system of linear equations. On both simulated and real datasets, RegCloser outperformed other popular methods in accurately resolving the copy number of tandem repeats, and achieved superior completeness and contiguity. Applying RegCloser to a plateau zokor draft genome that had already been improved by long reads further increased the contig N50 threefold. We also tested the robust regression approach on layout generation for long reads.

Conclusions: RegCloser is a competitive gap-closing tool. The software is available at https://github.com/csh3/RegCloser. The robust regression approach has the potential to be incorporated into the layout module of long-read assemblers.
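The regression view described in the abstract can be illustrated with a toy example. The following is a minimal sketch, not RegCloser's actual code: it assumes each overlap is an observation `(i, j, d)` meaning "read j starts about d bases after read i", pins read 0 at coordinate 0, and minimizes a Huber loss by iteratively reweighted least squares. The names `solve_dense` and `huber_layout`, the threshold `delta`, and the toy data are all hypothetical, and a small dense Gaussian elimination stands in for the sparse solver mentioned in the abstract.

```python
def solve_dense(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def huber_layout(n_reads, overlaps, delta=2.0, n_iter=50):
    """Estimate read start coordinates from pairwise offsets.

    Each pass solves the weighted normal equations for x_1..x_{n-1}
    (x_0 is pinned at 0), then recomputes Huber weights from the residuals
    x_j - x_i - d. With n_iter=1 this is ordinary least squares.
    """
    m = n_reads - 1
    weights = [1.0] * len(overlaps)
    x = [0.0] * n_reads
    for _ in range(n_iter):
        A = [[0.0] * m for _ in range(m)]
        b = [0.0] * m
        for w, (i, j, d) in zip(weights, overlaps):
            if j > 0:
                A[j - 1][j - 1] += w
                b[j - 1] += w * d
            if i > 0:
                A[i - 1][i - 1] += w
                b[i - 1] -= w * d
            if i > 0 and j > 0:
                A[i - 1][j - 1] -= w
                A[j - 1][i - 1] -= w
        x = [0.0] + solve_dense(A, b)
        # Huber weights: 1 inside the threshold, delta/|r| outside it.
        weights = []
        for i, j, d in overlaps:
            r = abs(x[j] - x[i] - d)
            weights.append(1.0 if r <= delta else delta / r)
    return x


# Toy data: five reads truly starting at 0, 100, 200, 300, 400.
overlaps = [
    (0, 1, 100), (1, 2, 100), (2, 3, 100), (3, 4, 100),
    (0, 2, 200), (1, 3, 200), (2, 4, 200),
    (0, 3, 100),  # false overlap, e.g. from a collapsed tandem repeat
]
plain = huber_layout(5, overlaps, n_iter=1)  # ordinary least squares
robust = huber_layout(5, overlaps)           # Huber loss via reweighting
print("least squares:", [round(v, 1) for v in plain])
print("Huber:        ", [round(v, 1) for v in robust])
```

In this toy setting the single false overlap drags the least-squares estimate of read 3 far from its true coordinate of 300, while the Huber fit stays within a few bases of it, because the false observation is progressively down-weighted.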
【 License 】
CC BY
© The Author(s) 2023