| BMC Bioinformatics | |
| ILP-based maximum likelihood genome scaffolding | |
| Proceedings | |
| Ion Măndoiu [1], James Lindsay [1], Hamed Salooti [2], Alex Zelikovsky [2] | |
| [1] Computer Science and Engineering Department, University of Connecticut, 371 Fairfield Way, 06269-4155, Storrs, CT, USA; [2] Department of Computer Science, Georgia State University, 34 Peachtree Street, 30303, Atlanta, GA, USA | |
| Keywords: Integer Linear Program; Read Pair; Integer Linear Program Formulation; Metagenomic Sample; Maximum Likelihood Model | |
| DOI : 10.1186/1471-2105-15-S9-S9 | |
| Source: Springer | |
【 Abstract 】
Background: Interest in de novo genome assembly has been renewed in the past decade by rapid advances in high-throughput sequencing (HTS) technologies, which generate relatively short reads and therefore yield highly fragmented assemblies consisting of contigs. Additional long-range linkage information is typically used to orient, order, and link contigs into larger structures referred to as scaffolds. Because of library preparation artifacts and erroneous mapping of reads originating from repeats, scaffolding remains a challenging problem. In this paper, we provide a scalable scaffolding algorithm (SILP2) that employs a maximum likelihood model capturing read mapping uncertainty and/or non-uniformity of contig coverage; the model is solved using integer linear programming. A Non-Serial Dynamic Programming (NSDP) paradigm is applied to make the algorithm practical for larger mammalian genomes. To compare scaffolding tools, we employ novel quantitative metrics in addition to the existing metrics in the field. We have also expanded the set of experiments to include scaffolding of low-complexity metagenomic samples.

Results: SILP2 achieves better scalability than the previous release of SILP through a more efficient NSDP algorithm. The results show that SILP2 compares favorably to the previous methods OPERA and MIP in both scalability and accuracy when scaffolding single genomes of up to human size, and significantly outperforms them when scaffolding low-complexity metagenomic samples.

Conclusions: Equipped with NSDP, SILP2 is able to scaffold large mammalian genomes, producing the longest and most accurate scaffolds. The ILP formulation for the maximum likelihood model is shown to be flexible enough to handle metagenomic samples.
【 License 】
CC BY
© Lindsay et al.; licensee BioMed Central Ltd. 2014