BMC Bioinformatics
SIS: a program to generate draft genome sequence scaffolds for prokaryotes
João C Setubal [1], Ulisses Dias [2], Zanoni Dias [2]
[1]Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA
[2]Instituto de Computação, Universidade Estadual de Campinas, Campinas, SP, Brazil
Keywords: Prokaryotes; Inversion; Scaffold; Contig order; Genome assembly
DOI: 10.1186/1471-2105-13-96
Received 2011-12-19; accepted 2012-03-19; published 2012
【 Abstract 】

Background

Falling DNA sequencing costs have made prokaryotic draft genome sequences increasingly common. A contig scaffold is an ordering of contigs, each in its correct orientation. A scaffold can aid genome comparisons and guide gap-closure efforts. One popular technique for obtaining contig scaffolds is to map contigs onto a reference genome. However, rearrangements between the query and reference genomes can produce incorrect scaffolds if they are not taken into account. Large-scale inversions are common rearrangement events in prokaryotic genomes. Even in draft genomes, inversions can be detected given sufficient sequencing coverage and a sufficiently close reference genome.
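
The reference-mapping technique mentioned in this paragraph can be sketched roughly as follows. This is a minimal illustration of the general idea only, not the SIS algorithm: the `Hit` record and the `naive_scaffold` function are hypothetical names, and each contig is assumed to have exactly one best match on the reference (for example, one taken from an aligner's output).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Best match of one contig on the reference (hypothetical record)."""
    contig: str
    ref_start: int  # 0-based start coordinate on the reference
    strand: str     # '+' if the contig matches the forward strand, '-' otherwise


def naive_scaffold(hits):
    """Order contigs by reference position and report each one's orientation.

    A '-' means the contig would be reverse-complemented to follow the reference.
    """
    ordered = sorted(hits, key=lambda h: h.ref_start)
    return [(h.contig, h.strand) for h in ordered]


# Toy input: three contigs with made-up coordinates.
hits = [
    Hit("c3", 9000, "+"),
    Hit("c1", 100, "+"),
    Hit("c2", 4200, "-"),
]
scaffold = naive_scaffold(hits)
print(scaffold)
```

Because the contigs simply inherit the reference layout, any rearrangement between the two genomes is silently undone in the resulting scaffold, which is the failure mode the paragraph warns about.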

Results

We present a linear-time algorithm that, given a reference genome, generates a set of contig scaffolds for a draft genome sequence represented as contigs. The algorithm is aimed at prokaryotic genomes and relies on the presence of matching sequence patterns between the query and reference genomes that can be interpreted as the result of large-scale inversions; we call these patterns inversion signatures. Our algorithm correctly generates a scaffold if at least one member of every inversion signature pair is present in the contigs and no inversion signature has been overwritten during evolution. The algorithm can also generate scaffolds in the presence of any kind of inversion, although in this general case there is no guarantee that all scaffolds in the scaffold set will be correct. We compare the performance of SIS, the program that implements the algorithm, with that of seven other scaffold-generating programs. In our tests, SIS shows better overall performance.
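
To see why large-scale inversions matter for reference-based scaffolding, genomes can be modelled as signed sequences of blocks. The sketch below is a toy illustration under that assumption and is not the SIS algorithm: `invert` and `adjacencies` are hypothetical helper names, each contig is assumed to cover exactly one block, and the genome is treated as linear for simplicity.

```python
def invert(genome, i, j):
    """Return a copy of `genome` with blocks i..j (inclusive) inverted:
    their order is reversed and every sign is flipped."""
    segment = [-b for b in reversed(genome[i:j + 1])]
    return genome[:i] + segment + genome[j + 1:]


def adjacencies(genome):
    """Set of unordered neighbouring block pairs, ignoring orientation."""
    return {frozenset((abs(a), abs(b))) for a, b in zip(genome, genome[1:])}


reference = [1, 2, 3, 4, 5, 6]

# True layout of the query genome: blocks 2..4 of the reference are inverted.
query = invert(reference, 1, 3)

# Ordering the query's blocks by their reference position simply
# reproduces the reference layout.
naive = sorted(abs(b) for b in query)

# Joins proposed by the naive ordering that do not exist in the query.
wrong = adjacencies(naive) - adjacencies(query)
print(query)
print(sorted(sorted(pair) for pair in wrong))
```

In this toy case the naive ordering proposes two joins that are absent from the query, and both sit at the breakpoints of the inversion.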

Conclusions

SIS is a new, easy-to-use tool for generating contig scaffolds, available both as a stand-alone program and as a web server. The good performance of SIS in our tests adds evidence that large-scale inversions are widespread in prokaryotic genomes.

【 License 】

   
2012 Dias et al.; licensee BioMed Central Ltd.
