Journal article details
BMC Research Notes
FGAP: an automated gap closing tool
Roberto T Raittz [1], Emanuel M Souza [2], Fabio O Pedrosa [2], Maria BR Steffens [2], Vinicius A Weiss [2], Helisson Faoro [2], Vitor C Piro [1]
[1] Laboratory of Bioinformatics, Professional and Technological Education Sector, Federal University of Paraná, Curitiba, PR, Brazil, Rua Dr. Alcides Vieira Arcoverde 1225, Curitiba, Paraná, Brazil
[2] Department of Biochemistry and Molecular Biology, Federal University of Paraná, Curitiba, PR, Brazil, Av. Cel. Francisco H. dos Santos, Curitiba, Paraná, Brazil
Keywords: Gap closure; Gap filling; Genome finishing
DOI: 10.1186/1756-0500-7-371
Received 2014-02-27; accepted 2014-06-09; published 2014
【 Abstract 】

Background

The rapid fall in DNA sequencing costs has allowed genome data to accumulate quickly. However, obtaining complete genome sequences is still time-consuming and labor-intensive. In addition, data produced by different sequencing technologies or by alternative assemblies remain underused as a means of improving incomplete genome assemblies.

Findings

We have developed FGAP, a tool for closing gaps of draft genome sequences that takes advantage of different datasets. FGAP uses BLAST to align multiple contigs against a draft genome assembly aiming to find sequences that overlap gaps. The algorithm selects the best sequence to fill and eliminate the gap.
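
The selection step described above can be illustrated with a minimal Python sketch. It is not FGAP's actual implementation: FGAP obtains its alignments from BLAST, whereas here the alignments are supplied directly as simplified records. The `Hit` record, the `slack` tolerance, and the scoring rule (sum of the two flank scores) are assumptions made for illustration only.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment of a contig to the draft (hypothetical, simplified record)."""
    contig: str
    d_start: int   # 0-based start on the draft, inclusive
    d_end: int     # 0-based end on the draft, exclusive
    score: float   # alignment score, e.g. a BLAST bit score


def find_gaps(draft):
    """Return the (start, end) span of every run of N characters in the draft."""
    return [m.span() for m in re.finditer(r"[Nn]+", draft)]


def best_candidate(gap, hits, slack=2):
    """Return the contig that best anchors both flanks of the gap, or None.

    A hit anchors the left flank if it ends within `slack` bases before the
    gap, and the right flank if it starts within `slack` bases after it.
    Contigs anchoring both flanks are ranked by the sum of their flank scores
    (an assumed criterion, not necessarily the one FGAP uses).
    """
    start, end = gap
    left, right = {}, {}
    for h in hits:
        if start - slack <= h.d_end <= start:
            left[h.contig] = max(left.get(h.contig, 0.0), h.score)
        if end <= h.d_start <= end + slack:
            right[h.contig] = max(right.get(h.contig, 0.0), h.score)
    both = set(left) & set(right)
    if not both:
        return None
    return max(both, key=lambda c: left[c] + right[c])


draft = "ACGTACGT" + "NNNNN" + "ACGTACGT"
hits = [
    Hit("c1", 0, 8, 50.0), Hit("c1", 13, 21, 48.0),   # spans the gap
    Hit("c2", 2, 8, 60.0),                            # left flank only
    Hit("c3", 0, 8, 40.0), Hit("c3", 13, 21, 40.0),   # spans, lower score
]
gaps = find_gaps(draft)
chosen = best_candidate(gaps[0], hits)
```

In this toy input, `c2` has the single highest-scoring alignment but is rejected because it touches only one side of the gap; requiring anchors on both flanks is what makes a contig a candidate for filling the gap rather than merely extending one edge.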

Conclusions

FGAP reduced the number of gaps by 78% in an E. coli draft genome assembly using data from two different sequencing technologies, Illumina and 454. Using PacBio long reads, 98% of gaps were closed. In human chromosome 14 assemblies, FGAP reduced the number of gaps by 35%. All inserted sequences were validated against a reference genome using QUAST. The source code and a web tool are available at http://www.bioinfo.ufpr.br/fgap/.

【 License 】

   
2014 Piro et al.; licensee BioMed Central Ltd.
