GigaScience
NCBI BLAST+ integrated into Galaxy
Peter J. A. Cock [3], John M. Chilton [1], Björn Grüning [4], James E. Johnson [1], Nicola Soranzo [2]
[1] Minnesota Supercomputing Institute, University of Minnesota, 599 Walter Library, 117 Pleasant St. SE, Minneapolis, MN 55455, USA
[2] CRS4, Loc. Piscina Manna, Pula, 09010, CA, Italy
[3] Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee DD2 5DA, Scotland, UK
[4] Department of Computer Science, Albert-Ludwigs-University of Freiburg, Georges-Köhler-Allee 106, Freiburg 79110, Germany
Keywords: Sequence analysis; Annotation; Reproducibility; Workflow; Accessibility; Pipeline; BLAST; Galaxy
Other ID: 1223626; DOI: 10.1186/s13742-015-0080-7
Received: 2014-12-31; Accepted: 2015-08-18; Published: 2015
【 Abstract 】
Background
The NCBI BLAST suite has become ubiquitous in modern molecular biology and is used for tasks ranging from checking capillary sequencing results of single PCR products, through genome annotation, to larger-scale pan-genome analyses. For early adopters of the Galaxy web-based biomedical data analysis platform, integrating BLAST into Galaxy was a natural step for sequence comparison workflows.
Findings
The command-line NCBI BLAST+ tool suite was wrapped for use within Galaxy, and appropriate datatypes were defined where needed. Integrating the BLAST+ suite into Galaxy aims to make common BLAST tasks easy and advanced tasks possible.
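As a rough illustration only, and not the project's actual wrapper (Galaxy tools are defined in XML), the Python sketch below shows the two jobs a Galaxy-style BLAST+ wrapper performs: turning form values into a `blastn` command line, and reading the standard 12-column tabular output (`-outfmt 6`). The names `build_blastn_command`, `parse_tabular` and `Hit` are hypothetical; only the `blastn` options used (`-query`, `-db`, `-evalue`, `-outfmt`, `-num_threads`, `-out`) are standard BLAST+ flags.

```python
from dataclasses import dataclass

# Column order of the default BLAST+ tabular output (-outfmt 6 with no custom fields)
STD_COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def build_blastn_command(query, database, evalue, threads=1, out="blast.tsv"):
    """Hypothetical helper: map user-facing parameters onto a blastn argv list.

    The command is only built here, not executed.
    """
    return [
        "blastn",
        "-query", query,
        "-db", database,
        "-evalue", str(evalue),
        "-outfmt", "6",  # tabular output, 12 standard columns
        "-num_threads", str(threads),
        "-out", out,
    ]


@dataclass
class Hit:
    """Subset of the tabular columns, converted to Python types."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_tabular(lines):
    """Parse -outfmt 6 lines into Hit records, skipping blank and comment lines."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = dict(zip(STD_COLUMNS, line.split("\t")))
        hits.append(Hit(
            qseqid=fields["qseqid"],
            sseqid=fields["sseqid"],
            pident=float(fields["pident"]),
            length=int(fields["length"]),
            evalue=float(fields["evalue"]),
            bitscore=float(fields["bitscore"]),
        ))
    return hits
```

For example, `build_blastn_command("reads.fasta", "nt", evalue=1e-5)` yields an argument list suitable for `subprocess.run`, and `parse_tabular` can then read the resulting file line by line. Keeping command construction separate from output parsing mirrors how a Galaxy tool separates its command template from the datatype of its output.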
Conclusions
This project is an informal international collaboration, and the resulting tools are deployed and used on Galaxy servers worldwide. Several example applications are described here.
【 License 】
2015 Cock et al.