Journal article details
BMC Bioinformatics
BLASTGrabber: a bioinformatic tool for visualization, analysis and sequence selection of massive BLAST data
Ralf Stefan Neumann [2], Surendra Kumar [1], Thomas Hendricus Augustus Haverkamp [3], Kamran Shalchian-Tabrizi [2]
[1] Current address: Department of Clinical Molecular Biology and Laboratory Science (EpiGen), Division of Medicine, Akershus University Hospital, 1478 Akershus, Norway
[2] Section for Genetics and Evolutionary Biology (EVOGENE) and Centre for Epigenetics, Development and Evolution (CEDE), University of Oslo, Oslo, Norway
[3] Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
Keywords: Visualization; Text-mining; Taxonomy; High-throughput; BLAST; Analysis
DOI: 10.1186/1471-2105-15-128
Received 2014-01-07, accepted 2014-03-31, published 2014
【 Abstract 】

Background

Advances in sequencing efficiency have vastly increased the sizes of biological sequence databases, including many thousands of genome-sequenced species. The BLAST algorithm remains the main search engine for retrieving sequence information, and must consequently handle data on an unprecedented scale. This has been possible due to high-performance computers and parallel processing. However, the raw BLAST output from contemporary searches involving thousands of queries becomes ill-suited for direct human processing. Few programs attempt to directly visualize and interpret BLAST output; those that do often provide a mere basic structuring of BLAST data.

Results

Here we present BLASTGrabber, a bioinformatics application suited to high-throughput sequencing analysis. Implemented as a Java application, BLASTGrabber is OS-independent and includes a user-friendly graphical user interface. Text or XML-formatted BLAST output files can be directly imported, displayed and categorized based on BLAST statistics. Query names and FASTA headers can be analysed by text-mining. In addition to visualizing sequence alignments, BLAST data can be ordered as an interactive taxonomy tree. All modes of analysis support selection, export and storage of data. A Java interface-based plugin structure facilitates the addition of customized third-party functionality.
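
To make the idea of categorizing hits by BLAST statistics concrete, the following is a minimal sketch in Python. It is not BLASTGrabber code and does not use its API: it assumes the standard 12-column tabular output of BLAST+ (`-outfmt 6`), and the function names, e-value thresholds and sample rows are all hypothetical, chosen for illustration only.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_tabular(handle):
    """Yield one dict per hit from a tab-separated BLAST result stream."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        yield hit


def categorize_by_evalue(hits, thresholds=(1e-50, 1e-10, 1e-3)):
    """Group hits into e-value bins; the thresholds are illustrative only."""
    bins = defaultdict(list)
    for hit in hits:
        for t in thresholds:  # thresholds must be in ascending order
            if hit["evalue"] <= t:
                label = f"<= {t:g}"
                break
        else:
            label = f"> {thresholds[-1]:g}"
        bins[label].append(hit)
    return dict(bins)


# Made-up sample rows, not real search results.
SAMPLE = (
    "q1\tsubjA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t370\n"
    "q1\tsubjB\t80.0\t150\t30\t0\t1\t150\t5\t154\t2e-20\t120\n"
    "q2\tsubjC\t60.0\t50\t20\t0\t1\t50\t10\t59\t0.5\t30\n"
)

bins = categorize_by_evalue(parse_tabular(io.StringIO(SAMPLE)))
for label, group in bins.items():
    print(label, [h["sseqid"] for h in group])
```

The same grouping could be keyed on bit score or percent identity instead; e-value is used here only because it is a common first filter.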

Conclusion

The BLASTGrabber application introduces new ways of visualizing and analysing massive BLAST output data by integrating taxonomy identification, text mining capabilities and generic multi-dimensional rendering of BLAST hits. The program aims at a non-expert audience in terms of computer skills; the combination of new functionalities makes the program flexible and useful for a broad range of operations.

【 License 】

   
2014 Neumann et al.; licensee BioMed Central Ltd.
