Journal Article Details
BMC Research Notes
STINGRAY: system for integrated genomic resources and analysis
Alberto MR Dávila [5], Marta Mattoso [4], Maria LM Campos [1], Maria C Cavalcanti [2], Edmundo C Grisard [9], André N Pitaluga [3], Christian M Probst [6], Vanessa E Emmel [8], Antonio CB Ribeiro [3], Kary ACS Ocaña [5], Daniel R Loureiro [5], Diogo A Tschoeke [5], Rodrigo Jardim [5], Glauber Wagner [7]
[1] Instituto de Matemática, Departamento de Ciência da Computação, Universidade Federal do Rio de Janeiro, Bloco C, CCMN, Sala E-2206, Ilha do Fundão, 21945-970 Rio de Janeiro, Rio de Janeiro, Brazil
[2] Instituto Militar de Engenharia (IME), Seção de Engenharia de Computação (SE-8), Praça General Tibúrcio 80, Praia Vermelha, Urca, 22290-270 Rio de Janeiro, Rio de Janeiro, Brazil
[3] Laboratório de Biologia Molecular de Parasitas e Vetores, Instituto Oswaldo Cruz (IOC), Fundação Oswaldo Cruz (FIOCRUZ), Avenida Brasil 4365, 21040-360 Rio de Janeiro, Rio de Janeiro, Brazil
[4] Instituto Alberto Luiz Coimbra de Pós-graduação e Pesquisa de Engenharia (COPPE), Universidade Federal do Rio de Janeiro, P.O. Box 68511, Ilha do Fundão, 21941-972 Rio de Janeiro, Rio de Janeiro, Brazil
[5] Pólo de Biologia Computacional e Sistemas, Instituto Oswaldo Cruz (IOC), Fundação Oswaldo Cruz (FIOCRUZ), Avenida Brasil 4365, 21040-360 Rio de Janeiro, Rio de Janeiro, Brazil
[6] Laboratório de Bioinformática, Instituto Carlos Chagas (ICC), Fundação Oswaldo Cruz (FIOCRUZ), Avenida Algacyr Munhoz Mader, 3775, Cidade Industrial, 81350-010 Curitiba, Paraná, Brazil
[7] Laboratório de Doenças Infecciosas e Parasitárias (LDIP), Área de Ciências Biológicas e da Saúde (ACBS), Universidade do Oeste de Santa Catarina (Unoesc), Rua Getúlio Vargas 2125, Flor da Serra, 89600-000 Joaçaba, Santa Catarina, Brazil
[8] Laboratório de Genética Molecular de Microrganismos, Instituto Oswaldo Cruz (IOC), Fundação Oswaldo Cruz (FIOCRUZ), Avenida Brasil 4365, 21040-360 Rio de Janeiro, Rio de Janeiro, Brazil
[9] Laboratório de Protozoologia, Departamento de Microbiologia, Imunologia e Parasitologia (MIP), Centro de Ciências Biológicas (CCB), Universidade Federal de Santa Catarina (UFSC), Campus Universitário, Setor F, Bloco A, Trindade, 88040-970, Caixa Postal 476, Florianópolis, Santa Catarina, Brazil
Keywords: Data integration; Sanger; Next generation sequencing; Workflow; Annotation; Genome
DOI: 10.1186/1756-0500-7-132
Received 2013-10-16, accepted 2014-02-28, published 2014
【 Abstract 】

Background

The STINGRAY system has been conceived to ease the tasks of integrating, analyzing, annotating and presenting genomic and expression data from Sanger and Next Generation Sequencing (NGS) platforms.

Findings

STINGRAY includes: (a) a complete and integrated workflow (more than 20 bioinformatics tools) ranging from functional annotation to phylogeny; (b) a MySQL database schema, suitable for data integration and user access control; and (c) a user-friendly graphical web-based interface that makes the system intuitive, facilitating the tasks of data analysis and annotation.

Conclusion

STINGRAY proved to be an easy-to-use and complete system for analyzing sequencing data. While both Sanger and NGS platforms are supported, the system may run faster with Sanger data, since large NGS datasets could slow down MySQL database operations. STINGRAY is available at http://stingray.biowebdb.org and the open source code at http://sourceforge.net/projects/stingray-biowebdb/.

【 License 】

   
2014 Wagner et al.; licensee BioMed Central Ltd.
