BMC Bioinformatics
BioMaS: a modular pipeline for Bioinformatic analysis of Metagenomic AmpliconS
Bruno Fosso [4], Monica Santamaria [4], Marinella Marzano [4], Daniel Alonso-Alemany [3], Gabriel Valiente [3], Giacinto Donvito [1], Alfonso Monaco [1], Pasquale Notarangelo [1], Graziano Pesole [2]
[1] National Institute of Nuclear Physics, via E. Orabona 4, Bari 70125, Italy
[2] Center of Excellence in Comparative Genomics, University of Bari “A. Moro”, via E. Orabona 4, Bari 70125, Italy
[3] Algorithms, Bioinformatics, Complexity and Formal Methods Research Group, Technical University of Catalonia, E-08034, Barcelona, Spain
[4] Institute of Biomembranes and Bioenergetics, Consiglio Nazionale delle Ricerche, via Amendola 165/A, Bari 70126, Italy
Keywords: High-Throughput Sequencing; Meta-barcoding; Microbiome; Bioinformatics; Metagenomics
Others: 1231822 DOI: 10.1186/s12859-015-0595-z
Received 2015-03-03, accepted 2015-04-23, published 2015
【 Abstract 】
Background
Substantial advances in microbiology, molecular evolution and biodiversity have been made in recent years thanks to Metagenomics, which makes it possible to unveil the composition and functions of mixed microbial communities in any environmental niche. If the investigation is aimed only at the taxonomic structure of the microbiome, a target-based metagenomic approach, here also referred to as Meta-barcoding, is generally applied. This approach commonly involves the selective amplification of a species-specific genetic marker (DNA meta-barcode) across the whole taxonomic range of interest and the exploration of its taxon-related variants through High-Throughput Sequencing (HTS) technologies. Access to suitable computational systems for the large-scale bioinformatic analysis of HTS data is currently one of the major challenges in advanced Meta-barcoding projects.
Results
BioMaS (Bioinformatic analysis of Metagenomic AmpliconS) is a new bioinformatic pipeline designed to support biomolecular researchers involved in taxonomic studies of environmental microbial communities. It provides a completely automated workflow comprising all the fundamental steps required in an appropriately designed HTS-based Meta-barcoding experiment, from raw sequence data upload and cleaning to final taxonomic identification. In its current version, BioMaS allows the analysis of both bacterial and fungal environments starting directly from the raw sequencing data produced by either the Roche 454 or the Illumina HTS platform, each of which is handled by its own analysis path. BioMaS is implemented as a public web service available at https://recasgateway.ba.infn.it/ and is also available in Galaxy at http://galaxy.cloud.ba.infn.it:8080 (only for Illumina data).
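The overall shape of such a workflow (platform-specific path, read cleaning, taxonomic identification) can be illustrated with a minimal sketch. This is not the BioMaS implementation: every name, threshold and the exact-match lookup used below are hypothetical placeholders chosen for illustration, and a real pipeline would align reads against a curated reference database instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Read:
    """A sequencing read with per-base Phred quality scores."""
    name: str
    sequence: str
    qualities: List[int]


def mean_quality(read: Read) -> float:
    """Average Phred score of a read (0.0 for an empty read)."""
    return sum(read.qualities) / len(read.qualities) if read.qualities else 0.0


def clean_reads(reads: List[Read], min_mean_quality: float, min_length: int) -> List[Read]:
    """Keep reads that are long enough and of sufficient mean quality."""
    return [
        r for r in reads
        if len(r.sequence) >= min_length and mean_quality(r) >= min_mean_quality
    ]


def assign_taxonomy(reads: List[Read], reference: Dict[str, str]) -> Dict[str, str]:
    """Toy assignment: exact sequence lookup in a reference dictionary.

    Assumption for illustration only; it stands in for a real
    alignment-based taxonomic classifier.
    """
    return {r.name: reference.get(r.sequence, "unassigned") for r in reads}


def run_454_path(reads: List[Read], reference: Dict[str, str]) -> Dict[str, str]:
    # Hypothetical thresholds for the 454 path.
    return assign_taxonomy(clean_reads(reads, 20.0, 50), reference)


def run_illumina_path(reads: List[Read], reference: Dict[str, str]) -> Dict[str, str]:
    # Hypothetical thresholds for the Illumina path.
    return assign_taxonomy(clean_reads(reads, 25.0, 50), reference)


# One analysis path per supported sequencing platform.
PIPELINE_PATHS: Dict[str, Callable[[List[Read], Dict[str, str]], Dict[str, str]]] = {
    "454": run_454_path,
    "illumina": run_illumina_path,
}


def run_pipeline(platform: str, reads: List[Read], reference: Dict[str, str]) -> Dict[str, str]:
    """Dispatch raw reads to the path that matches the sequencing platform."""
    try:
        path = PIPELINE_PATHS[platform.lower()]
    except KeyError:
        raise ValueError(f"unsupported platform: {platform!r}") from None
    return path(reads, reference)
```

Calling `run_pipeline("illumina", reads, reference)` returns a mapping from read name to assigned taxon for the reads that survive cleaning. Keeping the two paths behind a single entry point mirrors the idea of one automated workflow with platform-specific branches.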
Conclusion
BioMaS is a user-friendly pipeline for Meta-barcoding HTS data analysis, specifically designed for users without particular computing skills. A comparative benchmark, carried out on a simulated dataset designed to broadly represent the currently known bacterial and fungal world, showed that BioMaS outperforms QIIME and MOTHUR in terms of extent and accuracy of deep taxonomic sequence assignments.
【 License 】
2015 Fosso et al.