| BMC Genomics | |
| Evaluation of viral genome assembly and diversity estimation in deep metagenomes | |
| Antonio Alcamí1  Florent E Angly2  Daniel Aguirre de Cárcer1  | |
| [1] Centro de Biología Molecular Severo Ochoa, Consejo Superior de Investigaciones Científicas (CSIC)–Universidad Autónoma de Madrid, Madrid, Spain; [2] Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, Brisbane, QLD 4072, Australia | |
| Keywords: Virome; Metagenomics; Diversity; Assembly | |
| Others : 1092302 DOI : 10.1186/1471-2164-15-989 |
|
| Received 2014-08-24, accepted 2014-10-30, published 2014 | |
【 Abstract 】
Background
Viruses have unique properties, such as small genomes and regions of high similarity, whose effects on metagenomic assemblies have not yet been characterized. This study uses diverse in silico simulated viromes to evaluate how extensively genomes can be assembled using different sequencing platforms and assemblers. It also investigates the suitability of different methods for estimating viral diversity in metagenomes.
Results
We created in silico metagenomes mimicking various platforms at different sequencing depths. The CLC assembler performed worse than IDBA_UD and CAMERA, which are designed specifically for metagenomes. Up to a saturation point, Illumina platforms proved more capable than 454 of reconstructing large portions of viral genomes. Read length was an important factor in limiting chimericity, while scaffolding marginally improved contig length and accuracy. The genome length of the various viruses in the metagenomes did not significantly affect genome reconstruction, but the co-existence of highly similar genomes was detrimental. When evaluating diversity estimation tools, we found that PHACCS results were more accurate than those from CatchAll and clustering, both of which gave estimates orders of magnitude above the expected values.
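As an illustration of what "reconstructing large portions of viral genomes" can mean in practice, the following minimal sketch computes the fraction of a reference genome covered by at least one contig alignment. It is not the procedure used in the study: the function name `covered_fraction` and the input format (0-based, half-open `(start, end)` intervals on the reference) are assumptions made for this example only.

```python
def covered_fraction(genome_length, alignments):
    """Return the fraction of a reference genome covered by contig alignments.

    Hypothetical helper for illustration only.
    genome_length: length of the reference genome in bases (must be > 0).
    alignments: iterable of (start, end) intervals on the reference,
                0-based and half-open; they may overlap or be unsorted.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")

    covered = 0
    current_start = current_end = None

    for start, end in sorted(alignments):
        # Clip each interval to the bounds of the genome.
        start = max(0, start)
        end = min(genome_length, end)
        if start >= end:
            continue  # empty or out-of-range interval

        if current_end is None or start > current_end:
            # A gap separates this interval from the previous block:
            # close the previous block and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            current_end = max(current_end, end)

    if current_end is not None:
        covered += current_end - current_start

    return covered / genome_length


if __name__ == "__main__":
    # Three alignments on a 1,000-base genome; the first two overlap.
    print(covered_fraction(1000, [(0, 300), (200, 500), (800, 1000)]))
```

Merging overlapping intervals before summing avoids counting the same reference bases twice when several contigs align to the same region, which is likely when highly similar genomes co-exist in the sample.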
Conclusions
Assemblers designed specifically for the analysis of metagenomes should be used to facilitate the creation of high-quality long contigs. Despite the high coverage possible, scientists should not expect to always obtain complete genomes, because their reconstruction may be hindered by co-existing species bearing highly similar genomic regions. Further development of metagenomics-oriented assemblers may help bypass these limitations in future studies. Meanwhile, the lack of fully reconstructed communities keeps methods to estimate viral diversity relevant. While none of the three methods tested had absolute precision, only PHACCS was deemed suitable for comparative studies.
【 License 】
2014 Aguirre de Cárcer et al.; licensee BioMed Central Ltd.