BMC Genomics
Metatranscriptomes from diverse microbial communities: assessment of data reduction techniques for rigorous annotation
Vincent Moulton [3], Thomas Mock [2], Simon Moxon [1], Andrew Toseland [3]
[1] The Genome Analysis Centre (TGAC), Norwich Research Park, Norwich, Norfolk NR4 7UH, UK
[2] School of Environmental Sciences, University of East Anglia, Norwich Research Park, Norwich, Norfolk NR4 7TJ, UK
[3] School of Computing Sciences, University of East Anglia, Norwich Research Park, Norwich, Norfolk NR4 7TJ, UK
Keywords: Assembly; Clustering; Data reduction; Sequence processing; Metatranscriptomics
DOI: 10.1186/1471-2164-15-901
Received: 2014-06-10; accepted: 2014-09-29; published: 2014
【 Abstract 】
Background
Metatranscriptome sequence data can contain highly redundant sequences from diverse populations of microbes, so data reduction techniques are often applied before taxonomic and functional annotation. For metagenomic data, it has been observed that variable coverage and the presence of closely related organisms can lead to fragmented assemblies containing chimeric contigs, which may reduce the accuracy of downstream analyses; some therefore advocate alternative data reduction techniques. However, it is unclear how such techniques affect the annotation of metatranscriptome data, and thus the interpretation of the results.
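Clustering reduces redundancy by collapsing near-identical sequences onto a single representative before annotation. The Python sketch below illustrates the general idea with a greedy, longest-first scheme. It is an illustrative assumption, not the method used in the study: tools such as CD-HIT use short-word filtering followed by alignment-based sequence identity, whereas this toy version swaps in a simple k-mer containment score. The names `greedy_cluster`, `k` and `threshold` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of distinct k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(seqs: List[str], k: int = 5,
                   threshold: float = 0.9) -> Tuple[List[int], Dict[int, int]]:
    """Greedy, longest-first redundancy clustering (toy stand-in for CD-HIT).

    A sequence joins the first existing representative that contains at least
    `threshold` of its distinct k-mers; otherwise it becomes a new representative.
    Returns the representative indices and a map from each index to its representative.
    """
    # Longest first; sorted() is stable, so equal-length sequences keep input order.
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    reps: List[Tuple[int, Set[str]]] = []
    assignment: Dict[int, int] = {}
    for i in order:
        ks = kmers(seqs[i], k)
        for rep_idx, rep_ks in reps:
            if ks and len(ks & rep_ks) / len(ks) >= threshold:
                assignment[i] = rep_idx
                break
        else:  # no existing representative was similar enough
            reps.append((i, ks))
            assignment[i] = i
    return [idx for idx, _ in reps], assignment


reads = ["ACGTACGTAAGGCT", "ACGTACGTAAGG", "TTTTCCCCGGGGAA"]
representatives, members = greedy_cluster(reads)
print(representatives)  # [0, 2]: read 1 is contained in read 0
```

Processing the longest sequences first guarantees that every representative is at least as long as the members assigned to it, which is the usual convention for greedy redundancy removal.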
Results
To investigate the effect of such techniques on the annotation of metatranscriptome data, we assess two commonly employed methods: clustering and de novo assembly. To do this, we also developed an approach to simulate 454 and Illumina metatranscriptome data sets with varying degrees of taxonomic diversity. For the Illumina simulations, we found that a two-step approach of assembly followed by clustering of contigs and unassembled sequences produced the most accurate reflection of the real protein domain content of the sample. For the 454 simulations, the combined annotation of contigs and unassembled reads produced the most accurate protein domain annotations.
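The abstract does not describe how the simulated data sets were generated. As a hypothetical sketch only, one way to vary taxonomic diversity is to change how many source taxa contribute transcripts, and to give each transcript a skewed (Zipf-like) expression weight before sampling reads. The function name `simulate_reads`, the `alpha` exponent and the error-free, fixed-length reads are all assumptions; realistic 454 and Illumina simulation would additionally need platform-specific read-length and error models.

```python
import random
from typing import Dict, List, Tuple


def simulate_reads(transcripts_by_taxon: Dict[str, List[str]], n_taxa: int,
                   n_reads: int, read_len: int, alpha: float = 1.0,
                   seed: int = 0) -> List[Tuple[str, str]]:
    """Hypothetical sketch: sample error-free reads from a mock community.

    Diversity is controlled by n_taxa (the first n_taxa taxa in sorted order).
    Each transcript long enough to yield a read gets a random expression rank r
    and weight (r + 1) ** -alpha, so a few transcripts dominate the reads.
    Returns (source_taxon, read) pairs so the ground truth is retained.
    """
    rng = random.Random(seed)
    taxa = sorted(transcripts_by_taxon)[:n_taxa]
    pool = [(taxon, t) for taxon in taxa
            for t in transcripts_by_taxon[taxon] if len(t) >= read_len]
    if not pool:
        raise ValueError("no transcript is long enough for the requested read length")
    rng.shuffle(pool)  # assign expression ranks at random
    weights = [(rank + 1) ** -alpha for rank in range(len(pool))]
    reads = []
    for _ in range(n_reads):
        taxon, t = rng.choices(pool, weights=weights, k=1)[0]
        start = rng.randrange(len(t) - read_len + 1)  # uniform start position
        reads.append((taxon, t[start:start + read_len]))
    return reads


community = {
    "taxonA": ["ACGTACGTAC" * 3],
    "taxonB": ["TTGGCCAATT" * 3, "GATTACAGAT" * 3],
    "taxonC": ["GGGGCCCCAA" * 3],
}
reads = simulate_reads(community, n_taxa=2, n_reads=100, read_len=20)
```

Sweeping `n_taxa` upward while holding `n_reads` fixed would give a series of increasingly diverse samples at constant sequencing depth, and the retained source labels provide the ground truth against which annotations can later be scored.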
Conclusions
Based on these data, we recommend that assembly be attempted and that unassembled reads be included in the final annotation of metatranscriptome data, even from highly diverse environments, as the resulting annotations should more accurately reflect the transcriptional behaviour of the microbial population under investigation.
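Comparing strategies such as "contigs only" against "contigs plus unassembled reads" requires a score for how closely an annotated protein domain profile matches the known content of a simulated sample. The abstract does not state which measure was used, so the sketch below swaps in Bray-Curtis dissimilarity (0 means identical profiles) and a hypothetical `annotate` callable standing in for a real domain search such as a Pfam scan. The domain identifiers in the toy example are placeholders.

```python
from collections import Counter
from typing import Callable, Iterable, List


def domain_profile(seqs: Iterable[str], annotate: Callable[[str], List[str]]) -> Counter:
    """Count protein-domain hits over a set of sequences.

    `annotate` is a hypothetical stand-in for a real domain search; it returns
    the list of domain identifiers found in one sequence.
    """
    counts = Counter()
    for s in seqs:
        counts.update(annotate(s))
    return counts


def bray_curtis(a: Counter, b: Counter) -> float:
    """Bray-Curtis dissimilarity between two count profiles (0 = identical)."""
    keys = set(a) | set(b)
    shared = sum(min(a[k], b[k]) for k in keys)  # Counter returns 0 for missing keys
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


# Toy example: one domain is found only in an unassembled read.
lookup = {"contig1": ["domA"], "contig2": ["domB"], "read1": ["domC"]}
annotate = lambda s: lookup.get(s, [])
truth = Counter({"domA": 1, "domB": 1, "domC": 1})

contigs = ["contig1", "contig2"]
unassembled = ["read1"]
print(round(bray_curtis(domain_profile(contigs, annotate), truth), 3))  # 0.2
print(bray_curtis(domain_profile(contigs + unassembled, annotate), truth))  # 0.0
```

In this toy case, discarding the unassembled read loses a domain entirely, which is the kind of omission that motivates keeping unassembled reads in the final annotation. A fuller comparison might weight each sequence by the number of reads it represents rather than counting each hit once.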
【 License 】
© 2014 Toseland et al.; licensee BioMed Central Ltd.
【 Figures 】
Figure 1.
Figure 2.
Figure 3.