Journal Article Details
BMC Genomics
Census-based rapid and accurate metagenome taxonomic profiling
Raja Mazumder [5], Amy Zanne [7], Vahan Simonyan [1], Mariya Shcheglovitova [2], Konstantinos Krampis [3], W Evan Johnson [4], Yang Pan [6], Amirhossein Shamsaddini [6]
[1] Center for Biologics Evaluation and Research, Food and Drug Administration, Rockville 20852, MD, USA
[2] Department of Biological Sciences, The George Washington University, 2023 G Street NW, Washington DC 20052, USA
[3] Bioinformatics Department, The J. Craig Venter Institute, Rockville, MD 20850, USA
[4] Division of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
[5] McCormick Genomic and Proteomic Center, George Washington University, Washington DC 20037, USA
[6] Department of Biochemistry and Molecular Medicine, George Washington University, Washington DC 20037, USA
[7] Center for Conservation and Sustainable Development, Missouri Botanical Garden, St. Louis, MO 63166, USA
Keywords: Diagnostics; Taxonomic profiling; Next-gen sequence analysis; Census-based; Metagenome
Others  :  1128438
DOI  :  10.1186/1471-2164-15-918
Received 2013-09-11, accepted 2014-10-06, published 2014
【 Abstract 】

Background

Understanding the taxonomic composition of a sample, whether from a patient, food or the environment, is important to several types of studies, including pathogen diagnostics, epidemiological studies, biodiversity analysis and food quality regulation. With the decreasing costs of sequencing, metagenomic data is quickly becoming the preferred type of data for such analyses.

Results

Rapidly defining the taxonomic composition (both taxonomic profile and relative frequency) of a metagenomic sequence dataset is challenging because mapping millions of sequence reads from a metagenomic study to a large reference collection such as the NCBI non-redundant nucleotide database (nt) is computationally intensive. We have developed a robust subsampling-based algorithm, implemented in a tool called CensuScope, that takes a ‘sneak peek’ into the population distribution and estimates taxonomic composition as if a census had been taken of the metagenomic landscape. CensuScope is a rapid and accurate metagenome taxonomic profiling tool that randomly extracts a small number of reads (set by the user) and maps them to NCBI’s nt database. This process is repeated multiple times to ascertain the taxonomic composition found in the majority of iterations, thereby providing a robust estimate of the population along with measures of accuracy for the results.
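
The subsample-and-repeat procedure described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CensuScope's implementation: `census`, its parameters, and the `classify` callback are hypothetical names, and `classify` merely stands in for aligning a read against nt and returning a taxon label (or `None` when there is no hit). The defaults of 250 reads and 50 iterations follow the recommendation given in the Conclusion.

```python
import random
from collections import Counter


def census(reads, classify, sample_size=250, iterations=50, seed=0):
    """Estimate taxon frequencies by repeated random subsampling.

    reads:    list of reads (any objects accepted by `classify`)
    classify: hypothetical stand-in for the alignment step; maps a read
              to a taxon label, or None if the read has no hit
    Returns {taxon: mean relative frequency} for taxa observed in more
    than half of the iterations.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    k = min(sample_size, len(reads))
    per_iteration = []
    for _ in range(iterations):
        subsample = rng.sample(reads, k)  # draw k reads without replacement
        counts = Counter(classify(read) for read in subsample)
        counts.pop(None, None)  # ignore unclassified reads
        per_iteration.append({taxon: n / k for taxon, n in counts.items()})

    # Count in how many iterations each taxon was observed at least once.
    times_seen = Counter(taxon for freqs in per_iteration for taxon in freqs)

    summary = {}
    for taxon, n_seen in times_seen.items():
        if n_seen > iterations / 2:  # keep only majority-supported taxa
            total = sum(freqs.get(taxon, 0.0) for freqs in per_iteration)
            summary[taxon] = total / iterations
    return summary


if __name__ == "__main__":
    # Toy data: each "read" is already labelled with its true taxon.
    toy_reads = ["A"] * 9000 + ["B"] * 990 + ["C"] * 10
    for taxon, freq in sorted(census(toy_reads, classify=lambda r: r).items()):
        print(taxon, round(freq, 3))
```

In this toy run only a few thousand read classifications are performed in total, regardless of how large the input is, which is the source of the speed-up the abstract describes.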

Conclusion

CensuScope can be run on a laptop or on a high-performance computer. Based on our analysis, we provide recommendations on the number of sequence reads to analyze and the number of iterations to use. For example, to quantify taxonomic groups present in the sample at a level of 1% or higher, a subsampling size of 250 random reads with 50 iterations yields a statistical power of >99%. Windows and UNIX versions of CensuScope are available for download at https://hive.biochemistry.gwu.edu/dna.cgi?cmd=censuscope. CensuScope is also available through the High-performance Integrated Virtual Environment (HIVE) and can be used in conjunction with other HIVE analysis and visualization tools.
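
The abstract does not say how the >99% figure was derived. As a rough plausibility check only, and not the authors' power calculation, one can ask how likely a taxon at 1% abundance is to appear in a single subsample of 250 reads, and then in the majority of 50 such subsamples. The sketch below assumes every read is an independent draw (a binomial approximation) and that every read from the taxon is classified correctly; the function names are hypothetical.

```python
from math import comb


def p_seen_once(abundance, sample_size):
    """Probability that at least one of `sample_size` independent reads
    comes from a taxon with the given relative abundance."""
    return 1.0 - (1.0 - abundance) ** sample_size


def p_seen_in_majority(abundance, sample_size, iterations):
    """Probability that the taxon is observed in more than half of the
    iterations, treating iterations as independent trials."""
    q = p_seen_once(abundance, sample_size)
    needed = iterations // 2 + 1  # smallest count that is a strict majority
    return sum(
        comb(iterations, k) * q**k * (1.0 - q) ** (iterations - k)
        for k in range(needed, iterations + 1)
    )


if __name__ == "__main__":
    print(p_seen_once(0.01, 250))             # single subsample of 250 reads
    print(p_seen_in_majority(0.01, 250, 50))  # majority of 50 subsamples
```

Under these simplifying assumptions a single subsample detects a 1% taxon roughly 92% of the time, and requiring detection in a majority of 50 iterations pushes the probability well above 99%, which is consistent with the recommendation above.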

【 License 】

   
2014 Shamsaddini et al.; licensee BioMed Central Ltd.

【 Figures 】

Figure 1.

Figure 2.

Figure 3.

Figure 4.
