| Frontiers in Cell and Developmental Biology | |
| Integrative Workflows for Metagenomic Analysis | |
| Aristotle A Chatziioannou1  Fragiskos N Kolisis2  Efthymios eLadoukakis2  | |
| [1] National Hellenic Research Foundation;National Technical University of Athens; | |
| 关键词: Metagenomics; bioinformatics; Distributed Computing; Cloud computing; workflow engines; | |
| DOI : 10.3389/fcell.2014.00070 | |
| 来源: DOAJ | |
【 摘 要 】
The rapid evolution of all sequencing technologies, described by the term Next Generation Sequencing (NGS), have revolutionized metagenomic analysis. They constitute a combination of high-throughput analytical protocols, coupled to delicate measuring techniques, in order to potentially discover, properly assemble and map allelic sequences to the correct genomes, achieving particularly high yields for only a fraction of the cost of traditional processes (i.e. Sanger). From a bioinformatic perspective, this boils down to many gigabytes of data being generated from each single sequencing experiment, rendering the management or even the storage, critical bottlenecks with respect to the overall analytical endeavor. The enormous complexity is even more aggravated by the versatility of the processing steps available, represented by the numerous bioinformatic tools that are essential, for each analytical task, in order to fully unveil the genetic content of a metagenomic dataset. These disparate tasks range from simple, nonetheless non-trivial, quality control of raw data to exceptionally complex protein annotation procedures, requesting a high level of expertise for their proper application or the neat implementation of the whole workflow. Furthermore, a bioinformatic analysis of such scale, requires grand computational resources, imposing as the sole realistic solution, the utilization of cloud computing infrastructures. In this review article we discuss different, integrative, bioinformatic solutions available, which address the aforementioned issues, by performing a critical assessment of the available automated pipelines for data management, quality control and annotation of metagenomic data, embracing various, major sequencing technologies and applications.
【 授权许可】
Unknown