Journal Article Details
BMC Bioinformatics
eHive: An Artificial Intelligence workflow system for genomic analysis
Methodology Article
Paul Flicek [1], Javier Herrero [1], Albert J Vilella [1], Leo Gordon [1], Michael Schuster [1], Stephen Fitzgerald [1], Kathryn Beal [1], Abel Ureta-Vidal [2], Jessica Severin [3]
[1] European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Hinxton, Cambridge, UK
[2] Eagle Genomics Ltd., Babraham Research Campus, CB22 3AT, Cambridge, UK
[3] RIKEN Yokohama Institute, Omics Sciences Center (OSC), 1-7-22 Suehiro-cho, 230-0045, Tsurumi-ku, Yokohama, Kanagawa, Japan
Keywords: Fault Tolerance; Autonomous Agent; Control Rule; Central Controller; Data Flow Graph
DOI: 10.1186/1471-2105-11-240
Received 2009-10-21, accepted 2010-05-11, published 2010
Source: Springer
【 Abstract 】

Background: The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future.

Results: We present eHive, a new fault-tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network-distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios.

Conclusions: eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.
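
The blackboard idea described in the abstract can be illustrated with a minimal sketch. It is not eHive's implementation: eHive uses a MySQL database and Perl agents, whereas this example substitutes Python with an in-memory SQLite database so that it runs standalone. The table layout, the status values, the `DATAFLOW` rule and all function names are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical dataflow rule: a finished job of the key analysis
# seeds a new job of the value analysis with its output.
DATAFLOW = {"align": "build_tree"}


def create_blackboard():
    """Create the shared job table (SQLite stands in for MySQL here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job ("
        " id INTEGER PRIMARY KEY,"
        " analysis TEXT NOT NULL,"
        " input TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'READY',"
        " worker TEXT)"
    )
    return conn


def add_job(conn, analysis, input_id):
    """Post a new READY job on the blackboard."""
    conn.execute(
        "INSERT INTO job (analysis, input) VALUES (?, ?)", (analysis, input_id)
    )
    conn.commit()


def claim_job(conn, worker):
    """Claim the oldest READY job, or return None when none is left."""
    while True:
        row = conn.execute(
            "SELECT id, analysis, input FROM job"
            " WHERE status = 'READY' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status guard makes the claim fail if another agent got there first.
        cur = conn.execute(
            "UPDATE job SET status = 'CLAIMED', worker = ?"
            " WHERE id = ? AND status = 'READY'",
            (worker, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row


def run_worker(conn, worker, runnables):
    """Agent loop: claim jobs, run them, record the outcome, apply dataflow."""
    completed = 0
    while True:
        job = claim_job(conn, worker)
        if job is None:
            return completed
        job_id, analysis, input_id = job
        try:
            output = runnables[analysis](input_id)
        except Exception:
            # A failed job is recorded on the blackboard; the agent carries on.
            conn.execute("UPDATE job SET status = 'FAILED' WHERE id = ?", (job_id,))
            conn.commit()
            continue
        conn.execute("UPDATE job SET status = 'DONE' WHERE id = ?", (job_id,))
        next_analysis = DATAFLOW.get(analysis)
        if next_analysis is not None:
            conn.execute(
                "INSERT INTO job (analysis, input) VALUES (?, ?)",
                (next_analysis, output),
            )
        conn.commit()
        completed += 1
```

In this sketch the agent holds no pipeline state of its own: everything it needs to decide what to do next is read from, and written back to, the shared table. That is the property that lets several agents run side by side and lets a failed job be recorded without stopping the rest of the work.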

【 License 】

© Severin et al; licensee BioMed Central Ltd. 2010. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
