BMC Bioinformatics
Implementing a genomic data management system using iRODS in the Wellcome Trust Sanger Institute
Correspondence
Guy Coates [1], Gen-Tao Chiang [1], Peter Clapham [1], Kevin Sale [2], Guoying Qi [3]
[1] Wellcome Trust Sanger Institute, Informatics System Group, Wellcome Trust Genome Campus, CB10 1SA, Hinxton, UK
[2] Wellcome Trust Sanger Institute, Infrastructure Management Team, Wellcome Trust Genome Campus, CB10 1SA, Hinxton, UK
[3] Wellcome Trust Sanger Institute, New Sequencing Technologies, Wellcome Trust Genome Campus, CB10 1SA, Hinxton, UK
Keywords: Large Hadron Collider; File System; Data Management System; Rule Engine; Wellcome Trust Sanger Institute
DOI: 10.1186/1471-2105-12-361
Received 2011-05-20; accepted 2011-09-09; published 2011
Source: Springer
【 Abstract 】

Background: Increasingly large amounts of DNA sequencing data are being generated within the Wellcome Trust Sanger Institute (WTSI). The traditional file system struggles to handle these increasing amounts of sequence data. A good data management system therefore needs to be implemented and integrated into the current WTSI infrastructure. Such a system enables good management of the IT infrastructure of the sequencing pipeline and allows biologists to track their data.

Results: We have chosen a data grid system, iRODS (Rule-Oriented Data management systems), to act as the data management system for the WTSI. iRODS provides a rule-based system management approach which makes data replication much easier and provides extra data protection. Unlike the metadata provided by traditional file systems, the metadata system of iRODS is comprehensive and allows users to customize their own application level metadata. Users and IT experts in the WTSI can then query the metadata to find and track data.

The aim of this paper is to describe how we designed and used (from both system and user viewpoints) iRODS as a data management system. Details are given about the problems faced and the solutions found when iRODS was implemented. A simple use case describing how users within the WTSI use iRODS is also introduced.

Conclusions: iRODS has been implemented and works as the production system for the sequencing pipeline of the WTSI. Both biologists and IT experts can now track and manage data, which could not previously be achieved. This novel approach allows biologists to define their own metadata and query the genomic data using those metadata.

【 License 】

CC BY   
© Chiang et al; licensee BioMed Central Ltd. 2011
