Source Code for Biology and Medicine
FCC – An automated rule-based processing tool for life science data
Christian Panse¹, Can Türker¹, Simon Barkow-Oesterreicher¹
¹ Functional Genomics Center Zurich (FGCZ), Swiss Federal Institute of Technology Zurich (ETHZ) / University of Zurich (UZH), Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
Keywords: Data processing; High-throughput; Automatization; Computing
DOI  :  10.1186/1751-0473-8-3
Received 2012-10-30; accepted 2013-01-03; published 2013
Abstract

Background

Data processing in bioinformatics often involves running diverse software programs within a single workflow. Because the field lacks a common set of file-format standards, files must be processed in different ways to make them compatible with different analysis programs. Moreover, mass spectrometry vendors provide, at most, closed-source Windows libraries for programmatic access to their proprietary binary formats. This prevents the creation of a single efficient tool that covers all of the users' processing needs. As a result, researchers spend a significant amount of time operating GUI-based conversion and processing programs. Besides the time needed for manual operation, such programs can also have long running times, because most of them use only a single CPU. In particular, algorithms that enhance data quality, e.g. peak picking or deconvolution of spectra, add waiting time for the users.

Results

To automate these processing tasks and let them run continuously without user interaction, we developed the FGCZ Converter Control (FCC) at the Functional Genomics Center Zurich (FGCZ) core facility. FCC is a rule-based system for automated file processing that reduces the operation of diverse programs to a single configuration task. Filtering rules on raw data files allow the parameters of every task to be tailored to the needs of each individual researcher, and processing runs automatically and efficiently on any number of servers in parallel, using all available CPU resources.
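The rule-based idea can be illustrated with a minimal Python sketch. It is not FCC's implementation: the `Rule` and `Task` classes, the converter names (`to_mgf`, `to_mzml`), the `--key=value` flag layout, the example paths, and the policy that every matching rule yields its own task are all hypothetical choices made for illustration. Each rule is a regular expression matched against raw-file paths; every match becomes a task carrying that rule's parameters, and the tasks are dispatched in parallel with one worker per CPU by default.

```python
import os
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Rule:
    """Hypothetical filtering rule: raw files whose path matches `pattern`
    are processed by `converter` with the parameters in `params`."""
    pattern: str
    converter: str
    params: Dict[str, str] = field(default_factory=dict)

    def matches(self, path: str) -> bool:
        return re.search(self.pattern, path) is not None


@dataclass
class Task:
    """One unit of work: a raw file plus the converter and parameters to use."""
    path: str
    converter: str
    params: Dict[str, str]


def plan_tasks(paths: Sequence[str], rules: Sequence[Rule]) -> List[Task]:
    """Return one task per (file, matching rule) pair, files in input order
    and rules in declaration order. Files matching no rule are skipped."""
    tasks = []
    for path in paths:
        for rule in rules:
            if rule.matches(path):
                tasks.append(Task(path, rule.converter, dict(rule.params)))
    return tasks


def to_command(task: Task) -> List[str]:
    """Build a command line for a task; the flag layout is hypothetical."""
    cmd = [task.converter]
    for key, value in sorted(task.params.items()):
        cmd.append(f"--{key}={value}")
    cmd.append(task.path)
    return cmd


def run_all(tasks: Sequence[Task],
            runner: Callable[[List[str]], int],
            workers: Optional[int] = None) -> List[int]:
    """Run every task through `runner` in parallel (default: one worker per
    CPU) and return the results in the same order as `tasks`."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, (to_command(t) for t in tasks)))


if __name__ == "__main__":
    rules = [
        Rule(r"\.raw$", "to_mgf", {"peak_picking": "on"}),  # all raw files
        Rule(r"/p1000/.*\.raw$", "to_mzml"),                # one project only
    ]
    paths = ["/data/p1000/a.raw", "/data/p2000/b.raw", "/data/p2000/notes.txt"]
    tasks = plan_tasks(paths, rules)
    # Dry run: print each command instead of executing it.
    run_all(tasks, runner=lambda cmd: print(" ".join(cmd)) or 0)
```

In a real deployment the `runner` would launch the converter, for example `lambda cmd: subprocess.run(cmd).returncode`; a thread pool is sufficient in that case because the heavy work happens in the external converter processes rather than in Python. Distributing the work across several servers would additionally need a shared queue or scheduler, which this sketch omits.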

Conclusions

FCC has been used intensively at FGCZ and has so far processed more than one hundred thousand mass spectrometry raw files. Because many other research facilities face similar problems, we report on our tool and the accompanying ideas for an efficient set-up, so that others can reuse them.

License

2013 Barkow-Oesterreicher et al.; licensee BioMed Central Ltd.

Figures

Figure 1.

Figure 2.

Figure 3.

Figure 4.
