Conference Paper Details
20th International Conference on Computing in High Energy and Nuclear Physics
Evaluation of Apache Hadoop for parallel data analysis with ROOT
Physics; Computer Science
Lehrack, S.^1 ; Duckeck, G.^1 ; Ebke, J.^1
Ludwig-Maximilians-University Munich, Department of Elementary Particle Physics, Am Coulombwall 1, D-85748 Garching, Germany^1
Keywords: Apache Hadoop; Binary data files; Clusters of computers; Distributed processing; Job management; Large datasets; Parallel data; Processing platform
Others  :  https://iopscience.iop.org/article/10.1088/1742-6596/513/3/032054/pdf
DOI  :  10.1088/1742-6596/513/3/032054
Subject classification: Computer Science (General)
Source: IOP
【 Abstract 】

Apache Hadoop is a Java-based framework for distributed processing of large data sets across clusters of computers. It uses the Hadoop Distributed File System (HDFS) for data storage and backup and MapReduce as its processing platform. Hadoop is primarily designed for large textual data sets that can be processed in arbitrary chunks, so it must be adapted to process binary data files, which cannot be split automatically. However, Hadoop offers attractive features in terms of fault tolerance, task supervision and control, multi-user functionality and job management. For this reason, we evaluated Apache Hadoop as an alternative to PROOF for ROOT data analysis. Two alternatives for distributing analysis data were discussed: either the data was stored in HDFS and processed with MapReduce, or the data was accessed via a standard Grid storage system (dCache Tier-2) and MapReduce was used only as the execution back-end. The measurements focused, on the one hand, on storing analysis data safely on HDFS at reasonable data rates and, on the other hand, on processing data quickly and reliably with MapReduce. In the evaluation of HDFS, read/write data rates from a local Hadoop cluster were measured and compared to standard data rates from the local NFS installation. In the evaluation of MapReduce, realistic ROOT analyses were used and event rates were compared to PROOF.
