15th International Workshop on Advanced Computing and Analysis Techniques in Physics Research
Experience, use, and performance measurement of the Hadoop File System in a typical nuclear physics analysis workflow
Physics; Computer Science
Sangaline, E.^1; Lauret, J.^2
Physics Department, University of California Davis, Davis, CA 95616-5270, United States^1
Physics Department, Brookhaven National Laboratory, Upton, NY 11973-5000, United States^2
Keywords: Attractive solutions; Computing resource; Distributed file-system; Dynamic configuration; Dynamic environments; Nuclear and particle physics; Performance and scalabilities; Performance measurements
PDF: https://iopscience.iop.org/article/10.1088/1742-6596/523/1/012006/pdf
DOI: 10.1088/1742-6596/523/1/012006
Subject classification: Computer Science (General)
Source: IOP
Abstract
The volume of data produced in Nuclear and Particle Physics (NPP) experiments requires that data be transferred and stored across diverse collections of computing resources. Robust solutions such as XRootD have been used in NPP. However, as the use of cloud resources grows, the difficulty of dynamically configuring these systems becomes a concern. The Hadoop File System (HDFS) is a candidate cloud storage solution with a proven track record in dynamic environments. Although it is not yet widely used in NPP, HDFS is attractive because it offers both elastic storage and rapid deployment. We present the performance of HDFS in canonical I/O tests and in a typical data analysis pattern within the RHIC/STAR experimental framework. These tests explore how performance scales with the level of redundancy and the number of clients. We also evaluated the performance of FUSE and NFS interfaces to HDFS as a way to let existing software run without modification. Unfortunately, the complex data structures used in NPP are non-trivial to integrate with Hadoop, so many of the benefits of the MapReduce paradigm could not be realized directly. Even so, our results indicate that HDFS used as a distributed file system offers reasonable performance and scalability, and that it excels in ease of configuration and deployment in a cloud environment.
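The abstract mentions two things: canonical I/O tests, and FUSE/NFS interfaces that let unmodified software read HDFS. The sketch below is a minimal illustration of how those ideas connect. It is not the authors' benchmark. It measures sequential read throughput using only ordinary file I/O, so in principle the same function could be pointed at a file under a FUSE- or NFS-mounted HDFS directory. The function name, the 1 MiB block size, and any mount path are illustrative assumptions.

```python
import os
import tempfile
import time


def sequential_read_throughput(path, block_size=1 << 20):
    """Read `path` front to back with plain, unbuffered file reads.

    Returns (bytes_read, megabytes_per_second). Only standard file I/O is
    used, so (as an assumption) the same call could target a file on a
    FUSE- or NFS-mounted HDFS directory without any code changes.
    """
    bytes_read = 0
    start = time.perf_counter()
    # buffering=0 gives raw reads, so each read() maps to one OS-level read.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:  # empty bytes object signals end of file
                break
            bytes_read += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on tiny or page-cached files.
    mb_per_s = (bytes_read / 1e6) / max(elapsed, 1e-9)
    return bytes_read, mb_per_s


if __name__ == "__main__":
    # Self-contained demo on a local temporary file. To probe HDFS, one would
    # substitute a path under a hypothetical FUSE mount point instead.
    size = 8 * (1 << 20)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(size))
        name = tmp.name
    try:
        n, rate = sequential_read_throughput(name)
        print(f"read {n} bytes at {rate:.1f} MB/s")
    finally:
        os.remove(name)
```

A scaling study in the spirit of the abstract would run many such readers concurrently and vary the HDFS replication factor. Note that a local temporary file is likely served from the page cache, so the demo's numbers say nothing about HDFS itself.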