Conference paper details
20th International Conference on Computing in High Energy and Nuclear Physics
Analysis and improvement of data-set level file distribution in Disk Pool Manager
Physics; Computer Science
Skipsey, Samuel Cadellin^1 ; Purdie, Stuart^2 ; Britton, David^1 ; Mitchell, Mark^1 ; Bhimji, Wahid^3 ; Smith, David^4
Department of Physics and Astronomy, University of Glasgow, G12 8QQ, United Kingdom^1
University of St Andrews, School of Computer Science, KY16 9SX, United Kingdom^2
University of Edinburgh, School of Physics and Astronomy, James Clerk Maxwell Building, Mayfield Road, Edinburgh EH9 3JZ, United Kingdom^3
European Organization for Nuclear Research (CERN), Genève CH-1211, Switzerland^4
Keywords: File distribution; File location; File placement; Filesystem; Namespaces; Round Robin algorithms; Storage arrays; Storage elements
Others  :  https://iopscience.iop.org/article/10.1088/1742-6596/513/4/042042/pdf
DOI  :  10.1088/1742-6596/513/4/042042
Subject classification: Computer Science (General)
Source: IOP
【 Abstract 】

Of the three most widely used implementations of the WLCG Storage Element specification, Disk Pool Manager [1, 2] (DPM) has the simplest implementation of file placement balancing (StoRM does not attempt this, leaving it to the underlying filesystem, which can itself be very sophisticated). DPM uses a round-robin algorithm (with optional filesystem weighting) to place files across filesystems and servers. This does a reasonable job of distributing files evenly across the storage array provided to it. However, it offers no guarantee of an even distribution for the subset of files associated with a given "dataset" (which often maps onto a "directory" in the DPM namespace (DPNS)). It is useful to consider a concept of "balance", where an optimally balanced set of files is one that is distributed evenly across all of the pool nodes. At best, the round-robin algorithm maintains balance; it has no mechanism to improve it. In the past year or more, larger DPM sites have noticed load spikes on individual disk servers, and suspected that these were exacerbated by excesses of files from popular datasets on those servers. We present here a software tool which analyses file distribution for all datasets in a DPM SE, providing a measure of how poorly files are located in this context. Further, the tool provides a list of file movement actions which will improve dataset-level file distribution, and can carry out those file movements itself. We present results of such an analysis on the UKI-SCOTGRID-GLASGOW production DPM.
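
The idea of dataset-level balance can be illustrated with a minimal Python sketch. It is not the tool described in the abstract, and the abstract does not define its imbalance measure: the score used here (largest minus smallest per-server file count), the greedy move planner, and all names (`group_by_dataset`, `spread`, `propose_moves`, the example paths and server names) are assumptions made purely for illustration. File sizes and filesystem weights are ignored, and every server holding a file is assumed to appear in the `servers` list.

```python
from collections import Counter, defaultdict


def group_by_dataset(replicas):
    """Group (path, server) pairs by parent directory, treated here as the dataset."""
    datasets = defaultdict(list)
    for path, server in replicas:
        directory, _, name = path.rpartition("/")
        datasets[directory].append((name, server))
    return datasets


def spread(files, servers):
    """Hypothetical imbalance score: largest minus smallest per-server file count."""
    counts = Counter(server for _, server in files)
    per_server = [counts.get(s, 0) for s in servers]
    return max(per_server) - min(per_server)


def propose_moves(files, servers):
    """Greedily plan moves from the fullest to the emptiest server.

    Stops once per-server counts differ by at most one file.
    Returns a list of (file_name, source_server, destination_server).
    """
    by_server = {s: [] for s in servers}
    for name, server in files:
        by_server[server].append(name)
    moves = []
    while True:
        fullest = max(servers, key=lambda s: len(by_server[s]))
        emptiest = min(servers, key=lambda s: len(by_server[s]))
        if len(by_server[fullest]) - len(by_server[emptiest]) <= 1:
            return moves
        name = by_server[fullest].pop()
        by_server[emptiest].append(name)
        moves.append((name, fullest, emptiest))


if __name__ == "__main__":
    # Invented example data: one dataset whose files all landed on one server.
    servers = ["disk01", "disk02", "disk03"]
    replicas = [
        ("/dpm/example/ds1/f1", "disk01"),
        ("/dpm/example/ds1/f2", "disk01"),
        ("/dpm/example/ds1/f3", "disk01"),
        ("/dpm/example/ds1/f4", "disk01"),
    ]
    for dataset, files in group_by_dataset(replicas).items():
        print(dataset, spread(files, servers), propose_moves(files, servers))
```

A planner of this kind only counts files; a production tool would also have to account for file size, free space and the cost of each transfer before carrying out any move.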
