Conference paper details
20th International Conference on Computing in High Energy and Nuclear Physics
CMS users data management service integration and first experiences with its NoSQL data storage
Physics; Computer Science
Riahi, H.^1 ; Spiga, D.^2 ; Boccali, T.^3 ; Ciangottini, D.^4 ; Cinquilli, M.^2 ; Hernàndez, J.M.^5 ; Konstantinov, P.^6 ; Mascheroni, M.^7 ; Santocchia, A.^4
INFN Perugia, Via Alessandro Pascoli, Perugia 06123, Italy^1
European Organisation for Nuclear Research, IT Department, Geneva 23 CH-1211, Switzerland^2
INFN Pisa, Edificio C, Via F. Buonarroti 2, Pisa 56127, Italy^3
Universita' and INFN Perugia, Via Alessandro Pascoli, Perugia 06123, Italy^4
CIEMAT, Av Complutense, 40, Madrid 28040, Spain^5
INRNE, Tzarigradsko Chaussee blvd, 72, Sofia BG-1784, Bulgaria^6
INFN Milano-Bicocca, Edificio U2, Piazza della Scienza, 3, I-Milano I-20126, Italy^7
Keywords: Analysis frameworks; Computing resource; Data management services; Distributed data analysis; High availability; Integration strategy; Real time monitoring; Service performance
Others  :  https://iopscience.iop.org/article/10.1088/1742-6596/513/3/032079/pdf
DOI  :  10.1088/1742-6596/513/3/032079
Subject classification: Computer Science (General)
Source: IOP
【 Abstract 】

The distributed data analysis workflow in CMS assumes that jobs run at a location different from the one where their results are finally stored. Typically, the user outputs must be transferred from one site to another by a dedicated CMS service, AsyncStageOut. This new service was originally developed to address the inefficient use of CMS computing resources that results when analysis job outputs are transferred synchronously from the job execution node to the remote site as soon as they are produced. AsyncStageOut is designed as a thin application that relies only on a NoSQL database (CouchDB) for input and data storage. It has progressed from a limited prototype to a highly adaptable service that manages and monitors every step in the handling of user files, namely file transfer and publication. AsyncStageOut is integrated with the Common CMS/ATLAS Analysis Framework. It is expected to manage nearly 200k user files per day from close to 1000 individual users per month with minimal delays, while providing real-time monitoring and reports to users and service operators and remaining highly available. The associated data volume presents a new set of challenges in database scalability and in service performance and efficiency. In this paper, we present an overview of the AsyncStageOut model and the strategy for integrating it with the Common Analysis Framework. We also present the motivations for using NoSQL technology, the data design, and the techniques used for efficient indexing and monitoring of the data. We describe the deployment model adopted for high availability and scalability of the service. Finally, we discuss the hardware requirements and the results achieved, as determined by testing with actual data and realistic loads during the commissioning and initial production phases with the Common Analysis Framework.
