20th International Conference on Computing in High Energy and Nuclear Physics
CMS users data management service integration and first experiences with its NoSQL data storage
Physics; Computer Science
Riahi, H.^1 ; Spiga, D.^2 ; Boccali, T.^3 ; Ciangottini, D.^4 ; Cinquilli, M.^2 ; Hernàndez, J.M.^5 ; Konstantinov, P.^6 ; Mascheroni, M.^7 ; Santocchia, A.^4
INFN Perugia, Via Alessandro Pascoli, Perugia 06123, Italy^1
European Organisation for Nuclear Research, IT Department, Geneva 23 CH-1211, Switzerland^2
INFN Pisa, Edificio C, Via F. Buonarroti 2, Pisa 56127, Italy^3
Universita' and INFN Perugia, Via Alessandro Pascoli, Perugia 06123, Italy^4
CIEMAT, Av Complutense, 40, Madrid 28040, Spain^5
INRNE, Tzarigradsko Chaussee blvd, 72, Sofia BG-1784, Bulgaria^6
INFN Milano-Bicocca, Edificio U2, Piazza della Scienza, 3, I-Milano I-20126, Italy^7
Keywords: Analysis frameworks; Computing resource; Data management services; Distributed data analysis; High availability; Integration strategy; Real time monitoring; Service performance
Others: https://iopscience.iop.org/article/10.1088/1742-6596/513/3/032079/pdf DOI: 10.1088/1742-6596/513/3/032079
Subject classification: Computer Science (General)
Source: IOP
【 Abstract 】
The distributed data analysis workflow in CMS assumes that jobs run at a different location from where their results are finally stored. Typically, the user outputs must be transferred from one site to another by a dedicated CMS service, AsyncStageOut. This new service was originally developed to address the inefficient use of CMS computing resources that occurs when analysis job outputs are transferred synchronously from the job execution node to the remote site as soon as they are produced. AsyncStageOut is designed as a thin application relying only on a NoSQL database (CouchDB) for input and data storage. It has progressed from a limited prototype to a highly adaptable service that manages and monitors all steps in the handling of user files, namely file transfer and publication. AsyncStageOut is integrated with the Common CMS/ATLAS Analysis Framework. It is expected to manage nearly 200k user files per day from close to 1000 individual users per month with minimal delays, and to provide real-time monitoring and reports to users and service operators while remaining highly available. The associated data volume represents a new set of challenges in the areas of database scalability and service performance and efficiency. In this paper, we present an overview of the AsyncStageOut model and the integration strategy with the Common Analysis Framework. The motivations for using NoSQL technology are also presented, as well as the data design and the techniques used for efficient indexing and monitoring of the data. We describe the deployment model for the high availability and scalability of the service. We also discuss the hardware requirements and the results achieved, as determined by testing with actual data and realistic loads during the commissioning and the initial production phase with the Common Analysis Framework.