| 20th International Conference on Computing in High Energy and Nuclear Physics | |
| Streamlining CASTOR to manage the LHC data torrent | |
| Physics; Computer Science | |
| Lo Presti, G.^1; Espinal Curull, X.^1; Cano, E.^1; Fiorini, B.^1; Ieri, A.^1; Murray, S.^1; Ponce, S.^1; Sindrilaru, E.^1 | |
| ^1 CERN, 1211 Geneva 23, Switzerland | |
| Keywords: Data placement; Data stream; Production manager; Real-time data; Scheduling systems; Small files; Storage systems; Sub-systems | |
| Others: https://iopscience.iop.org/article/10.1088/1742-6596/513/4/042031/pdf | DOI: 10.1088/1742-6596/513/4/042031 |
| Subject classification: Computer Science (General) | |
| Source: IOP | |
【 Abstract 】
This contribution describes the evolution of CASTOR, the main CERN storage system, as it manages the bulk data stream of the LHC and other CERN experiments, reaching over 90 PB of stored data by the end of LHC Run 1. This evolution was marked by the introduction of policies to optimize the throughput of the tape sub-system, moving towards a cold storage system in which data placement is managed by the experiments' production managers. More efficient tape migrations and recalls have been implemented and deployed, using bulk metadata operations that greatly reduce the overhead caused by small files. A repack facility is now integrated into the system and has been enhanced to automate the repacking of several tens of petabytes, a task required in 2014 to prepare for the next LHC run. Finally, the scheduling system has been evolved to integrate the internal monitoring. Managing the service efficiently requires a solid monitoring infrastructure able to analyze the logs produced by the different components (about 1 kHz, i.e. roughly 1,000 log messages per second). A new system has been developed and deployed that uses a transport messaging layer provided by the CERN-IT Agile Infrastructure and exploits technologies including Hadoop and HBase. This enables efficient data mining with MapReduce techniques, as well as real-time data aggregation and visualization. The outlook for the future is also presented: directions and possible evolutions are discussed in view of the restart of data-taking activities.
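The monitoring system summarized above relies on MapReduce to aggregate the log messages emitted by CASTOR components. The following is a minimal, in-process Python sketch of the generic MapReduce pattern (map, shuffle, reduce), counting log messages per component, severity level and minute. It is not the authors' implementation and does not use the Hadoop or HBase APIs. The log line format (`<epoch_seconds> <component> <level> <message>`), the component names and all function names are hypothetical assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical log line format (an assumption, not CASTOR's real schema):
#   "<epoch_seconds> <component> <level> <free-text message>"
LogKey = Tuple[str, str, int]  # (component, level, minute bucket)


def map_line(line: str) -> Iterator[Tuple[LogKey, int]]:
    """Map phase: emit ((component, level, minute), 1) for each well-formed line."""
    parts = line.split(maxsplit=3)
    if len(parts) < 3:
        return  # malformed line: emit nothing
    ts, component, level = parts[0], parts[1], parts[2]
    try:
        minute = int(float(ts)) // 60  # bucket timestamps into whole minutes
    except ValueError:
        return  # non-numeric timestamp: emit nothing
    yield (component, level, minute), 1


def shuffle(pairs: Iterable[Tuple[LogKey, int]]) -> Dict[LogKey, List[int]]:
    """Shuffle phase: group the intermediate values by key."""
    groups: Dict[LogKey, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups: Dict[LogKey, List[int]]) -> Dict[LogKey, int]:
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def aggregate(lines: Iterable[str]) -> Dict[LogKey, int]:
    """Run map, shuffle and reduce over an iterable of log lines."""
    mapped = chain.from_iterable(map_line(line) for line in lines)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    # Illustrative sample; component names "stager" and "tape" are hypothetical.
    sample = [
        "120 stager INFO request received",
        "150 stager ERROR timeout",
        "179 stager INFO request received",
        "185 tape INFO migration started",
        "garbage",
    ]
    for key, count in sorted(aggregate(sample).items()):
        print(key, count)
```

In a real Hadoop deployment the map and reduce functions would run distributed across many nodes, with the framework performing the shuffle and the results stored, for example, in HBase. Here all three phases run in a single process only to make the data flow visible.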
【 Preview 】
| File | Size | Format |
|---|---|---|
| Streamlining CASTOR to manage the LHC data torrent | 1505 KB | PDF |