21st International Conference on Computing in High Energy and Nuclear Physics | |
Resilient FTS3 service at GridKa | |
Physics; Computer Science | |
Hartmann, T.^1 ; Bubeliene, J.^1 ; Hoeft, B.^1 ; Obholz, L.^1 ; Petzold, A.^1 ; Wisniewski, K.^1 | |
Karlsruhe Institute of Technology (KIT), Steinbuch Centre for Computing (SCC), Hermann-von-Helmholtz-Platz 1, Eggenstein-Leopoldshafen | |
D-76344, Germany^1 | |
Keywords: Database access; Database clusters; Database queries; File transfers; High availability; Normal operations; Resilient systems; Service components; | |
Others : https://iopscience.iop.org/article/10.1088/1742-6596/664/6/062019/pdf DOI : 10.1088/1742-6596/664/6/062019 |
|
Subject classification: Computer Science (General) | |
Source: IOP | |
【 Abstract 】
The File Transfer Service (FTS) provides a transfer job scheduler to distribute and replicate vast amounts of data over the heterogeneous WLCG infrastructures. Compared to the channel model of the previous versions, the most recent version, FTS3, simplifies the service and improves its flexibility while reducing the load on the service components. These improvements allow a single FTS3 setup to handle a higher number of transfers. Because FTS3 now covers continent-wide transfers, whereas installations of the previous version handled only transfers within specific clouds, a resilient system becomes even more necessary given the increased number of dependent users. Having set up an FTS3 service at the German T1 site GridKa at KIT in Karlsruhe, we present our experiences with the preparations for a high-availability FTS3 service. To avoid single points of failure, we rely on a database cluster as a fault-tolerant data back-end and deploy the FTS3 service on its own cluster setup to provide a resilient infrastructure for the users. With the database cluster providing basic resilience for the data back-end, we ensure consistent and reliable database access at the FTS3 service level through a proxy solution. On each FTS3 node, an HAProxy instance monitors the integrity of each database node and distributes database queries over the whole cluster for load balancing during normal operations; if a database node breaks, the proxy excludes it transparently to the local FTS3 service. The FTS3 service itself consists of a main and a backup instance; in case of an error, the backup instance takes over the identity of the main instance, i.e., its IP, using a CTDB (Cluster Trivial Database) infrastructure, offering clients a consistent service.
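The proxy behaviour described in the abstract (spread queries over all database nodes, skip any node that fails its health check) can be illustrated with a minimal Python sketch. This is not the GridKa configuration and not HAProxy code: the class `HealthCheckedPool`, the node names `db1` to `db3`, and the `is_healthy` probe are hypothetical names chosen for illustration, and plain round-robin is an assumed balancing policy.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HealthCheckedPool:
    """Round-robin over database nodes, skipping any that fail a health check."""
    nodes: List[str]
    is_healthy: Callable[[str], bool]  # hypothetical probe, e.g. a TCP or SQL ping
    _next: int = 0                     # index of the node to try first

    def pick(self) -> str:
        """Return the next healthy node, or raise if none is available."""
        for offset in range(len(self.nodes)):
            idx = (self._next + offset) % len(self.nodes)
            node = self.nodes[idx]
            if self.is_healthy(node):
                # Continue after the chosen node on the next call.
                self._next = (idx + 1) % len(self.nodes)
                return node
        raise RuntimeError("no healthy database node available")


# Hypothetical node names and health states, for illustration only.
status = {"db1": True, "db2": True, "db3": True}
pool = HealthCheckedPool(nodes=list(status), is_healthy=lambda n: status[n])

normal = [pool.pick() for _ in range(3)]    # queries spread over all nodes
status["db2"] = False                       # simulate a broken database node
degraded = [pool.pick() for _ in range(4)]  # the broken node is skipped
```

The caller only ever asks the pool for a node, so a failed node disappears from rotation without the client changing its connection logic, which is the sense in which the exclusion is transparent to the local service.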
【 Preview 】
Files | Size | Format | View |
---|---|---|---|
Resilient FTS3 service at GridKa | 3086KB | download |