16th International Workshop on Advanced Computing and Analysis Techniques in Physics Research
The Error Reporting in the ATLAS TDAQ System
Physics; Computer Science
Kolos, Serguei^1 ; Kazarov, Andrei^2,3 ; Papaevgeniou, Lykourgos^2
University of California, Irvine, United States^1
CERN, Switzerland^2
Petersburg Nuclear Physics Institute, Kurchatov NPI, Gatchina, Russia^3
Keywords: Application program interfaces; Distributed middleware; On-line applications; Online environments; Runtime environments; Software applications; Software developer; Static information
Others: https://iopscience.iop.org/article/10.1088/1742-6596/608/1/012004/pdf
DOI: 10.1088/1742-6596/608/1/012004
Subject classification: Computer Science (General)
Source: IOP
【 Abstract 】
The ATLAS Error Reporting provides a service that allows experts and shift crew to track and address errors relating to the data taking components and applications. This service, called the Error Reporting Service (ERS), gives software applications the opportunity to collect comprehensive data about run-time errors and send it to a place where it can be intercepted in real time by any other system component. Other ATLAS online control and monitoring tools use the ERS as one of their main inputs to address system problems in a timely manner and to improve the quality of acquired data. The actual destination of the error messages depends solely on the run-time environment in which the online applications are operating. When an application sends information to ERS, depending on the configuration, it may end up in a local file, in a database, or in distributed middleware that can transport it to an expert system or display it to users. Thanks to the open framework design of ERS, new information destinations can be added at any moment without touching the reporting and receiving applications. The ERS Application Program Interface (API) is provided in three programming languages used in the ATLAS online environment: C++, Java and Python. All APIs use exceptions for error reporting, but each of them exploits advanced features of a given language to simplify the end-user program writing. For example, as C++ offers no concise way to declare exception classes, a number of macros have been designed to generate hierarchies of C++ exception classes at compile time. Using this approach, a software developer can write a single line of code to generate the boilerplate code for a fully qualified C++ exception class declaration with an arbitrary number of parameters and multiple constructors, which encapsulates all relevant static information about the given type of issue. 
When a corresponding error occurs at run time, the program only needs to create an instance of that class, passing relevant values to one of the available class constructors, and send this instance to ERS. This paper presents the original design solutions exploited for the ERS implementation and describes how it was used during the first ATLAS run period. The cross-system error reporting standardization introduced by ERS was one of the key points for the successful implementation of automated mechanisms for online error recovery.
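
The two mechanisms described in the abstract, a one-line macro declaration of an exception class and delivery of the resulting object to configurable destinations, can be illustrated with a minimal C++ sketch. This is not the ERS API: the names `Issue`, `DECLARE_ISSUE`, `Reporter`, `add_destination`, `FileError`, `FileOpenError` and `demo` are hypothetical and chosen only for illustration, and the macro is fixed at two parameters and one constructor, whereas the abstract describes an arbitrary number of parameters and multiple constructors.

```cpp
#include <functional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical root of the issue hierarchy (not the real ERS base class).
class Issue : public std::runtime_error {
public:
    Issue(const std::string& type, const std::string& text)
        : std::runtime_error(type + ": " + text) {}
};

// One line of user code expands into a full exception class.
// MESSAGE is static information fixed at compile time; the two
// parameters carry values that are known only at run time.
#define DECLARE_ISSUE(NAME, BASE, MESSAGE, T1, P1, T2, P2)               \
    class NAME : public BASE {                                           \
    public:                                                              \
        NAME(T1 P1, T2 P2) : BASE(#NAME, describe(P1, P2)) {}            \
    protected:                                                           \
        /* Used by classes derived from NAME to forward their text. */   \
        NAME(const std::string& type, const std::string& text)           \
            : BASE(type, text) {}                                        \
    private:                                                             \
        static std::string describe(T1 P1, T2 P2) {                      \
            std::ostringstream out;                                      \
            out << MESSAGE << " [" #P1 "=" << P1                         \
                << ", " #P2 "=" << P2 << "]";                            \
            return out.str();                                            \
        }                                                                \
    }

// A two-level hierarchy, declared with one line per class.
DECLARE_ISSUE(FileError, Issue, "File operation failed", std::string, path, int, code);
DECLARE_ISSUE(FileOpenError, FileError, "Cannot open file", std::string, path, int, code);

// A destination is any callable that accepts an issue. New destinations
// are registered without changing the code that reports issues.
using Destination = std::function<void(const Issue&)>;

class Reporter {
public:
    void add_destination(Destination d) { destinations_.push_back(std::move(d)); }

    // Deliver the issue to every configured destination.
    void report(const Issue& issue) const {
        for (const auto& d : destinations_) d(issue);
    }

private:
    std::vector<Destination> destinations_;
};

// Usage: throw a generated issue, catch it through its base class and
// report it to an in-memory destination standing in for a file or database.
std::vector<std::string> demo() {
    std::vector<std::string> captured;
    Reporter reporter;
    reporter.add_destination(
        [&captured](const Issue& issue) { captured.push_back(issue.what()); });
    try {
        throw FileOpenError("/tmp/run.cfg", 2);
    } catch (const FileError& error) {
        reporter.report(error);
    }
    return captured;
}
```

In this sketch the set of destinations is assembled at run time, so the reporting code stays unchanged when a destination is added, which mirrors the decoupling the abstract attributes to the open framework design.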