Final Project Report. Scalable fault tolerance runtime technology for petascale computers | |
Krishnamoorthy, Sriram [1]; Sadayappan, P. [2] | |
[1] Pacific Northwest National Lab. (PNNL), Richland, WA (United States); [2] Ohio State Univ., Columbus, OH (United States) | |
Keywords: fault tolerance; PGAS; runtime systems | |
DOI: 10.2172/1184567; RP-ID: DOE-OSU--FG02-08ER25850; PID: OSTI ID: 1184567 |
Subject classification: Mathematics (general) | |
United States | English | |
Source: SciTech Connect | |
【 Abstract 】
With the massive number of components in forthcoming petascale computer systems, hardware failures will be routinely encountered during the execution of large-scale applications. Because the scientific problems that drive the demand for high-end systems are multidisciplinary, multiresolution, and multiscale, applications place increasingly different demands on system resources: disk, network, memory, and CPU. In addition to MPI, future applications are expected to use advanced programming models such as those developed under the DARPA HPCS program, as well as existing global address space programming models such as Global Arrays, UPC, and Co-Array Fortran. While there has been a considerable amount of work on fault-tolerant MPI, with a number of strategies and extensions for fault tolerance proposed, virtually none of the advanced models proposed for emerging petascale systems is currently fault aware. Achieving fault tolerance requires the development of underlying runtime and OS technologies that can scale to the petascale level. This project has evaluated a range of runtime techniques for fault tolerance in advanced programming models.