Technical report details
Final Project Report. Scalable fault tolerance runtime technology for petascale computers
Krishnamoorthy, Sriram [1]; Sadayappan, P [2]
[1]Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
[2]Ohio State Univ., Columbus, OH (United States)
Keywords: fault tolerance; PGAS; runtime systems
DOI  :  10.2172/1184567
RP-ID  :  DOE-OSU--FG02-08ER25850
PID  :  OSTI ID: 1184567
Subject classification: Mathematics (general)
United States | English
Source: SciTech Connect
【 Abstract 】
With the massive number of components comprising the forthcoming petascale computer systems, hardware failures will be routinely encountered during execution of large-scale applications. Due to the multidisciplinary, multiresolution, and multiscale nature of the scientific problems that drive the demand for high-end systems, applications place increasingly differing demands on system resources: disk, network, memory, and CPU. In addition to MPI, future applications are expected to use advanced programming models such as those developed under the DARPA HPCS program, as well as existing global address space programming models such as Global Arrays, UPC, and Co-Array Fortran. While there has been a considerable amount of work on fault-tolerant MPI, with a number of strategies and extensions for fault tolerance proposed, virtually none of the advanced models proposed for emerging petascale systems is currently fault aware. Achieving fault tolerance requires the development of underlying runtime and OS technologies that can scale to the petascale level. This project has evaluated a range of runtime techniques for fault tolerance for advanced programming models.