Technical Report Details
Methodology and Application of HPC I/O Characterization with MPIProf and IOT
Chang, Yan-Tyng Sherry; Jin, Henry; Bauer, John
Keywords: SUPERCOMPUTERS; INPUT/OUTPUT ROUTINES; APPLICATIONS PROGRAMS (COMPUTERS); COMPUTER PROGRAMS; PERFORMANCE TESTS; SYSTEMS ANALYSIS
RP-ID: ARC-E-DAA-TN35654
Subject Category: Computer Systems and Organization
United States | English
Source: NASA Technical Reports Server
【 Abstract 】

Combining the strengths of MPIProf and IOT, an efficient and systematic method is devised for I/O characterization at the per-job, per-rank, per-file and per-call levels of HPC programs running at the NASA Advanced Supercomputing facility. In this paper, the method is applied to answer four I/O questions. A total of 13 MPI programs and 15 cases, ranging from 24 to 5968 ranks, are analyzed to establish the I/O landscape from the answers to the four questions. Four of the 13 programs use MPI I/O, and the behavior of their collective writes depends on the specific implementation of the MPI library used. The SGI MPT library, the prevailing MPI library for our systems, was found to gather small writes from a large number of ranks so that larger writes can be performed by a small subset of collective buffering ranks. The number of collective buffering ranks invoked by MPT depends on the Lustre stripe count and the number of nodes used for the run. A demonstration of varying the stripe count to achieve a double-digit speedup of one program's I/O is presented. Another program, in which all ranks concurrently open private files and which could therefore place a heavy load on the Lustre servers, was identified. The ability to systematically characterize I/O for a large number of programs running on a supercomputer, to seek I/O optimization opportunities, and to identify programs that could cause high load and instability on the filesystems is important for pursuing exascale in a real production environment.
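To make the collective-buffering behavior described above concrete, the sketch below (not taken from the report) shows each rank contributing a small block to one shared file through a collective MPI-IO write, with hints requesting a given number of aggregator ranks and a Lustre stripe count. The hint names "cb_nodes" and "striping_factor" are standard ROMIO hints; whether and how SGI MPT honors them is an assumption here, as are the file name and block size.

    /* Minimal sketch, assuming ROMIO-style hints: each rank writes one
     * small block to a shared file via a collective write.  With
     * collective buffering, the small per-rank writes are gathered by a
     * subset of aggregator ranks into larger, stripe-sized writes. */
    #include <mpi.h>
    #include <string.h>

    #define BLOCK 4096  /* bytes written per rank (illustrative value) */

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char buf[BLOCK];
        memset(buf, 'a' + rank % 26, BLOCK);

        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "cb_nodes", "8");        /* aggregator (collective buffering) ranks */
        MPI_Info_set(info, "striping_factor", "8"); /* Lustre stripe count for the new file */

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "shared.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

        /* Collective write at a per-rank offset into the shared file. */
        MPI_Offset off = (MPI_Offset)rank * BLOCK;
        MPI_File_write_at_all(fh, off, buf, BLOCK, MPI_BYTE, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }

On Lustre, the stripe count can also be varied outside the program, for example by setting it on the output directory with lfs setstripe -c 8 <dir> before the run, which is one way the stripe-count experiment described in the abstract could be carried out.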

【 Preview 】
Attachment: 20190000271.pdf (686 KB, PDF)