Technical Report Details
LDRD final report: managing shared memory data distribution in hybrid HPC applications.
Merritt, Alexander M. (Georgia Institute of Technology, Atlanta, GA); Pedretti, Kevin Thomas Tauke
Sandia National Laboratories
Keywords: Computers; Distribution; Programming; Kernels; 99 General And Miscellaneous//Mathematics, Computing, And Information Science
DOI: 10.2172/1007320
Report Number: SAND2010-6262
Contract Number: AC04-94AL85000
OSTI ID: 1007320
United States | English
Source: UNT Digital Library
【 Abstract 】

MPI is the dominant programming model for distributed memory parallel computers, and is often used as the intra-node programming model on multi-core compute nodes. However, application developers are increasingly turning to hybrid models that use threading within a node and MPI between nodes. In contrast to MPI, most current threaded models do not require application developers to deal explicitly with data locality. With the increasing core counts and deeper NUMA hierarchies of the upcoming LANL/SNL 'Cielo' capability supercomputer, data distribution imposes an upper bound on intra-node scalability within threaded applications. Data locality therefore has to be identified at runtime using static memory allocation policies such as first-touch or next-touch, or specified by the application user at launch time. We evaluate several existing techniques for managing data distribution using micro-benchmarks on an AMD 'Magny-Cours' system with 24 cores spread across 4 NUMA domains, and argue for the adoption of a dynamic runtime system implemented at the kernel level, employing a novel page table replication scheme to gather per-NUMA-domain memory access traces.
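
The first-touch policy mentioned in the abstract can be illustrated with a short, self-contained sketch. The example below is not taken from the report; it assumes a Linux system with libnuma and an OpenMP-capable compiler (e.g. gcc -fopenmp first_touch.c -lnuma), and the array size and variable names are invented for illustration. Each OpenMP thread initializes its own contiguous chunk of a heap array, so the kernel backs those pages with memory on the touching thread's NUMA domain; move_pages(2) is then called with a NULL node list to report, without migrating anything, which NUMA node each sampled page actually landed on.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <omp.h>
#include <numa.h>     /* numa_available() */
#include <numaif.h>   /* move_pages()     */

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    const size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    const size_t nelems    = 16UL * 1024 * 1024;   /* 16M doubles, ~128 MB (illustrative) */
    double *data = malloc(nelems * sizeof(*data));
    if (!data) { perror("malloc"); return 1; }

    /* First-touch: each thread writes its own contiguous chunk, so the
     * kernel faults those pages in on the touching thread's NUMA domain. */
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < nelems; i++)
        data[i] = (double)i;

    /* Sample the first page of each thread's chunk and ask the kernel
     * where it lives.  With a NULL node list, move_pages() does not
     * migrate pages; it only reports the current node in 'status'. */
    int    nthreads = omp_get_max_threads();
    void **pages    = calloc((size_t)nthreads, sizeof(*pages));
    int   *status   = calloc((size_t)nthreads, sizeof(*status));
    size_t chunk    = nelems / (size_t)nthreads;

    for (int t = 0; t < nthreads; t++) {
        uintptr_t addr = (uintptr_t)&data[(size_t)t * chunk];
        pages[t] = (void *)(addr & ~(uintptr_t)(page_size - 1));  /* page-align */
    }

    if (move_pages(0, (unsigned long)nthreads, pages, NULL, status, 0) != 0) {
        perror("move_pages");
        return 1;
    }

    for (int t = 0; t < nthreads; t++)
        printf("thread %2d chunk -> NUMA node %d\n", t, status[t]);

    free(pages);
    free(status);
    free(data);
    return 0;
}

With a static loop schedule, each thread's chunk should typically end up on the NUMA domain of the core that thread ran on; if the same array were initialized by a single thread, all sampled pages would usually report one node. This difference in placement is the intra-node data-distribution effect the report's micro-benchmarks examine.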

【 Preview 】
Attachment List
File          Size    Format
1007320.pdf   553 KB  PDF
Document Metrics
Downloads: 6    Views: 18