Dissertation

【Abstract】

Multicore chips have become the standard building blocks for all current and future massively parallel machines. Much work has been done in scientific and engineering HPC applications to exploit shared-memory multicore nodes. This thesis, in contrast, focuses on the parallel language runtime system, the software layer that supports the execution of parallel applications. The essential idea is to parallelize the language runtime itself with threads, a natural extension of the approach that applications already take to exploit the shared memory on a multicore node. Using the asynchronous message-driven CHARM++ runtime system as an evaluation platform, we address the key question of how the runtime should be designed and optimized for multicore nodes on parallel machines so that applications running atop it can achieve better performance with as few changes as possible. Since the runtime's performance on a single node is the basis for its overall performance at scale, we have identified the key factors that allow the runtime to run well on a single node and developed corresponding optimization techniques. We have also developed the CkLoop library in the CHARM++ runtime, which demonstrates the need for a unified runtime that can better support parallelism at different granularities. Furthermore, we have explored the design space for assigning work responsibilities among the threads of the multithreaded runtime. For a runtime design with dedicated communication threads, we have investigated the resulting communication issues with the help of our extension to a performance analysis tool, and proposed methods to resolve them.
To achieve even better application performance, we have shown how developers can leverage new capabilities offered by the runtime, and have developed new load balancing strategies that are more effective on multicore platforms. Finally, we have demonstrated performance improvements in real production-level scientific applications, including NAMD, a widely used molecular dynamics simulation program, by using this multithreaded runtime on petascale massively parallel machines. For the 100M-atom STMV simulation in NAMD, the multithreaded runtime yields about a two-fold performance improvement on 224,076 cores of JaguarPF (Cray XT5) and about a three-fold improvement in machine utilization on Intrepid (BlueGene/P). It also makes NAMD more scalable, up to the full machine on both JaguarPF and Titan (Cray XK6).

Message-driven parallel language runtime design and optimizations for multicore-based massively parallel machines
Mei, Chao
Keywords: Multicore shared-memory optimizations; Multithreaded adaptive parallel language runtime; MPI+OpenMP; High Performance Computing (HPC); Load balancing; Parallel programming; Molecule dynamics simulation performance; Charm++
Others: https://www.ideals.illinois.edu/bitstream/handle/2142/34238/Mei_Chao.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship