Thesis details
Performance portability of parallel kernels on shared-memory systems
Stratton, John
Keywords: Performance Portability; OpenCL; CUDA; C++AMP
Others: https://www.ideals.illinois.edu/bitstream/handle/2142/44383/John_Stratton.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

This work describes my solution to the performance portability problem, between CPUs and GPUs in particular, while laying the foundation for even broader performance portability support. I argue that the best approach is to use a language like OpenCL as a portable, low-level programming model with well-defined mechanisms for expressing multi-level parallelism and locality. That low-level program representation can be supported with architecture-specific compilers, runtimes, and libraries that target the application code to various platforms with high performance. High-level language designers or tool developers could then target this single, low-level programming and parallelism model as a portable, high-performance intermediate program representation. To demonstrate the feasibility of this approach, I show how one would design a good CPU implementation of OpenCL, given that the programs are written according to the current high-level GPU vendor optimization guidelines. Programs written in such a way already meet the criteria for good GPU performance, and in this work I show that those same programs, run on a CPU platform implemented according to my proposals, can outperform an OpenMP implementation of the same algorithm on the same system.

【 Preview 】
Attachment list
Files | Size | Format | View
Performance portability of parallel kernels on shared-memory systems | 1658KB | PDF | download