Thesis

【Abstract】

This work describes my solution to the performance portability problem, between CPUs and GPUs in particular, while laying the foundation for even broader performance portability support. I argue that the best approach is to use a language like OpenCL as a portable, low-level programming model with well-defined mechanisms for expressing multi-level parallelism and locality. That low-level program representation can be supported by architecture-specific compilers, runtimes, and libraries that map the application code to various platforms with high performance. High-level language designers and tool developers could then target this single low-level programming and parallelism model as a portable, high-performance intermediate program representation. To demonstrate the feasibility of this approach, I show how to design an efficient CPU implementation of OpenCL for programs written according to current GPU vendors' high-level optimization guidelines. Programs written this way already meet the criteria for good GPU performance, and I show that, on a CPU platform implemented according to my proposals, the same programs can outperform an OpenMP implementation of the same algorithm on the same system.
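The CPU-implementation idea in the abstract can be illustrated with a common technique for running GPU-style kernels on a CPU: one CPU thread executes a whole work-group, and each stretch of kernel code between barriers is wrapped in a loop over that group's work-items. The C sketch below is a minimal illustration under stated assumptions, not the thesis's actual implementation: the kernel (a per-work-group sum), the function names `block_sum` and `block_sum_group`, and the sizes `LOCAL_SIZE` and `NUM_GROUPS` are all hypothetical.

```c
#include <assert.h>

/* Hypothetical sizes chosen for illustration only. */
#define LOCAL_SIZE 4                          /* work-items per work-group */
#define NUM_GROUPS 3                          /* number of work-groups */
#define GLOBAL_SIZE (LOCAL_SIZE * NUM_GROUPS) /* total work-items */

/*
 * CPU version of a hypothetical GPU-style OpenCL kernel:
 *
 *   tmp[lid] = in[gid];                  // region 1
 *   barrier(CLK_LOCAL_MEM_FENCE);
 *   if (lid == 0) out[group] = sum(tmp); // region 2
 *
 * Each region between barriers becomes its own loop over the work-items
 * of one work-group, so every work-item finishes region 1 before any
 * work-item starts region 2, which is what the barrier requires.
 */
static void block_sum_group(const int *in, int *out, int group) {
    assert(group >= 0 && group < NUM_GROUPS);
    int tmp[LOCAL_SIZE]; /* plays the role of __local memory for this group */

    /* Region 1: every work-item copies its global element. */
    for (int lid = 0; lid < LOCAL_SIZE; ++lid) {
        int gid = group * LOCAL_SIZE + lid;
        tmp[lid] = in[gid];
    }

    /* Region 2: only work-item 0 reduces the group's values. */
    for (int lid = 0; lid < LOCAL_SIZE; ++lid) {
        if (lid == 0) {
            int s = 0;
            for (int i = 0; i < LOCAL_SIZE; ++i) {
                s += tmp[i];
            }
            out[group] = s;
        }
    }
}

/*
 * Work-groups are independent, so a CPU runtime could hand each
 * iteration of this loop to a different core; it is kept serial here.
 */
void block_sum(const int *in, int *out) {
    for (int g = 0; g < NUM_GROUPS; ++g) {
        block_sum_group(in, out, g);
    }
}
```

For example, filling `in` with 1 through 12 and calling `block_sum(in, out)` yields `out = {10, 26, 42}`, one sum per work-group of four. Turning barriers into loop boundaries keeps the per-work-item code in simple loops that a CPU compiler can optimize, which is one reason kernels written to GPU guidelines can map reasonably well to CPUs.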


Attachments
File	Size	Format
Performance portability of parallel kernels on shared-memory systems	1658 KB	PDF


Performance portability of parallel kernels on shared-memory systems
Keywords: Performance Portability; OpenCL; CUDA; C++ AMP
Stratton, John
Full text: https://www.ideals.illinois.edu/bitstream/handle/2142/44383/John_Stratton.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship (IDEALS)
Format: PDF


