Thesis details
Performance analysis and optimization of a CFD application
Zhang, Wentao; Bodony, Daniel J.
Keywords: Performance optimization; computational fluid dynamics (CFD); Intel Xeon Phi
Others: https://www.ideals.illinois.edu/bitstream/handle/2142/88072/ZHANG-THESIS-2015.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
Abstract

This thesis documents the analysis and optimization of PlasComCM, a high-order finite-difference computational fluid dynamics (CFD) application. Performance bottlenecks were identified using performance tools and hardware counters. The analysis showed that the number of memory accesses and the lack of vectorization prevented optimal serial performance on an x86-based CPU. Optimization techniques, including pointer dereferencing, loop transformations and Fortran SIMD directives, were applied to the ten most time-consuming subroutines to remove obstacles to vectorization and improve serial performance. Details of these techniques are presented and their impact on performance is discussed. The optimization effort yielded a 63% reduction in the number of memory loads and a serial speedup of 2.02.

Using the optimized serial program as the code base, further work focused on analyzing and optimizing parallel heterogeneous execution on a dual-socket node fitted with an Intel Xeon Phi (MIC) card. To reduce the overhead of host-accelerator copies in heterogeneous execution, the data layout of the halo region was changed from a "star" shape to a "box" shape, which agglomerates small communications and creates a larger work granularity. Preliminary results of running PlasComCM on Intel Xeon Phi cards in symmetric mode are also presented: for a particular problem size, using 2 SandyBridge sockets plus 1 Phi card reduced wall-clock time by 20% compared with 2 SandyBridge sockets alone.
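The abstract credits pointer dereferencing, loop transformations and SIMD directives with removing obstacles to vectorization. PlasComCM itself is Fortran and its code is not reproduced here; what follows is only a minimal C analogue of the general pattern, built on a hypothetical `FlowState` struct and hypothetical functions `velocity_naive` and `velocity_simd`. The idea: read the pointer members out of the struct once, hold them in local `restrict` pointers, and mark the loop with `#pragma omp simd` (the C counterpart of Fortran directives such as OpenMP's `!$omp simd`).

```c
/* Hypothetical container for a few flow variables (names are illustrative,
   not taken from PlasComCM). */
typedef struct {
    int n;          /* number of grid points */
    double *rho;    /* density */
    double *rhou;   /* x-momentum */
    double *u;      /* output: x-velocity */
} FlowState;

/* Before: every access goes through the struct pointer st. Nothing tells the
   compiler that st->u does not overlap st->rho or st->rhou, so it may insert
   runtime overlap checks or decline to vectorize the loop. */
void velocity_naive(FlowState *st) {
    for (int i = 0; i < st->n; i++) {
        st->u[i] = st->rhou[i] / st->rho[i];
    }
}

/* After: dereference the struct once, keep the trip count and the array
   pointers in locals declared restrict (a promise that they do not alias),
   and explicitly request a SIMD loop. */
void velocity_simd(const FlowState *st) {
    const int n = st->n;
    const double *restrict rho = st->rho;
    const double *restrict rhou = st->rhou;
    double *restrict u = st->u;

#pragma omp simd
    for (int i = 0; i < n; i++) {
        u[i] = rhou[i] / rho[i];
    }
}
```

Both functions compute the same values; the second only gives the compiler fewer reasons to refuse vectorization. With GCC, `-O3 -fopenmp-simd` honours the pragma without linking the OpenMP runtime, and `-fopt-info-vec` reports which loops were vectorized; without such a flag the pragma is ignored and the loop remains correct.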
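The abstract reports one result as a speedup and another as a percentage reduction in wall-clock time. The relation below is plain arithmetic on the abstract's own figures, not an additional result. It makes the two directly comparable: a speedup $S$ and a fractional time reduction $r$ are related by

$$
S = \frac{T_{\text{before}}}{T_{\text{after}}} = \frac{1}{1-r}
\qquad\Longleftrightarrow\qquad
r = 1 - \frac{1}{S}.
$$

For the serial speedup $S = 2.02$, $r = 1 - 1/2.02 \approx 0.505$, i.e. roughly a 50.5% cut in run time. For the heterogeneous run, $r = 0.20$ corresponds to $S = 1/0.80 = 1.25$.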
