Thesis Details
Efficient Performance Evaluation for Highly Multi-threaded Graphics Processors
Sadeghi Baghsorkhi, Sara
Keywords: GPU computing; performance evaluation; memory hierarchy; Graphics Processing Unit (GPU)
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/26373/SadeghiBaghsorkhi_Sara.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】
With the emergence of highly multithreaded architectures, an effective performance monitoring system must reflect the interaction between a large number of concurrent events, and attribute the overall effect of individual events and inefficiencies to the operations in the application source code. The state-of-the-art performance counters in highly multithreaded graphics processors currently do not provide this level of precision. Although fine-grained sampling of performance counters after each source-level operation could potentially achieve the desired precision, the high sampling frequency required would likely distort the actual application behavior and make the sampled counter values inaccurate.
In this thesis, I present a novel software-based approach for monitoring memory hierarchy performance in highly multithreaded general-purpose graphics processors. The proposed analysis is based on memory traces collected for small snapshots of application execution. A trace-based memory hierarchy model with a Monte Carlo experimental methodology generates statistical bounds on performance measures in the presence of nonuniform thread interleaving and data sharing in a highly multithreaded execution environment. The statistical approach overcomes the classical problem of execution timing disturbed by instrumentation. The approach scales well because I deploy a minimal sampling technique to reduce the trace generation overhead and the model simulation time.
The proposed scheme also keeps track of individual memory operations in the source code and can quantify how much each contributes to detrimental effects on memory system performance. A cross-validation of the model results shows close agreement with the values read from the hardware performance counters on an NVIDIA Tesla C2050.
I later use the predicted memory hierarchy performance statistics in an analytical model to identify the performance characteristics of a kernel and its expected execution time. To account for the systematic error present in the predictions, I approximate the error function and express a range of potential true execution times for each predicted value.
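As a rough illustration of the Monte Carlo idea described in the abstract, the following minimal sketch replays per-thread memory traces through a toy cache under many random thread interleavings and reports an empirical range for the hit rate. It is not the thesis's model: the function names, the fully associative LRU cache, the 128-byte line size, the uniform choice among runnable threads, and the 2.5%/97.5% percentile bounds are all assumptions made for this example.

```python
import random
from collections import OrderedDict


def interleave(traces, rng):
    """Randomly merge per-thread traces, preserving each thread's own order.

    Assumption: the next access comes from a uniformly chosen live thread.
    """
    cursors = [0] * len(traces)
    live = [t for t in range(len(traces)) if traces[t]]
    merged = []
    while live:
        i = rng.randrange(len(live))
        t = live[i]
        merged.append((t, traces[t][cursors[t]]))
        cursors[t] += 1
        if cursors[t] == len(traces[t]):
            # Thread t is exhausted: swap-remove it from the live list.
            live[i] = live[-1]
            live.pop()
    return merged


def lru_hit_rate(accesses, num_lines, line_bytes=128):
    """Hit rate of a toy fully associative LRU cache (hypothetical model)."""
    if not accesses:
        return 0.0
    cache = OrderedDict()
    hits = 0
    for _, addr in accesses:
        line = addr // line_bytes
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(accesses)


def monte_carlo_hit_rate(traces, num_lines, trials=200, seed=0):
    """Return (low, mean, high) hit rate over random interleavings."""
    rng = random.Random(seed)
    rates = sorted(
        lru_hit_rate(interleave(traces, rng), num_lines)
        for _ in range(trials)
    )
    low = rates[int(0.025 * (trials - 1))]
    high = rates[int(0.975 * (trials - 1))]
    return low, sum(rates) / trials, high
```

Calling `monte_carlo_hit_rate(traces, num_lines=4)` with `traces` given as one list of byte addresses per thread returns the lower bound, mean, and upper bound of the simulated hit rate. The spread between the bounds reflects how sensitive the hit rate is to interleaving order in this toy setting.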
Attachments
Files Size Format View
Efficient Performance Evaluation for Highly Multi-threaded Graphics Processors 1043KB PDF download