Dissertation Details
Enhancing dependence-based prefetching for better timeliness, coverage, and practicality
Author: Lim, Chungsoo
Committee: Gregory T. Byrd (Chair); Eric Rotenberg, Vincent W. Freeh, Yan Solihin (Members)
University: North Carolina State University
Keywords: memory wall; data prefetch; cache
Others: https://repository.lib.ncsu.edu/bitstream/handle/1840.16/3687/etd.pdf?sequence=1&isAllowed=y
United States | English
PDF
【 Abstract 】

This dissertation proposes an architecture that efficiently prefetches for loads whose effective addresses depend on previously loaded values (dependence-based prefetching). For timely prefetches, the memory access patterns of producing loads are learned dynamically. These patterns (such as strides) are used to prefetch well ahead of the consumer load. Different prefetching algorithms are used for different patterns, and the algorithms are combined on top of the dependence-based prefetching scheme. The proposed prefetcher is placed near the processor core and targets L1 cache misses, because removing L1 cache misses has greater performance potential than removing L2 cache misses.

For higher coverage, dependence-based prefetching is extended by augmenting the dependence-relation identification mechanism to include not only direct relations (y = x) but also linear relations (y = ax + b) between producer (x) and consumer (y) loads. With these additional relations, higher performance, measured in instructions per cycle (IPC), can be obtained.

We also show that the space overhead for storing the patterns can be reduced by leveraging chain prefetching and focusing on frequently missed loads. We specifically examine how to capture pointers in linked data structures (LDS) with a pure hardware implementation. We find that the space requirement can be reduced, compared to previous work, if we selectively record patterns. Still, to make the prefetching scheme generally applicable, a large table is required for storing pointers. So we take one step further and eliminate the additional storage needed for pointers: we propose a mechanism that utilizes a portion of the L2 cache for storing them. With this mechanism, impractically large on-chip pointer storage, which can otherwise be a waste of silicon, is no longer needed. We show that storing the prefetch table in a partition of the L2 cache outperforms using the L2 cache conventionally for benchmarks that benefit from prefetching.
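To make the producer-consumer idea concrete, the following C sketch models how a per-producer table entry could learn the producer's stride and use a linear relation (y = a*x + b) to compute and prefetch the consumer's future address. The structure, field names, training policy, and prefetch distance here are illustrative assumptions for exposition, not the dissertation's hardware design.

/* Software model of one dependence-table entry: learn the producer's
 * value stride, then prefetch the consumer address a*x + b computed
 * from the producer's predicted future value. (Illustrative only.) */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t last_value;  /* last value loaded by the producer load        */
    int64_t  stride;      /* learned stride of the producer's values       */
    int64_t  a, b;        /* learned linear relation to the consumer load  */
    int      trained;     /* nonzero once the stride has repeated          */
} dep_entry_t;

/* Called when the producer load commits: update the stride and, once
 * trained, predict the producer's value 'distance' iterations ahead and
 * prefetch the consumer address derived from it. */
static void on_producer_load(dep_entry_t *e, uint64_t value, int distance)
{
    int64_t new_stride = (int64_t)(value - e->last_value);
    e->trained    = (new_stride == e->stride);  /* simple two-hit confidence */
    e->stride     = new_stride;
    e->last_value = value;

    if (e->trained) {
        uint64_t future_x      = value + (uint64_t)(e->stride * distance);
        uint64_t prefetch_addr = (uint64_t)(e->a * (int64_t)future_x + e->b);
        printf("prefetch 0x%llx\n", (unsigned long long)prefetch_addr);
    }
}

int main(void)
{
    /* Hypothetical case: the producer walks an index array (value stride 8);
     * the consumer loads from 0x100000 + 16 * index, i.e. a = 16, b = 0x100000. */
    dep_entry_t e = { .last_value = 0, .stride = 0,
                      .a = 16, .b = 0x100000, .trained = 0 };
    for (uint64_t x = 1; x <= 5; ++x)
        on_producer_load(&e, x * 8, /*distance=*/4);
    return 0;
}

A real implementation would key such entries by the producer load's PC and issue cache-line prefetches instead of printing addresses; the sketch only shows how a learned stride plus a linear coefficient pair yields a timely consumer prefetch address.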

【 Preview 】
Attachment list
File | Size | Format
Enhancing dependence-based prefetching for better timeliness, coverage, and practicality | 3387 KB | PDF