Dissertation details
Energy-efficient latency tolerance for 1000-core data parallel processors with decoupled strands
Crago, Neal
Keywords: Parallel Processing; Data-parallel; Graphics processing unit (GPU); General-purpose computing on graphics processing units (GPGPU); manycore; latency tolerance; decoupled architecture; compiler technique; energy-efficiency; power-efficiency; high-performance; low power; low energy
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/34589/Crago_Neal.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

This dissertation presents a novel decoupled latency tolerance technique for 1000-core data parallel processors. The approach focuses on developing instruction latency tolerance to improve performance for a single thread. The main idea behind the approach is to leverage the compiler to split the original thread into separate memory-accessing and memory-consuming instruction streams. The goal is to provide latency tolerance similar to high-performance techniques such as out-of-order execution while leveraging low hardware complexity similar to an in-order execution core.

The research in this dissertation supports the following thesis: Pipeline stalls due to long exposed instruction latency are the main performance limiter for cached 1000-core data parallel processors. Leveraging natural decoupling of memory-access and memory-consumption, a serial thread of execution can be partitioned into strands providing energy-efficient latency tolerance.

This dissertation motivates the need for latency tolerance in 1000-core data parallel processors and presents decoupled core architectures as an alternative to currently used techniques. This dissertation discusses the limitations of prior decoupled architectures, and proposes techniques to improve both latency tolerance and energy-efficiency. Finally, the success of the proposed decoupled architecture is demonstrated against other approaches by performing an exhaustive design space exploration of energy, area, and performance using high-fidelity performance and physical design models.
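As a rough illustration of the idea of splitting one thread into a memory-accessing stream and a memory-consuming stream, the sketch below partitions a dot-product loop by hand. It is not taken from the dissertation: the function names, the unbounded queue between the two strands, and the two cycle-count formulas are all assumptions made for this example, and the timing model is a deliberately crude one (fully pipelined loads, one iteration per cycle).

```python
from collections import deque


def dot_single_thread(a, b):
    """Original serial thread: loads and arithmetic are interleaved."""
    total = 0
    for i in range(len(a)):
        x = a[i]          # memory access
        y = b[i]          # memory access
        total += x * y    # memory consumption
    return total


def access_strand(a, b, queue):
    """Hypothetical memory-accessing strand: only loads, results go to a queue."""
    for i in range(len(a)):
        queue.append((a[i], b[i]))


def execute_strand(n, queue):
    """Hypothetical memory-consuming strand: only arithmetic on queued values."""
    total = 0
    for _ in range(n):
        x, y = queue.popleft()
        total += x * y
    return total


def dot_decoupled(a, b):
    """Run both strands; here the access strand simply runs to completion first,
    which stands in for it running ahead of the execute strand."""
    queue = deque()
    access_strand(a, b, queue)
    return execute_strand(len(a), queue)


def cycles_in_order(n, load_latency):
    """Toy model (assumption): a stall-on-use in-order core exposes the full
    load latency in every iteration, plus one cycle of arithmetic."""
    return n * (load_latency + 1)


def cycles_decoupled(n, load_latency):
    """Toy model (assumption): the access strand issues one iteration's loads
    per cycle, so the latency is exposed once and then overlapped."""
    return load_latency + n
```

Under these assumptions, both versions compute the same result, while the toy model for 4 iterations and a 10-cycle load gives 44 cycles for the interleaved thread and 14 for the decoupled strands. A real design would bound the queue and account for dependences that flow from the consuming stream back to the accessing stream, which this sketch ignores.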

【 Preview 】
Attachments
Files | Size | Format
Energy-efficient latency tolerance for 1000-core data parallel processors with decoupled strands | 3260KB | PDF