Thesis

【Abstract】

We present Outrider, an architecture for throughput-oriented processors that exploits intra-thread memory-level parallelism (MLP) to improve performance efficiency on highly threaded workloads. Outrider enables a single thread of execution to be presented to the architecture as multiple decoupled instruction streams, each consisting of either memory-accessing or memory-consuming instructions. The key insight is that by decoupling the instruction streams, the processor pipeline can expose MLP in a way similar to out-of-order designs while relying on a low-complexity in-order microarchitecture. Instead of adding more threads, as modern GPUs do, Outrider can expose the same MLP with fewer threads and with less contention for resources shared among threads. We demonstrate that, in a 1024-core system running data-parallel applications, Outrider outperforms single-threaded cores by 23-131% and a 4-way simultaneous multithreaded core by up to 87%. Outrider achieves these performance gains without incurring the overhead of additional hardware thread contexts, which results in improved efficiency compared to a multithreaded core.
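The benefit of splitting one thread into a memory-accessing stream and a memory-consuming stream can be illustrated with a toy cycle-count model. The sketch below is not the Outrider microarchitecture; it is a hypothetical Python model that assumes one load issues per cycle, every load has the same fixed latency, and the access stream may run ahead of the consume stream by at most `queue_depth` in-flight loads (a stand-in for whatever buffering the real design uses). The function names `coupled_cycles` and `decoupled_cycles` are illustrative only.

```python
from collections import deque


def coupled_cycles(n_loads: int, latency: int) -> int:
    """Single in-order stream: each consumer stalls on its load, and the
    next load cannot issue until that consumer has executed."""
    # Per iteration: the load issues in one cycle, the consumer executes
    # `latency` cycles after issue, and the next load issues the cycle after.
    return n_loads * (latency + 1)


def decoupled_cycles(n_loads: int, latency: int, queue_depth: int) -> int:
    """Two in-order streams linked by a FIFO of in-flight loads.

    The access stream issues at most one load per cycle while the FIFO has
    room; the consume stream retires the oldest load once its data is ready.
    """
    if queue_depth < 1:
        raise ValueError("queue_depth must be at least 1")
    in_flight = deque()  # ready cycle of each issued-but-unconsumed load
    issued = consumed = 0
    cycle = 0
    while consumed < n_loads:
        # Consume stream: retire the oldest load if its data has arrived.
        if in_flight and in_flight[0] <= cycle:
            in_flight.popleft()
            consumed += 1
        # Access stream: run ahead, issuing a new load if a slot is free.
        if issued < n_loads and len(in_flight) < queue_depth:
            in_flight.append(cycle + latency)
            issued += 1
        cycle += 1
    return cycle


if __name__ == "__main__":
    # Hypothetical parameters: 64 loads, 100-cycle latency, 16 queue entries.
    print(coupled_cycles(64, 100), decoupled_cycles(64, 100, 16))
```

Running the script prints `6464 416`. In this toy model the access stream keeps up to 16 loads overlapped, so the in-order consume stream waits roughly once per batch of loads rather than once per load, and the extra MLP comes from a single thread rather than from additional thread contexts. The numbers describe only the model's assumptions, not measured Outrider results.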


Efficient memory-level parallelism extraction with decoupled strands
Crago, Neal; Patel, Sanjay J.
Keywords: Memory Latency Tolerance; Accelerators; Processors; Decoupled
Full text (PDF): https://www.ideals.illinois.edu/bitstream/handle/2142/24372/Crago_Neal.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
Format: PDF

