Technical Report Details
Collective Memory Transfers for Multi-Core Chips
Michelogiannakis, George ; Williams, Alexander ; Shalf, John
Keywords: DRAM; access stream; stencils; memory bandwidth; collective transfers
DOI  :  10.2172/1164908
RP-ID  :  LBNL-6485E
PID  :  OSTI ID: 1164908
Subject classification: Mathematics (general)
United States | English
Source: SciTech Connect
【 Abstract 】

Future performance improvements for microprocessors have shifted from clock frequency scaling towards increases in on-chip parallelism. Performance improvements for a wide variety of parallel applications require domain-decomposition of data arrays from a contiguous arrangement in memory to a tiled layout for on-chip L1 data caches and scratchpads. However, DRAM performance suffers under the non-streaming access patterns generated by many independent cores. We propose collective memory scheduling (CMS) that actively takes control of collective memory transfers such that requests arrive in a sequential and predictable fashion to the memory controller. CMS uses the hierarchically tiled arrays formalism to compactly express collective operations, which greatly improves programmability over conventional prefetch or list-DMA approaches. CMS reduces application execution time by up to 32% and DRAM read power by 2.2×, compared to a baseline DMA architecture such as STI Cell.
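
The access-pattern problem described in the abstract can be illustrated with a small toy model. The sketch below is not the CMS design from the report; every function name, the round-robin interleaving of cores, and the "one DRAM row holds `row_size` consecutive elements" assumption are hypothetical simplifications chosen for illustration. It compares the address stream produced when each core fetches its own tile independently with the stream of a single coordinated transfer that walks memory sequentially, and counts how often consecutive requests land in a different DRAM row.

```python
def tile_addresses(n, t, tile_row, tile_col):
    """Row-major addresses of one t x t tile of an n x n array."""
    return [(tile_row * t + r) * n + (tile_col * t + c)
            for r in range(t) for c in range(t)]


def independent_stream(n, t):
    """Assumed baseline: each core walks its own tile and the
    requests of all cores interleave round-robin at the controller."""
    tiles = [tile_addresses(n, t, i, j)
             for i in range(n // t) for j in range(n // t)]
    return [tile[k] for k in range(t * t) for tile in tiles]


def collective_stream(n, t):
    """Assumed collective transfer: the whole array is read in plain
    sequential address order (t is unused, kept for a uniform signature)."""
    return list(range(n * n))


def owner(addr, n, t):
    """Index of the core (tile) that should receive the element at addr."""
    row, col = divmod(addr, n)
    return (row // t) * (n // t) + (col // t)


def row_switches(stream, row_size):
    """Toy cost metric: number of times consecutive requests
    fall into different DRAM rows of row_size elements each."""
    rows = [a // row_size for a in stream]
    return sum(1 for prev, cur in zip(rows, rows[1:]) if prev != cur)


if __name__ == "__main__":
    n, t, row_size = 8, 4, 8
    print("independent:", row_switches(independent_stream(n, t), row_size))
    print("collective: ", row_switches(collective_stream(n, t), row_size))
```

Both streams touch exactly the same addresses; only the order differs. In the sequential stream, `owner` gives the destination core for each element, which is the information a coordinated transfer would need in order to scatter data to the tiles while memory itself is read in order.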
