Thesis details
Exploiting software information for an efficient memory hierarchy
Komuravelli, Rakesh
Keywords: Computer architecture; cache coherence; multicores; heterogeneous systems; protocol verification; memory hierarchy
Others: https://www.ideals.illinois.edu/bitstream/handle/2142/72791/Rakesh_Komuravelli.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Power consumption is one of the most important factors in the design of today’s processor chips. Multicore and heterogeneous systems have emerged to address the rising power concerns. Since the memory hierarchy is becoming one of the major consumers of the on-chip power budget in these systems, designing an efficient memory hierarchy is critical to future systems. We identify three sources of inefficiency in the memory hierarchies of today’s systems: (a) coherence, (b) data communication, and (c) data storage. This thesis takes the position that many of these inefficiencies result from today’s software-agnostic hardware design. Software carries a great deal of information that can be exploited to build an efficient memory hierarchy. This thesis identifies some of the inefficiencies related to each of the three sources above and proposes techniques that mitigate them by exploiting information from the software.

First, we focus on inefficiencies related to coherence and communication. Today’s hardware-based directory coherence protocols are extremely complex and incur unnecessary overheads for sending invalidation messages and maintaining sharer lists. We propose DeNovo, a hardware-software co-designed protocol, to address these issues for a class of programs that are deterministic. DeNovo assumes a disciplined programming environment and exploits features such as structured parallel control, data-race-freedom, and software information about data access patterns to build a system that is simple, extensible, and performance-efficient compared to today’s protocols. We also extend DeNovo with two optimizations that address the inefficiencies related to data communication, specifically by reducing unnecessary on-chip network traffic. We show that these two optimizations not only add no new states (or transient states) to the protocol but also provide performance and energy gains, thus validating the extensibility of the DeNovo protocol. Together with the two communication optimizations, DeNovo reduces the memory stall time by 32% and the network traffic by 36% (resulting in direct savings in energy) on average compared to a state-of-the-art implementation of the MESI protocol for the applications studied.

Next, we address the inefficiencies related to data storage. Caches and scratchpads are two popular organizations for storing data in today’s systems, but both have inefficiencies. Caches are power-hungry, incurring expensive tag lookups, and scratchpads incur unnecessary data movement because they are only locally visible. To address these problems, we propose a new memory organization, stash, which has the best of both cache and scratchpad organizations. Stash is a globally visible unit, and its functionality is independent of the coherence protocol employed. In our implementation, we extend DeNovo to provide coherence for stash. Compared to a baseline configuration that has both scratchpad and cache accesses, we show that the stash configuration (in which scratchpad and cache accesses are converted to stash accesses), even with today’s applications that do not fully exploit stash, reduces the execution time by 10% and the energy consumption by 14% on average.

Overall, this thesis shows that a software-aware hardware design can effectively address many of the inefficiencies found in today’s software-oblivious memory hierarchies.

【 Preview 】
Attachments
Files | Size | Format | View
Exploiting software information for an efficient memory hierarchy | 3048KB | PDF | download