Dissertation

【Abstract】

Scalability of future wide-issue processor designs is severely hampered by the use of centralized resources such as register files, memories, and interconnect networks. While the use of centralized resources eases both hardware design and compiler code generation efforts, they can become performance bottlenecks as access latencies increase with larger designs. The natural solution to this problem is to adapt the architecture to use smaller, decentralized resources. Decentralized architectures use smaller, faster components and exploit distributed instruction-level parallelism across the resources. A multicluster architecture is an example of such a decentralized processor, where subsets of smaller register files, functional units, and memories are grouped together in a tightly coupled unit, forming a cluster. These clusters can then be replicated and connected together to form a scalable, high-performance architecture.

The main difficulty with decentralized architectures resides in compiler code generation. In a centralized Very Long Instruction Word (VLIW) processor, the compiler must statically schedule each operation to both a functional unit and a time slot for execution. In contrast, for a decentralized multicluster VLIW, the compiler must consider the additional effects of cluster assignment, recognizing that communication between clusters will result in a delay penalty. In addition, if the multicluster processor also has partitioned data memories, the compiler has the additional task of assigning data objects to their respective memories. The decisions of cluster, functional unit, memory, and time slot are highly interrelated, and each can have dramatic effects on the best choice for every other decision.

This dissertation addresses the issues of extracting and exploiting inherent parallelism across decentralized resources through compiler analysis and code generation techniques. First, a static analysis technique to partition data objects is presented, which maps data objects to decentralized scratchpad memories. Second, an alternative profile-guided technique for memory partitioning is presented, which can effectively map data access operations onto distributed caches. Finally, a detailed, resource-aware partitioning algorithm is presented, which can effectively split computation operations of an application across a set of processing elements. These partitioners work in tandem to create a high-performance partition assignment of both memory and computation operations for decentralized multicluster or multicore processors.

【Preview】

Attachments
Files	Size	Format	View
Cooperative Data and Computation Partitioning for Decentralized Architectures.	2369KB	PDF	download


Cooperative Data and Computation Partitioning for Decentralized Architectures.
Chu, Michael L.; Reinhardt, Steven K.
University of Michigan
Keywords: Compiler Code Generation; Multicluster Compilation; Decentralized Architectures; Data and Code Partitioning; Automatic Parallelization; Computer Science; Engineering; Computer Science & Engineering
Others : https://deepblue.lib.umich.edu/bitstream/handle/2027.42/57649/mchu_1.pdf?sequence=2&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship


	Document metrics
	Downloads: 8	Views: 29
