Multi-core architectures are becoming more common, and core counts continue to increase. Six- and eight-core chips, such as Intel's Gulftown, are currently in production, and many-core chips with dozens of cores, such as the Intel Teraflops 80-core chip, are projected within the next five years. However, adding more cores often does not improve application performance. It would be desirable to take advantage of the multi-core environment to speed up parallel discrete event simulation. The current bottleneck for many parallel simulations is time synchronization. This is especially true for simulations of wireless networks and on-chip networks, which have low lookahead. Message passing is also a common simulation bottleneck. To address time synchronization, we have designed hardware, at a functional level, that performs the time synchronization for parallel discrete event simulation asynchronously and in just a few clock cycles, eliminating the need for global communication through message passing and the lock contention associated with shared memory. This hardware, the Global Synchronization Unit, consists of three register files, each sized to the number of cores, and is accessed using five new atomic instructions. To reduce the simulation overhead of message passing, we have also designed, at a functional level, two independent pieces of hardware, the Atomic Shared Heap and Atomic Message Passing, which can be used to perform lock-free, zero-copy message passing on a multi-core system. We assess the impact of these specialized hardware units on the performance of parallel discrete event simulation and compare it with traditional shared-memory techniques.
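To make the synchronization problem concrete, the following is a minimal software sketch of the safe-time computation that conservative parallel discrete event simulation generally relies on. It is not the design of the Global Synchronization Unit, whose register contents and instructions are not described here. The names `CoreState` and `safe_time_bounds` are hypothetical, and the sketch assumes that no messages are in flight between cores.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CoreState:
    # Hypothetical per-core state, for illustration only.
    next_event_time: float  # timestamp of the earliest unprocessed local event
    lookahead: float        # minimum delay before this core can affect another core


def safe_time_bounds(cores: List[CoreState]) -> List[float]:
    """Return, for each core, a bound below which its events are safe to process.

    A core's bound is the minimum of (next_event_time + lookahead) over all
    OTHER cores: no other core can send it an event with an earlier timestamp.
    Assumption: no messages are currently in transit.
    """
    bounds = []
    for i in range(len(cores)):
        others = [
            c.next_event_time + c.lookahead
            for j, c in enumerate(cores)
            if j != i
        ]
        # A single core has no peers, so nothing constrains it.
        bounds.append(min(others) if others else float("inf"))
    return bounds


if __name__ == "__main__":
    cores = [CoreState(10.0, 1.0), CoreState(12.0, 0.5), CoreState(11.0, 2.0)]
    print(safe_time_bounds(cores))
```

In software, every core must repeat this minimum over shared state or exchanged messages each time it advances. With low lookahead the bounds move forward only in small steps, so the computation is repeated very often, which is why it becomes the bottleneck that dedicated hardware aims to remove.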
Hardware acceleration for conservative parallel discrete event simulation on multi-core systems