学位论文详细信息
Advanced management techniques for many-core communication systems
QA75 Electronic computers. Computer science;T Technology (General)
Al Khanjari, Sharifa ; Vanderbauwhede, Wim
University:University of Glasgow
Department:School of Computing Science
关键词: Many-core, network-on-chip, distributed-memory architecture , shared-memory architecture, quadtree.;   
Others  :  http://theses.gla.ac.uk/7952/1/2017AlKhanjariphd.pdf
来源: University of Glasgow
PDF
【 摘 要 】
The way computer processors are built is changing. Nowadays, computer processor performance is increased by adding more processing cores on a single chip instead of making processors larger and faster. The traditional approach is no longer viable, due to limits in transistor scaling. Both industry and academia agree that scaling the number of processing cores to hundreds or thousands on a single chip is the only way to scale computer processor performance from now on. Consequently, the performance of these future many-core systems with thousands of cores will heavily depend on the Network-on-Chip (NoC) architecture to provide scalable communication. Therefore, as the number of cores increases the locality will only become more important. Communication locality is essential to reduce latency and increase performance. Many-core systems should be designed such that cores communicate mainly to the neighbouring cores, in order to minimise the communication cost. We investigate the network performance of different topologies using the ITRS physical data for the year 2023. For this reason, we propose abstract synthetic traffic generation models to explore the locality behaviour in many-core NoC systems. Using the synthetic traffic models - group clustering model and ring clustering model - traffic distance metrics may be adjusted with locality parameters. We choose two many-core NoC architectures-distributed memory architecture and shared memory architecture-to examine whether enforcing locality on different architectures may have a diverse effect on the network performance of different topologies. Distributed memory architecture uses the message passing method of communication to communicate between cores. Our results show that the degree of locality and the clustering model strongly affect the performance of the network. Scale-invariant topologies, such as the fat quadtree, perform worse than flat ones because the reduced hop count is outweighed by the longer wire delays.In shared memory architecture, threads communicate with each other by storing data in shared cache lines. We design a hierarchical cache model that benefits from communication locality because many-core cache hierarchy that fails to exploit locality may end up having more cores delayed, thereby decreasing the network performance. Our results show that the locality model of thread placement and the distance of placing them significantly affect the NoC performance. Furthermore, they show that scale-invariant topologies perform better than flat topologies. Then, we demonstrate that implementing directory-based cache coherency has only a small overhead on the cache size. Using cache coherency protocol in our proposed hierarchical cache model, we show that network performance decreases only slightly. Hence, cache coherency scales, and it is possible to have shared memory architecture with thousands of cores.
【 预 览 】
附件列表
Files Size Format View
Advanced management techniques for many-core communication systems 8599KB PDF download
  文献评价指标  
  下载次数:8次 浏览次数:37次