Journal Article Details
ACM JOURNAL ON EMERGING TECHNOLOGIES IN COMPUTING SYSTEMS
Limit of Hardware Solutions for Self-Protecting Fault-Tolerant NoCs
Article
Louri, Ahmed [1]; Collet, Jacques [2]; Karanth, Avinash [3]
[1]George Washington Univ, Dept Elect & Comp Engn, 800 22nd St NW,Room 5580, Washington, DC 20052 USA.
[2]Univ Paul Sabatier, Lab Anal & Architecture Syst, LAAS CNRS, 7 Ave Colonel Roche, F-31077 Toulouse 13, France.
[3]Ohio Univ, Sch Elect Engn & Comp Sci, 322D Stocker Ctr, Athens, OH 45701 USA.
Keywords: Network-on-chips; reliability; built-in-self-test; self-healing; RELIABLE NETWORK; REPAIR SCHEME; FLUCTUATIONS
DOI: 10.1145/3233986
Source: SCIE
【 Abstract 】
We study the ultimate limits of hardware solutions for self-protection against permanent faults in networks-on-chip (NoCs). NoC reliability is improved by replacing each base router with an augmented router that includes extra protection circuitry. We compare the protection achieved by self-test and self-protect (STAP) architectures to that of triple modular redundancy with voting (TMR). Two STAP architectures are considered: in the first, a defective router disconnects itself from the network; in the second, it self-heals. In practice, none of the considered architectures (STAP or TMR) can tolerate all permanent faults, especially faults in the extra circuitry for protection or voting. Consequently, there will always be some unidentified defective augmented routers that transmit errors in an unpredictable manner. This study tackles this fundamental problem. Specifically, we determine the average percentage of residual unidentified defective routers (UDRs) and their impact on the overall reliability of the NoC in light of self-protection strategies. Our study shows that TMR is the most efficient solution for limiting the average percentage of UDRs when, typically, fewer than 0.1% of base routers are defective. However, TMR is also the most costly and the least power-efficient. Above 1% of defective base routers, the STAP approaches are more efficient, although the protection efficiency decreases inexorably in highly defective technologies (e.g., when 10% or more of base routers are defective). For instance, if the chip includes 10% of defective base routers, our study shows that on average 1% of UDRs will remain, which poses a major challenge for NoC reliability.
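To give a sense of why an average of 1% UDRs is described as a major challenge, the following sketch works out what that percentage would mean at the chip level. It is a back-of-the-envelope illustration, not the model used in the article: it assumes each router is a UDR independently with the same probability, and the router count (an 8x8 mesh of 64 routers) and the function names are hypothetical choices made for this example.

```python
# Back-of-the-envelope illustration (NOT the article's model).
# Assumption: each router is an unidentified defective router (UDR)
# independently, with the same probability udr_rate.


def expected_udrs(num_routers: int, udr_rate: float) -> float:
    """Expected number of UDRs on a chip with num_routers routers."""
    return num_routers * udr_rate


def prob_at_least_one_udr(num_routers: int, udr_rate: float) -> float:
    """Probability that the chip contains one or more UDRs."""
    return 1.0 - (1.0 - udr_rate) ** num_routers


if __name__ == "__main__":
    udr_rate = 0.01    # 1% UDRs on average, the figure quoted in the abstract
    num_routers = 64   # hypothetical 8x8 mesh, chosen only for illustration
    print(f"expected UDRs: {expected_udrs(num_routers, udr_rate):.2f}")
    print(f"P(at least one UDR): {prob_at_least_one_udr(num_routers, udr_rate):.2f}")
```

Under these assumptions, a 64-router chip would contain at least one UDR with probability of about 0.47, and that probability grows quickly with the number of routers.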
【 License 】

Free   
