International Conference on Computer Design

Soft Error Resilience in Chip Multiprocessor Cache using a Markov Model Based Re-usability Predictor

Abstract

Power consumption in dense chip multiprocessors limits the operating life of battery-operated devices. Voltage scaling, used to extend battery life, exposes caches to both hard and soft errors. Hard faults can be detected early, at boot time, and are therefore tractable, but soft errors are unpredictable by nature and much harder to handle. Error-correcting codes mitigate single event upsets (SEUs) effectively, but with continued miniaturization, future technology nodes are expected to suffer a significant rate of multi-bit upsets (MBUs), which calls for additional system-level protection. Several redundancy-based schemes have been proposed, but none of them reports a scalable solution for a shared NUCA (non-uniform cache access) cache in multicores, and each suffers either from low coverage due to partial redundancy or from performance degradation due to capacity loss when providing full coverage. This work addresses both issues by ensuring full error coverage with minimal cache bypass. Errors are detected using a simple error-detection technique such as CRC, and dirty words, as well as clean words that are likely to be reused, are replicated either locally in the same tile or globally in remote tiles to ensure complete error coverage. Word re-usability is computed by a novel Markov-chain-based re-usability prediction mechanism that analyses the cache access pattern. An invalidation strategy invalidates non-reusable words to shorten the time they are vulnerable to soft errors. A re-usability-aware replacement policy is also designed, which evicts the line with the lowest re-usability, computed at line level. The proposed technique has been evaluated in the Multi2Sim 5.0 simulation framework using benchmark programs from the SPEC CPU2000 suite. The results show an average vulnerability reduction of 83.33% for integer benchmarks and 87.09% for floating-point benchmarks, with full multi-bit error coverage, at the cost of 4.99% area, 5.54% dynamic power and 6.55% leakage power overheads and a negligible performance penalty.
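The abstract does not specify the predictor's states, thresholds, or how word scores are combined into a line score. The Python sketch below only illustrates how a per-word Markov chain could drive the three decisions described above (replicating dirty and reusable clean words, invalidating non-reusable words early, and choosing a replacement victim by line-level re-usability); it is not the authors' design. The two states (referenced or not referenced in an observation window), the Laplace-smoothed transition counts, the 0.5 cut-off `REUSE_THRESHOLD`, the mean-of-words line score, the restriction of early invalidation to clean words, and all names (`WordPredictor`, `needs_replica`, `can_invalidate`, `choose_victim`) are hypothetical assumptions.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical states: in each observation window a word is either
# not referenced (N) or referenced (R).
N, R = 0, 1

REUSE_THRESHOLD = 0.5  # hypothetical cut-off between reusable and non-reusable


@dataclass
class WordPredictor:
    """First-order Markov chain over {N, R} for one cache word (sketch)."""
    # counts[s][t] = observed transitions from state s to state t,
    # initialised to 1 (Laplace smoothing) so no estimate is exactly 0 or 1.
    counts: List[List[int]] = field(default_factory=lambda: [[1, 1], [1, 1]])
    state: int = R  # assume a word counts as referenced when it is filled

    def observe(self, referenced: bool) -> None:
        """Record the outcome of one observation window."""
        nxt = R if referenced else N
        self.counts[self.state][nxt] += 1
        self.state = nxt

    def reusability(self) -> float:
        """Estimated P(next window is R | current state)."""
        row = self.counts[self.state]
        return row[R] / (row[N] + row[R])


@dataclass
class Word:
    dirty: bool = False
    pred: WordPredictor = field(default_factory=WordPredictor)


def needs_replica(w: Word) -> bool:
    # A dirty word is the only up-to-date copy, so it always gets a replica;
    # a clean word gets one only if it is predicted to be reused.
    return w.dirty or w.pred.reusability() >= REUSE_THRESHOLD


def can_invalidate(w: Word) -> bool:
    # Assumption: only clean, non-reusable words are dropped early
    # (a dirty word would first need a write-back).
    return not w.dirty and w.pred.reusability() < REUSE_THRESHOLD


def line_reusability(line: List[Word]) -> float:
    # Assumption: a line's score is the mean of its words' scores.
    return sum(w.pred.reusability() for w in line) / len(line)


def choose_victim(cache_set: List[List[Word]]) -> int:
    """Index of the line with the lowest re-usability in a set."""
    return min(range(len(cache_set)), key=lambda i: line_reusability(cache_set[i]))
```

For example, under these assumptions a word referenced in three consecutive windows scores 0.8 and would be replicated, while a word left idle for three windows scores 0.25 and, if clean, becomes a candidate for early invalidation; a set containing a line made only of such idle words would evict that line first.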