Soft Error Resilience in Chip Multiprocessor Cache using a Markov Model Based Re-usability Predictor


Abstract

Power consumption in dense chip multiprocessors limits the operational life of battery-operated devices. Voltage scaling, used to extend battery life, induces both hard and soft errors in the cache. Hard faults can be detected early, at boot time, and are therefore tractable; soft errors are unpredictable and much harder to handle. Error-correcting codes mitigate single event upsets (SEUs) effectively, but miniaturization is expected to make multi-bit upsets (MBUs) significant at future technology nodes, which demands additional system-level protection. Several redundancy-based schemes have been proposed, but none reports a scalable solution for the shared NUCA cache in multicore systems, and each suffers from either low coverage due to partial redundancy or performance degradation due to the capacity lost to full coverage. This work addresses both issues by ensuring full error coverage with minimal cache bypass. Errors are detected using a simple error detection technique such as CRC, and dirty words as well as reusable clean words are replicated either locally in the same tile or globally in remote tiles to ensure complete error coverage. Word re-usability is calculated by a novel Markov chain based re-usability prediction mechanism that analyses the cache access pattern. An invalidation strategy invalidates non-reusable words to reduce the time they are vulnerable to soft errors. A re-usability-aware replacement policy replaces the line with the lowest re-usability, calculated at line level. The proposed technique is evaluated in the Multi2Sim 5.0 simulation framework with benchmark programs from the SPEC CPU2000 suite. The results indicate an average vulnerability decrease of 83.33% for integer benchmarks and 87.09% for floating-point benchmarks, with full multi-bit error coverage, at the cost of 4.99% area, 5.54% dynamic power and 6.55% leakage power overheads and a negligible performance penalty.
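
The abstract does not specify the states or update rules of the Markov chain, so the following is only a minimal sketch of how a re-usability predictor and a re-usability-aware replacement policy might fit together. It assumes a first-order, two-state chain (reused / not reused) per word with smoothed transition counts, and takes line re-usability to be the mean over the line's words. The names `ReusePredictor`, `line_reusability` and `choose_victim` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

NOT_REUSED, REUSED = 0, 1


class ReusePredictor:
    """Two-state, first-order Markov chain per word (illustrative assumption)."""

    def __init__(self):
        # counts[word][prev_state][next_state]; starting every count at 1
        # avoids zero probabilities for transitions not yet observed.
        self.counts = defaultdict(lambda: [[1, 1], [1, 1]])
        # Last observed state of each word; defaults to NOT_REUSED (0).
        self.state = defaultdict(int)

    def observe(self, word, reused):
        """Record whether `word` was reused in the latest observation window."""
        nxt = REUSED if reused else NOT_REUSED
        self.counts[word][self.state[word]][nxt] += 1
        self.state[word] = nxt

    def reuse_probability(self, word):
        """Estimated probability that `word` is reused next, given its state."""
        row = self.counts[word][self.state[word]]
        return row[REUSED] / (row[NOT_REUSED] + row[REUSED])


def line_reusability(predictor, words):
    """Line-level re-usability, assumed here to be the mean over its words."""
    return sum(predictor.reuse_probability(w) for w in words) / len(words)


def choose_victim(predictor, cache_set):
    """Pick the line with the lowest re-usability from {line_id: [word ids]}."""
    return min(cache_set, key=lambda line: line_reusability(predictor, cache_set[line]))
```

In this sketch the same probability could also drive the other decisions the abstract mentions: a word whose estimate falls below some chosen threshold would be a candidate for early invalidation, while a clean word above it would be a candidate for replication.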


Bibliographic details

Source
International Conference on Computer Design, 2019, pp. 468-476 (9 pages)
Authors
Avishek Choudhury; Biplab K Sikdar
Original format
PDF
Keywords
cache storage; error correction codes; error detection; fault tolerant computing; Markov processes; microprocessor chips; multiprocessing systems; program diagnostics; radiation hardening (electronics); redundancy; software reusability


