Conference: 2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks

Low-cost program-level detectors for reducing silent data corruptions

Abstract

With technology scaling, transient faults are becoming an increasing threat to hardware reliability. Commodity systems must be made resilient to these in-field faults through very low-cost resiliency solutions. Software-level symptom detection techniques have emerged as promising low-cost and effective solutions. While the current user-visible Silent Data Corruption (SDC) rate for these techniques is relatively low, eliminating or significantly lowering the SDC rate is crucial for these solutions to become practically successful. Identifying and understanding program sections that cause SDCs is crucial to reducing (or eliminating) SDCs in a cost-effective manner. This paper provides a detailed analysis of code sections that produce over 90% of SDCs for six applications we studied. This analysis facilitated the development of program-level detectors that catch errors in quantities that are either accumulated or active for a long duration, amortizing the detection costs. These low-cost detectors significantly reduce the dependency on redundancy-based techniques and provide more practical and flexible choice points on the performance vs. reliability trade-off curve. For example, for an average of 90%, 99%, or 100% reduction of the baseline SDC rate, the average execution overheads of our approach versus redundancy alone are respectively 12% vs. 30%, 19% vs. 43%, and 27% vs. 51%.
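
The abstract does not spell out what a program-level detector looks like, so the following is only an illustrative sketch of the general idea of checking an accumulated quantity once rather than duplicating every operation. It is not taken from the paper. The function `sum_strided`, the `fault_at` parameter, and the choice of flipping bit 3 are all hypothetical, introduced here purely for illustration.

```python
def sum_strided(data, start, stride, count, fault_at=None):
    """Sum `count` elements of `data`, stepping the index by `stride`.

    `fault_at` is a hypothetical knob for this sketch: at that iteration
    one bit of the index is flipped to mimic a transient hardware fault.
    Returns (total, detected).
    """
    index = start
    total = 0
    for i in range(count):
        if fault_at == i:
            index ^= 1 << 3  # injected single-bit flip (assumed fault model)
        total += data[index % len(data)]
        index += stride  # the index accumulates across iterations

    # Detector: a single check after the loop. The accumulated index has a
    # closed form, so any corruption that persisted in it shows up here.
    expected_index = start + stride * count
    detected = index != expected_index
    return total, detected


data = list(range(100))
print(sum_strided(data, 0, 2, 10))              # fault-free run
print(sum_strided(data, 0, 2, 10, fault_at=4))  # run with an injected flip
```

Because the check runs once per loop rather than once per iteration, its cost is spread over all iterations, which is one way to read the abstract's phrase "amortizing the detection costs". Note that this sketch only guards the index; a corrupted `total` would need its own check or a redundancy-based fallback.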