Journal: IEEE Transactions on Parallel and Distributed Systems

Failure Diagnosis for Distributed Systems Using Targeted Fault Injection

Abstract

This paper introduces a novel approach to automating failure diagnosis in distributed systems by combining fault injection and data analytics. We use fault injection to populate a database of failures for a target distributed system. When a failure is reported from a production environment, the database is queried to find "matched" failures generated by fault injection. Relying on the assumption that similar faults generate similar failures, we use information from the matched failures as hints to locate the actual root cause of the reported failure. To implement this approach, we introduce techniques for (i) reconstructing end-to-end execution flows of distributed software components, (ii) computing the similarity of the reconstructed flows, and (iii) performing precise fault injection at pre-specified execution points in distributed systems. We have evaluated our approach on OpenStack, a popular cloud infrastructure management system. Our experimental results show that this approach is effective in determining the root causes, e.g., fault types and affected components, for 71-100 percent of tested failures. Furthermore, it reports fault locations close to the actual ones, which makes it easier to find and fix the actual root causes. We have also validated this technique by localizing real bugs that occurred in OpenStack.
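The matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how flows are represented or which similarity measure is used, so everything here is an assumption made for illustration: flows are modeled as plain lists of event names, similarity is the ratio from Python's standard `difflib.SequenceMatcher`, and the database entries, event names, and the helpers `flow_similarity` and `match_failures` are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical failure database: each entry pairs an injected fault with the
# execution flow (a sequence of event names) observed after the injection.
# Fault types, component names, and events are illustrative, not from the paper.
FAILURE_DB = [
    {"fault": "exception", "component": "nova-compute",
     "flow": ["api.recv", "scheduler.select", "compute.spawn", "compute.error"]},
    {"fault": "timeout", "component": "neutron-server",
     "flow": ["api.recv", "scheduler.select", "network.allocate", "network.timeout"]},
    {"fault": "crash", "component": "nova-scheduler",
     "flow": ["api.recv", "scheduler.error"]},
]


def flow_similarity(a, b):
    """Return a score in [0, 1]; 1.0 means the two event sequences are identical.

    Assumption: a generic sequence-matching ratio stands in for whatever
    similarity measure the paper actually defines.
    """
    return SequenceMatcher(None, a, b).ratio()


def match_failures(reported_flow, db, top_k=2):
    """Rank database entries by similarity to the reported flow, best first."""
    scored = [(flow_similarity(reported_flow, entry["flow"]), entry) for entry in db]
    # Sort on the score only, so ties never fall back to comparing dicts.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# A flow reconstructed from a (made-up) production failure report.
reported = ["api.recv", "scheduler.select", "network.allocate", "network.error"]

for score, entry in match_failures(reported, FAILURE_DB):
    print(f"{score:.2f}  {entry['fault']} in {entry['component']}")
```

The top-ranked entries serve as hints for the fault type and affected component of the reported failure, following the abstract's assumption that similar faults generate similar failures.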