IEEE Symposium on Reliable Distributed Systems

Dynamic Checkpoint Architecture for Reliability Improvement on Distributed Frameworks

Abstract

Fault-tolerance mechanisms are essential to making distributed systems reliable. Checkpoint and recovery is a widely used technique that consists of saving data states so that a system can recover quickly after a failure. In Apache Hadoop and Apache Spark, two distributed high-performance frameworks, checkpointing is intended to support the recovery steps that follow a failure. However, misconfigured checkpoint attributes can degrade system performance and reliability, defeating the purpose of checkpointing. This work proposes a dynamic checkpoint architecture based on system monitoring and alerts. To avoid checkpoint problems on Hadoop and Spark, one implementation of the dynamic mechanism is defined for each framework.
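The abstract does not spell out how monitoring alerts translate into checkpoint settings, so the following is only a minimal sketch of one possible policy, not the authors' mechanism. The `Alert` type, the alert kinds `"failure"` and `"overhead"`, the halving and 1.5x growth factors, and the interval bounds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Alert:
    """Hypothetical monitoring alert; the field name and values are assumptions."""
    kind: str  # assumed values: "failure" or "overhead"


def next_interval(current_s: float,
                  alerts: Iterable[Alert],
                  min_s: float = 60.0,
                  max_s: float = 3600.0) -> float:
    """Return the next checkpoint interval in seconds.

    Assumed policy (not taken from the paper):
    - any failure alert halves the interval, so less work is lost per failure;
    - otherwise an overhead alert grows the interval by 50%;
    - with no alerts the interval is left unchanged.
    The result is clamped to the range [min_s, max_s].
    """
    kinds = {alert.kind for alert in alerts}
    if "failure" in kinds:
        proposed = current_s / 2
    elif "overhead" in kinds:
        proposed = current_s * 1.5
    else:
        proposed = current_s
    return max(min_s, min(max_s, proposed))
```

In a real deployment the returned value would have to be applied through each framework's own checkpoint configuration. How the paper does that for Hadoop and Spark is not stated in the abstract.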