Conference: Parallel and Distributed Computing and Networks

Using Checkpointing for Fault Tolerance and Parallel Program Debugging

Abstract

Checkpointing and rollback recovery are widely used in fault-tolerant computing. When a failure occurs, these techniques allow a running program to be restarted from an earlier state of its execution rather than from the beginning, which reduces the amount of lost work. Beyond fault tolerance, the same techniques are used for cyclic debugging, where they are intended to reduce the waiting time in repeated debugging cycles. However, compared with checkpointing for fault tolerance, only a few methods are available for debugging, and some strict requirements of debugging rule out most of the methods used in fault tolerance. A comparison of the requirements that checkpointing must meet in the two areas is therefore important and useful for developing methods applicable to parallel program debugging. The paper discusses this problem and presents suitable methods for debugging, which enable cyclic debugging to be used for long-running programs while keeping the waiting time short.
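The abstract does not describe the paper's own methods, so the following is only a generic, single-process sketch of the basic idea of checkpointing and rollback recovery, not the parallel techniques the paper proposes. All names (`save_checkpoint`, `load_checkpoint`, `run`, the `fail_at` parameter, and the checkpoint file name) are hypothetical and chosen for illustration. The sketch saves the program state at a fixed interval and, after a simulated failure, restarts from the most recent saved state so that only the work done since that checkpoint is repeated.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it into place,
    so a crash during the write cannot corrupt the previous checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh initial state if none exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "total": 0}


def run(path, n_steps, interval, fail_at=None):
    """Sum the integers 0..n_steps-1, checkpointing every `interval` steps.

    `fail_at` (hypothetical, for illustration) raises an error at that step
    to simulate a failure. Returns (total, steps executed in this run).
    """
    state = load_checkpoint(path)  # rollback: resume from the saved state
    executed = 0
    while state["step"] < n_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["step"]
        state["step"] += 1
        executed += 1
        if state["step"] % interval == 0:
            save_checkpoint(path, state)
    return state["total"], executed


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
    try:
        run(ckpt, n_steps=100, interval=10, fail_at=57)
    except RuntimeError:
        pass  # the process "crashed"; the checkpoint file survives
    total, executed = run(ckpt, n_steps=100, interval=10)
    print(total, executed)
```

In this sketch the failure at step 57 costs only the seven steps executed after the checkpoint at step 50; the restarted run repeats those and continues, instead of redoing all 100 steps. The same mechanism serves cyclic debugging in principle: a debugging cycle can start from a checkpoint near the point of interest, which shortens the wait before each re-execution reaches it.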