International Journal of Engineering Science and Technology

Low Overhead Checkpointing Protocols for Mobile Distributed Systems: A Comparative Study

Abstract

Mobile distributed systems raise several issues: node mobility, low bandwidth of wireless channels, lack of stable storage on mobile nodes, disconnections, limited battery power, and a high failure rate of mobile nodes. Fault tolerance techniques enable systems to perform tasks in the presence of faults. The likelihood of faults grows as systems become more complex and applications require more resources, including execution speed, storage capacity, and communication bandwidth. A checkpoint is a local state of a process saved on stable storage. Because the processes in a distributed system do not share memory, a global state of the system is defined as a set of local states, one from each process. When a fault occurs, checkpointing allows the execution of a program to resume from a previous consistent global state rather than from the beginning, which significantly reduces the amount of useful processing lost to the fault. Coordinated checkpointing is an effective fault tolerance technique for distributed systems because it avoids the domino effect and has minimal storage requirements. Most earlier coordinated checkpointing algorithms have at least one drawback: they are minimum-process but block computation during checkpointing; or they are non-blocking but force every process to take a checkpoint, even though many of those checkpoints may not be necessary; or they are non-blocking and minimum-process but take useless checkpoints; or they reduce useless checkpoints at the cost of higher synchronization message overhead or a longer checkpoint request propagation time. In this paper, we present a survey of some checkpointing algorithms for distributed systems.
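
The notion of a consistent global state used in the abstract can be illustrated with a small sketch. It relies on the standard definition that a global state is consistent when it contains no orphan message, that is, no message recorded as received whose send is not recorded. The names `Checkpoint`, `orphan_messages`, and `is_consistent` are hypothetical and chosen for this illustration only; they are not taken from the surveyed protocols.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical local checkpoint: one process's saved state."""
    process: str
    sent: frozenset = frozenset()      # ids of messages sent before the checkpoint
    received: frozenset = frozenset()  # ids of messages received before the checkpoint


def orphan_messages(global_state):
    """Return ids of messages recorded as received but not recorded as sent."""
    all_sent = set().union(*(cp.sent for cp in global_state))
    all_received = set().union(*(cp.received for cp in global_state))
    return all_received - all_sent


def is_consistent(global_state):
    """A global state (one checkpoint per process) is consistent if it has no orphans."""
    return not orphan_messages(global_state)


# P1 sends message m1 to P2; P2's checkpoint is taken after receiving m1.
p1_before_send = Checkpoint("P1")
p1_after_send = Checkpoint("P1", sent=frozenset({"m1"}))
p2_after_receive = Checkpoint("P2", received=frozenset({"m1"}))

print(is_consistent([p1_before_send, p2_after_receive]))  # False: m1 is an orphan
print(is_consistent([p1_after_send, p2_after_receive]))   # True
```

Rolling back to the first pair would leave P2 holding a message that P1 has not yet sent. Avoiding such combinations is the problem that coordinated checkpointing is meant to address.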
