
Dynamic Fault Tolerance in Distributed Simulation System



Abstract

Distributed simulation systems are widely used for forecasting, decision-making, and scientific computing. Multi-agent systems and Grids have been used as simulation platforms. To survive software or hardware failures and to guarantee the success rate of agent migration, the system must solve the fault tolerance problem. Classic fault tolerance techniques such as checkpointing and redundancy can be applied to distributed simulation systems, but they are not efficient. We present a novel fault tolerance protocol that combines causal message logging with primary-backup technology. The proposed protocol uses an iterative backup location scheme and an adaptive update interval to reduce overhead and to balance the cost of fault tolerance against recovery time. The protocol produces no orphan states and does not require surviving agents to roll back. Most importantly, the recovery scheme can tolerate concurrent failures, even the permanent failure of a single node. The correctness of the protocol is proved, and experiments show that the protocol is efficient.
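The general idea of pairing message logging with a backup copy can be illustrated with a minimal Python sketch. This is not the protocol from the paper: the abstract gives no algorithmic detail, so the classes `Backup` and `Agent`, the function `recover`, the fixed `update_interval` parameter, and the toy "running sum" state are all hypothetical. The sketch only shows why a recovered agent can reach its pre-failure state without forcing surviving agents to roll back: every message is logged at the backup before it is applied, and recovery replays the log on top of the last checkpoint.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Holds the last checkpointed state plus the messages logged since then."""
    checkpoint: int = 0
    log: list = field(default_factory=list)


@dataclass
class Agent:
    """Deterministic toy agent: its state is the running sum of received messages."""
    backup: Backup
    update_interval: int = 3  # hypothetical: messages between checkpoint updates
    state: int = 0
    since_checkpoint: int = 0

    def receive(self, msg: int) -> None:
        # Log the message at the backup before applying it, so it can be
        # replayed if this agent fails.
        self.backup.log.append(msg)
        self.state += msg
        self.since_checkpoint += 1
        if self.since_checkpoint >= self.update_interval:
            # Push a fresh checkpoint to the backup and truncate the log.
            self.backup.checkpoint = self.state
            self.backup.log.clear()
            self.since_checkpoint = 0


def recover(backup: Backup, update_interval: int = 3) -> Agent:
    """Rebuild a failed agent from the checkpoint, then replay the logged messages."""
    agent = Agent(backup=backup, update_interval=update_interval,
                  state=backup.checkpoint)
    for msg in list(backup.log):
        agent.state += msg
        agent.since_checkpoint += 1
    return agent


# Usage: the primary processes five messages, then is assumed to fail.
backup = Backup()
primary = Agent(backup=backup)
for m in [1, 2, 3, 4, 5]:
    primary.receive(m)

recovered = recover(backup)
assert recovered.state == primary.state
```

A shorter `update_interval` means more frequent checkpoint traffic but a shorter log to replay, which is the overhead versus recovery-time trade-off the abstract refers to. The adaptive interval described there would adjust this value at run time; the sketch keeps it fixed.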
