Journal: International Journal of High Performance Computing Applications

Process Fault Tolerance: Semantics, Design and Applications for High Performance Computing


Abstract

As the number of processors on current machines grows, so does the probability of node or link failures. Application-level fault tolerance is therefore becoming an increasingly important issue for both end-users and the institutions running the machines. In this paper we present the semantics of a fault-tolerant version of the Message Passing Interface (MPI), the de facto standard for communication in scientific applications. These semantics allow an application to recover from a node or link error and continue execution in a well-defined way. We present the architecture of fault-tolerant MPI, an implementation of MPI based on these semantics, together with benchmark results for various applications. We also detail an example of a fault-tolerant parallel equation solver, performance results, and the time needed to recover from a process failure.
