Journal: IEEE Transactions on Computers

An architecture for tolerating processor failures in shared-memory multiprocessors


Abstract

This paper addresses fault tolerance in shared-memory multiprocessors and describes an architecture designed to tolerate processor failures transparently. The novel component of this architecture is the Recoverable Shared Memory (RSM), which provides a hardware-supported backward error recovery mechanism that minimizes the propagation of recovery when a processor fails. The RSM permits a shared-memory multiprocessor to be built from standard caches and cache coherence protocols, and it requires no changes to application software. The performance of the recovery scheme supported by the RSM is evaluated by simulation, using address traces collected from real parallel applications, and compared with other schemes proposed for fault-tolerant shared-memory multiprocessors.
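
The abstract gives no implementation details for the RSM, so the following is only a generic software illustration of backward error recovery with limited recovery propagation, not the paper's hardware mechanism. It assumes a toy memory that logs old values and tracks which processes have consumed another process's uncommitted data; the class `ToyRecoverableMemory` and all of its method names are hypothetical.

```python
from collections import defaultdict

_MISSING = object()  # marks "address had no value before this write"


class ToyRecoverableMemory:
    """Hypothetical toy model; not the RSM design from the paper."""

    def __init__(self):
        self.data = {}
        # addr -> [(writer, value_before_write)] since the last recovery point
        self.undo = defaultdict(list)
        # writer -> processes that read or overwrote its uncommitted data
        self.dependents = defaultdict(set)

    def write(self, proc, addr, value):
        log = self.undo[addr]
        if not log or log[-1][0] != proc:
            if log:
                # Overwriting uncommitted data makes proc depend on its writer.
                self.dependents[log[-1][0]].add(proc)
            log.append((proc, self.data.get(addr, _MISSING)))
        self.data[addr] = value

    def read(self, proc, addr):
        log = self.undo.get(addr)
        if log and log[-1][0] != proc:
            self.dependents[log[-1][0]].add(proc)
        return self.data.get(addr)

    def commit(self):
        """Establish a new recovery point for all processes."""
        self.undo.clear()
        self.dependents.clear()

    def recover(self, failed):
        """Roll back the failed process and only its transitive dependents."""
        rollback, stack = {failed}, [failed]
        while stack:
            for p in self.dependents.get(stack.pop(), ()):
                if p not in rollback:
                    rollback.add(p)
                    stack.append(p)
        for addr, log in self.undo.items():
            for i, (writer, old) in enumerate(log):
                if writer in rollback:
                    # Later entries belong to dependents, so drop the suffix.
                    if old is _MISSING:
                        self.data.pop(addr, None)
                    else:
                        self.data[addr] = old
                    del log[i:]
                    break
        for p in rollback:
            self.dependents.pop(p, None)
        for deps in self.dependents.values():
            deps -= rollback
        return rollback


mem = ToyRecoverableMemory()
mem.write("P0", "x", 1)
mem.commit()                 # recovery point: x == 1
mem.write("P0", "x", 2)
mem.read("P1", "x")          # P1 now depends on P0
mem.write("P2", "y", 7)      # P2 is independent of P0
rolled_back = mem.recover("P0")
```

In this sketch a failure of `P0` rolls back `P0` and `P1` but leaves `P2` and its write to `y` untouched, which is the sense in which dependency tracking limits how far recovery propagates.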
