

Increasing processor dependability in distributed shared-memory servers.



Abstract

Scalable shared-memory servers offer high performance and capacity within the familiar shared-memory programming model. However, reliability and availability have been significant shortcomings of previous shared-memory architectures, because a single error in any one of the many processor or memory modules could bring down the entire system. The goal of this thesis is to eliminate the processor module as a single point of failure in shared-memory servers without requiring changes to software and while minimizing the impact on commodity hardware designs.

The basic approach studied is distributed redundancy, in which pairs of processor cores are grouped together logically but separated physically to increase system availability. We propose a design space based on fault-containment granularity, and argue that achieving our goals requires that processor cores and their private caches keep unchecked values from propagating into shared memory. We investigate two alternatives for exposing these updates to the rest of the system: forcing a check when external requests arrive, or hiding the updates using a relaxed memory model.

We propose initial designs based on lockstep coordination that construct synchronous redundant processor pairs. We then leverage the hidden-update mechanisms to develop an asynchronous, distributed-redundant system. Our evaluations of common enterprise workloads show that asynchronous redundancy can achieve performance overheads averaging just 10% over a non-redundant system, while obviating the extensive initialization and deterministic execution required by synchronous designs.

We observe that although asynchronous redundancy has numerous benefits for the designer, it complicates the system's ability to recover from chip failures. Our implementation of asynchronous redundancy relies on one of the replica cores in each pair being potentially incoherent with the rest of the system, leading to windows of time in which, if the coherent core failed, data could be lost. We propose simple extensions to the cache coherence protocol to close these windows of vulnerability. Using symbolic model checking, we formally verify an example distributed shared-memory coherence protocol and our proposed extensions for chip-failure tolerance.
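As a rough illustration of the containment principle in the abstract (unchecked values must not reach shared memory), the toy Python model below buffers each replica's stores privately and releases them only when both replicas of a pair agree at a checkpoint. Everything here is a hypothetical sketch: the names `ReplicaCore`, `SharedMemory`, and `checkpoint`, the SHA-256 comparison of store streams, and the checkpoint granularity are assumptions for illustration, not the mechanism evaluated in the thesis.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ReplicaCore:
    """One core of a hypothetical redundant pair. Stores wait in a private
    buffer (standing in for the core and its private caches) until checked."""
    name: str
    pending: list = field(default_factory=list)  # unchecked (addr, value) stores

    def store(self, addr, value):
        # The update stays hidden from the rest of the system for now.
        self.pending.append((addr, value))

    def fingerprint(self):
        # Assumed check: hash the ordered sequence of buffered stores.
        h = hashlib.sha256()
        for addr, value in self.pending:
            h.update(f"{addr}:{value};".encode())
        return h.hexdigest()


@dataclass
class SharedMemory:
    cells: dict = field(default_factory=dict)


def checkpoint(primary, mirror, memory):
    """Commit buffered stores only if both replicas agree; otherwise drop
    them so that no unchecked value reaches shared memory."""
    agree = primary.fingerprint() == mirror.fingerprint()
    if agree:
        for addr, value in primary.pending:
            memory.cells[addr] = value
    # On a mismatch a real system would start recovery; this sketch only
    # discards the unchecked stores.
    primary.pending.clear()
    mirror.pending.clear()
    return agree


# The replicas run independently and are compared only at the checkpoint.
mem = SharedMemory()
a, b = ReplicaCore("a"), ReplicaCore("b")
for core in (a, b):
    core.store(0x40, 7)
    core.store(0x80, 9)
assert checkpoint(a, b, mem) and mem.cells == {0x40: 7, 0x80: 9}

a.store(0x40, 8)
b.store(0x40, 1)  # simulated transient fault in one replica
assert not checkpoint(a, b, mem) and mem.cells[0x40] == 7
```

Because the two replicas are compared only at checkpoints, neither has to stay cycle-synchronized with the other, which loosely reflects the asynchronous design; a lockstep design would instead keep both cores executing in step.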

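The windows of vulnerability described in the abstract, and how a coherence-protocol extension can close them, can be illustrated with a toy safety check. This sketch uses explicit-state breadth-first search in place of the symbolic model checking the thesis names, over a deliberately tiny hypothetical model: one cache line, a coherent core C whose version may run ahead of its incoherent mirror M, and a bound of two writes. The safety property (after C's chip fails, every version another node has observed must survive in memory or in M) and the "extension" (forward the line to another node only once M has caught up with C) are illustrative assumptions, not the protocol verified in the thesis.

```python
from collections import deque

MAX_VERSION = 2  # bound on writes so the state space stays finite

# State: (c, m, mem, exposed, failed)
#   c       version held by the coherent core C
#   m       version the incoherent mirror M has replayed up to
#   mem     version in memory
#   exposed highest version forwarded to another node
#   failed  whether C's chip has failed


def successors(state, extended):
    c, m, mem, exposed, failed = state
    if failed:
        return  # no further actions after C fails
    if c < MAX_VERSION:
        yield (c + 1, m, mem, exposed, False)      # C performs a write
    if m < c:
        yield (c, m + 1, mem, exposed, False)      # M replays one write
    if not extended or m == c:
        yield (c, m, mem, max(exposed, c), False)  # C forwards the line
    yield (c, m, max(mem, c), exposed, False)      # C writes back to memory
    yield (c, m, mem, exposed, True)               # C's chip fails


def is_safe(state):
    c, m, mem, exposed, failed = state
    # After a failure, every exposed version must still be recoverable.
    return not failed or exposed <= max(mem, m)


def check(extended):
    """Breadth-first search of all reachable states; returns the first
    unsafe state found, or None if no unsafe state is reachable."""
    start = (0, 0, 0, 0, False)
    seen, frontier = {start}, deque([start])
    while frontier:
        state = frontier.popleft()
        if not is_safe(state):
            return state
        for nxt in successors(state, extended):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return None


print(check(extended=False))  # prints the first violating state found
print(check(extended=True))   # prints None
```

Without the extension, the search reaches a state in which C has forwarded a version that neither memory nor M holds and then fails; gating the forward on M having caught up removes every such state. A realistic protocol has far more state than this toy, which is where symbolic techniques become worthwhile.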

Bibliographic record

Author: Gold, Brian T.
Affiliation: Carnegie Mellon University

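Degree-granting institution: Carnegie Mellon University
Discipline: Computer Science
Degree: Ph.D.
Year: 2009
Pages: 84
Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology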
Date added to database: 2022-08-17 11:38:30

Similar documents


1. Moving address translation closer to memory in distributed shared-memory multiprocessors [J]. Qiu X., Dubois M. IEEE Transactions on Parallel and Distributed Systems. 2005, no. 7

2. Reducing control latency in distributed shared-memory multiprocessor systems using fuzzy logic prediction [J]. O. M. Al-Jarrah, A. Muhsen. International Journal of Modelling & Simulation. 2005, no. 1

3. An efficient implementation of tree-based multicast routing for distributed shared-memory multiprocessors [J]. M. P. Malumbres, Jose Duato. Journal of Systems Architecture. 2000, no. 11

4. Eager combining: a coherency protocol for increasing effective network and memory bandwidth in shared-memory multiprocessors [C]. Bianchini, R., LeBlanc, . 1994

5. Speculative distributed shared-memory multiprocessors organized as processor-and-memory hierarchies [D]. Figueiredo, Renato Jansen O. 2001

6. Radiology CME on the Web using secure document transfer and internationally distributed image servers [O]. K. W. McEnery, S. M. Roth, R. V. Walkup. 1996

7. Dynamic Program Phase Detection in Distributed Shared-Memory Multiprocessors [O]. 2014

8. Eager Combining: A Coherency Protocol for Increasing Effective Network and Memory Bandwidth in Shared-Memory Multiprocessors [R]. Bianchini, R., LeBlanc, T. J. 1994

