Technology impacts of CMOS scaling on microprocessor core design for hard-fault tolerance in single-core applications and optimized throughput in throughput-oriented chip multiprocessors.

Abstract

The continued march of technological progress, epitomized by Moore's Law, provides the microarchitect with increasing numbers of transistors to employ as we continue to shrink feature geometries. Physical limitations impose new constraints upon designers in the areas of overall power and localized power density. Techniques to scale threshold and supply voltages to lower values in order to reduce power consumption of the part have also run into physical limitations, exacerbating power and cooling problems in deep sub-micron CMOS process generations. Smaller device geometries are also subject to increased sensitivity to common failure modes as well as manufacturing process variability.

In the face of these added challenges, we observe a shift in the focus of the industry, away from building ever-larger single-core chips, whose focus is on reducing single-threaded latency, toward a design approach that employs multiple cores on a single chip to improve throughput. While the early multicore era utilized the existing single-core designs of the previous generation in small numbers, subsequent generations have introduced cores tailored to multicore use. These cores seek to achieve power-efficient throughput and have led to a new emphasis on throughput-oriented computing, particularly for Internet workloads, where the end-to-end computational task is dominated by long-latency network operations. The ubiquity of these workloads makes a compelling argument for throughput-oriented designs, but does not free the microarchitect fully from latency demands of common workloads in enterprise and desktop application spaces.

We believe that a continued need for both throughput-oriented and latency-sensitive processors will exist in coming generations of technology. We further opine that making effective use of the additional transistors that will be available may require different techniques for latency-sensitive designs than for throughput-oriented ones, since we may trade latency or throughput for the desired attribute of a core in each of the respective paradigms.

We make three major contributions with this thesis. Our first contribution is a fine-grained fault diagnosis and deconfiguration technique for array structures, such as the ROB, within the microprocessor core. We present and evaluate two variants of this technique. The first variant uses an existing fault detection and correction technique whose scope is the processor core execution pipeline to ensure correct processor operation. The second variant integrates fault detection and correction into the array structure itself to provide a self-contained, fine-grained fault detection, diagnosis, and repair technique.

In our second contribution, we develop a lightweight, fine-grained fault diagnosis mechanism for the processor core. In this work, we leverage the first contribution's methods to provide deconfiguration of faulty array elements. We additionally extend the scope of that work to include all pipeline circuitry from instruction issue to retirement.

In our third and final contribution, we focus on throughput-oriented core data cache design. In this work, we study the demands of the throughput-oriented core running a representative workload and then propose and evaluate an alternative data cache implementation that more closely matches the demands of the core. We then show that a better-matched cache design can be exploited to provide improved throughput under a fixed power budget.

Our results show that typical latency-sensitive cores have sufficient redundancy to make fine-grained hard-fault tolerance an affordable alternative for hardening complex designs. Our designs suffer little or no performance loss when no faults are present and retain nearly the same performance characteristics in the presence of small numbers of hard faults in protected structures. In our study of the latency-sensitive core, we have shown that SRAM-based designs have low latencies that end up providing less benefit to a throughput-oriented core and workload than a better-fitted data cache composed of DRAM. The move from a high-power, low-latency technology to a lower-power, high-latency technology allows us to increase L1 data cache capacity, which is a net benefit for the throughput-oriented core.

Bibliographic record

  • Author

    Bower, Fred Allison, III

  • Author affiliation

    Duke University

  • Degree-granting institution: Duke University
  • Subjects: Engineering, Computer; Engineering, Electronics and Electrical; Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 141 p.
  • Total pages: 141
  • Original format: PDF
  • Language of text: English