Aging-Aware Compilation for GP-GPUs

Abstract

General-purpose graphics processing units (GP-GPUs) offer high computational throughput using thousands of integrated processing elements (PEs). These PEs are stressed during workload execution, and negative bias temperature instability (NBTI) degrades their reliability by introducing new delay-induced faults. However, these delay variations are not spread uniformly across the PEs: some PEs are affected more than others and are therefore less reliable. This variation significantly shortens the lifetime of GP-GPU parts. In this article, we address the problem of "wear leveling" across processing units to mitigate lifetime uncertainty in GP-GPUs. We propose modifications to the statically compiled code that improve healing in PEs and stream cores (SCs) based on their degradation status. PE healing is a fine-grained very long instruction word (VLIW) slot assignment scheme that balances instruction stress across the PEs within an SC. SC healing is a coarse-grained workload allocation scheme that distributes work across the SCs of a GP-GPU. Both schemes share a common property: they adaptively shift workload from less reliable units to more reliable ones, either spatially or temporally. The schemes rely on online calibration with NBTI monitoring, and they equalize the expected lifetime of PEs and SCs by regenerating adaptive compiled code in response to the specific health state of the GP-GPU. We evaluate the proposed schemes on various OpenCL kernels from the AMD APP SDK running on the Evergreen and Southern Islands GPU architectures. Compared with the naive kernels, the aging-aware healthy kernels generated by PE healing reduce the NBTI-induced threshold voltage shift by 30% with no performance penalty, and those generated by SC healing reduce it by 77% with a moderate performance penalty.
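The two healing ideas can be illustrated with a small sketch. This is not the authors' algorithm, only a minimal illustration under stated assumptions: `assign_bundle` (hypothetical) greedily places the operations of one VLIW bundle into the least-stressed slots, mimicking PE healing, and `allocate_workgroups` (hypothetical) splits work-groups across SCs in proportion to a health weight using largest-remainder rounding, mimicking SC healing. The per-slot stress score and the health weights are assumed inputs standing in for the online NBTI monitoring described above, and the code assumes every slot can execute every operation, which real VLIW hardware does not guarantee.

```python
from typing import List, Optional, Sequence


def assign_bundle(bundle: Sequence[str], stress: List[float]) -> List[Optional[str]]:
    """PE-healing sketch: place one VLIW bundle's ops into the least-stressed slots.

    `stress` holds a hypothetical per-slot stress score (e.g. seeded from an
    NBTI monitor reading) and is updated in place. Assumes any slot can run
    any op; real VLIW hardware restricts some ops to specific slots.
    """
    if len(bundle) > len(stress):
        raise ValueError("bundle has more operations than slots")
    # Least-stressed slots first; ties keep the lower slot index (stable sort).
    order = sorted(range(len(stress)), key=lambda s: stress[s])
    slots: List[Optional[str]] = [None] * len(stress)
    for op, s in zip(bundle, order):
        slots[s] = op
        stress[s] += 1.0  # hypothetical: one unit of stress per busy slot-cycle
    return slots


def allocate_workgroups(n_groups: int, health: Sequence[float]) -> List[int]:
    """SC-healing sketch: split work-groups across SCs in proportion to health.

    A larger (hypothetical) health weight marks a more reliable SC, which
    therefore receives more work. Largest-remainder rounding guarantees the
    counts sum to n_groups.
    """
    if n_groups < 0 or not health or any(h <= 0 for h in health):
        raise ValueError("need n_groups >= 0 and positive health weights")
    total = sum(health)
    quotas = [n_groups * h / total for h in health]
    counts = [int(q) for q in quotas]  # floor, since quotas are non-negative
    leftover = n_groups - sum(counts)
    # Give the remaining work-groups to the SCs with the largest fractional parts.
    by_remainder = sorted(range(len(health)),
                          key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Five 3-op bundles on a 5-slot machine: filling slots from index 0 every
    # time would stress only slots 0-2; the greedy placement spreads 15 ops evenly.
    stress = [0.0] * 5
    for _ in range(5):
        assign_bundle(["add", "mul", "sub"], stress)
    print(stress)  # [3.0, 3.0, 3.0, 3.0, 3.0]
    print(allocate_workgroups(10, [1.0, 1.0, 2.0]))  # [3, 2, 5]
```

The greedy slot choice only balances a simple usage count; the scheme in the article also reacts to the measured degradation of each unit, which the sketch approximates by letting the caller seed `stress` and `health` from monitor data.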