42nd Annual International Symposium on Computer Architecture

Efficient execution of memory access phases using dataflow specialization

Abstract

This paper identifies a new opportunity for improving the efficiency of a processor core: the memory access phases of programs. These are dynamic regions of a program in which most instructions are devoted to memory access or address computation. Such phases arise naturally from workload properties, and they are also induced when an in-core accelerator is employed, because the code that remains on the core is then access code. We observe that such code needs an out-of-order (OOO) core's dataflow execution and dynamism to run fast, and that it performs poorly on an in-order processor. However, an OOO core consumes considerable power, which increases energy consumption and erodes the energy efficiency of in-core accelerators. We develop an execution model called memory access dataflow (MAD) that encodes dataflow computation, event-condition-action rules, and explicit actions. Using it, we build a specialized engine that provides an OOO core's performance at a fraction of the power. Such an engine can serve as a general mechanism for any accelerator to execute its induced phase, providing a common interface and implementation for current and future accelerators. We have designed and implemented MAD in RTL, and we demonstrate its generality and flexibility by integrating it with four diverse accelerators (SSE, DySER, NPU, and C-Cores). Our quantitative results show that, relative to an in-order core, a 2-wide OOO core, and a 4-wide OOO core, MAD provides 2.4×, 1.4×, and equivalent performance, respectively, while consuming 0.8×, 0.6×, and 0.4× their energy.
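The abstract does not describe how MAD actually encodes its rules, so the following is only a toy sketch, under stated assumptions, of generic event-condition-action (ECA) semantics applied to one typical memory access phase: an indirect gather `a[idx[i]]` whose loaded values are streamed into a bounded input queue of an accelerator. Everything here is hypothetical and chosen for illustration: the names `Rule` and `run_gather_phase`, the event names, the queue `capacity`, and the policy of deferring an event whose condition is not yet true. None of it is MAD's real encoding, hardware, or accelerator interface.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List

# Toy sketch with hypothetical names; NOT MAD's actual encoding or hardware.
State = dict  # mutable engine state shared by all rules


@dataclass
class Rule:
    """One event-condition-action rule (hypothetical toy model)."""
    event: str
    condition: Callable[[State], bool]
    action: Callable[[State, deque], None]


def run_gather_phase(a: List[int], idx: List[int], capacity: int = 2,
                     max_steps: int = 10_000) -> List[int]:
    """Stream a[idx[i]] into a bounded, hypothetical accelerator queue."""
    st: State = {"i": 0, "addr": None, "queue": deque(), "accel_in": []}

    def compute_addr(s: State, events: deque) -> None:
        # Dataflow step: the load address depends only on idx[i].
        s["addr"] = idx[s["i"]]
        events.append("addr_ready")

    def issue_load(s: State, events: deque) -> None:
        # Explicit action: perform the load and enqueue the value.
        s["queue"].append(a[s["addr"]])
        s["i"] += 1
        events.append("data_ready")
        if s["i"] < len(idx):
            events.append("next")

    def deliver(s: State, events: deque) -> None:
        # Accelerator side: drain one value from the bounded queue.
        s["accel_in"].append(s["queue"].popleft())

    rules = [
        Rule("next", lambda s: s["i"] < len(idx), compute_addr),
        Rule("addr_ready", lambda s: len(s["queue"]) < capacity, issue_load),
        Rule("data_ready", lambda s: len(s["queue"]) > 0, deliver),
    ]

    events = deque(["next"] if idx else [])
    for _ in range(max_steps):
        if not events:
            return st["accel_in"]
        ev = events.popleft()
        fired = False
        for rule in rules:
            # A rule fires only when its event arrives AND its condition holds.
            if rule.event == ev and rule.condition(st):
                rule.action(st, events)
                fired = True
                break
        if not fired:
            events.append(ev)  # condition not yet true: defer the event
    raise RuntimeError("phase did not finish within max_steps")


if __name__ == "__main__":
    print(run_gather_phase([10, 20, 30, 40], [3, 0, 2]))  # [40, 10, 30]
```

The point of the sketch is the control style: work is triggered by events and guarded by conditions rather than sequenced by a program counter, so an address computation, a load, and a hand-off to the accelerator each proceed as soon as their inputs and resources are ready.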