Efficient execution of memory access phases using dataflow specialization

Abstract

This paper identifies a new opportunity for improving the efficiency of a processor core: the memory access phases of programs. These are dynamic regions of a program in which most instructions are devoted to memory access or address computation. Such phases occur naturally because of workload properties; they are also induced when an in-core accelerator is employed, because the code left to execute on the core is access code. We observe that such code requires the dataflow and dynamism of an out-of-order (OOO) core to run fast and does not execute well on an in-order processor. However, an OOO core consumes considerable power, which increases energy consumption and reduces the energy efficiency of in-core accelerators. We develop an execution model called memory access dataflow (MAD) that encodes dataflow computation, event-condition-action rules, and explicit actions. Using it, we build a specialized engine that provides an OOO core's performance at a fraction of the power. Such an engine can serve as a general way for any accelerator to execute its induced phase, providing a common interface and implementation for current and future accelerators. We have designed and implemented MAD in RTL, and we demonstrate its generality and flexibility by integrating it with four diverse accelerators (SSE, DySER, NPU, and C-Cores). Our quantitative results show that, relative to in-order, 2-wide OOO, and 4-wide OOO cores, MAD provides 2.4×, 1.4×, and equivalent performance, respectively. It provides 0.8×, 0.6×, and 0.4× lower energy.
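
The abstract names three ingredients of the MAD execution model (dataflow computation, event-condition-action rules, and explicit actions) without giving their encoding. The following is only a toy illustration of how such rules might drive a strided access phase. It is not the paper's design: the `Rule` class, the event names `index_ready` and `load_done`, and the function `run_access_phase` are all hypothetical, and the "accelerator" is just a Python list.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Tuple

Event = Tuple[str, Dict[str, int]]  # (event name, payload); names are hypothetical


@dataclass
class Rule:
    event: str                                       # event that triggers the rule
    condition: Callable[[Dict[str, int]], bool]      # guard evaluated on the payload
    action: Callable[[Dict[str, int]], List[Event]]  # returns follow-up events


def run_access_phase(memory: List[int], base: int, stride: int, count: int) -> List[int]:
    """Stream `count` strided loads into a stand-in accelerator input queue."""
    accel_inputs: List[int] = []

    def issue_load(p: Dict[str, int]) -> List[Event]:
        addr = base + p["i"] * stride  # dataflow node: address computation
        return [("load_done", {"value": memory[addr]}),
                ("index_ready", {"i": p["i"] + 1})]

    def forward(p: Dict[str, int]) -> List[Event]:
        accel_inputs.append(p["value"])  # explicit action: feed the accelerator
        return []

    rules = [
        Rule("index_ready", lambda p: p["i"] < count, issue_load),
        Rule("load_done", lambda p: True, forward),
    ]

    # Process events in FIFO order; a rule fires when its event matches
    # and its condition holds.
    pending: Deque[Event] = deque([("index_ready", {"i": 0})])
    while pending:
        name, payload = pending.popleft()
        for rule in rules:
            if rule.event == name and rule.condition(payload):
                pending.extend(rule.action(payload))
    return accel_inputs
```

In this sketch the loop control of an ordinary `for` loop is replaced by a guarded rule that re-raises its own event, which is one simple way to picture access code expressed as events and actions rather than as an instruction stream.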

Bibliographic details

Source
42nd Annual International Symposium on Computer Architecture | 2015 | pp. 118-130 | 13 pages
Conference location: Portland, OR (US)
Authors
Ho Chen-Han; Kim Sung Jin; Sankaralingam Karthikeyan;
Author affiliation

University of Wisconsin-Madison, USA;

Original format: PDF
Language: English

