Efficient heterogeneous matrix profile on a CPU + High Performance FPGA with integrated HBM

Jose Carlos Romero; Angeles Navarro; Antonio Vilches; Andres Rodriguez; Francisco Corbera; Rafael Asenjo

Abstract

In this work, we study the problem of efficiently executing SCAMP, a state-of-the-art class of time series algorithms, on a heterogeneous platform comprising a CPU and a High Performance FPGA with integrated HBM (High Bandwidth Memory). The geometry of the algorithm (a triangular matrix walk) and the FPGA capabilities pose two challenges. First, several replicated IPs can be instantiated in the FPGA fabric, so load balance is an issue not only at system level (CPU + FPGA) but also at device level (FPGA IPs). Second, the data that each of these IPs accesses must be carefully placed among the HBM banks in order to efficiently exploit the memory bandwidth offered by the banks while optimizing power consumption. To tackle the first challenge, we propose a novel hierarchical scheduler named Fastfit that efficiently balances the workload in the heterogeneous system while ensuring near-optimal throughput. Our scheduler consists of a two-level scheduling engine: (1) a system-level scheduler, which leverages an analytical model of the FPGA pipeline IPs to find the near-optimal FPGA chunk size that guarantees optimal FPGA throughput; and (2) a geometry-aware device-level scheduler, which is responsible for effectively partitioning the FPGA chunk into sub-chunks assigned to each FPGA IP. To deal with the second challenge, we propose a methodology based on a model of the HBM bandwidth usage that allows us to set the minimum number of active banks that ensures the maximum aggregated memory bandwidth for a given number of IPs. Through exhaustive evaluation we validate the accuracy of our models, the efficiency of our intra-device partition strategies, and the performance and energy efficiency of our Fastfit heterogeneous scheduler, finding that it outperforms previous state-of-the-art schedulers by achieving up to 99.4% of ideal performance.
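The abstract does not describe how the device-level scheduler splits a chunk, so the following is only a minimal sketch of why geometry matters in a triangular matrix walk, not the authors' scheduler. It assumes a chunk is a contiguous range of diagonals of a p x p upper-triangular matrix, that diagonal k holds p - k cells, and that work is proportional to the cell count. The names `work` and `split_chunk` are hypothetical.

```python
def work(p, lo, hi):
    """Number of cells on diagonals lo..hi-1 of a p x p upper triangle.

    Assumption for this sketch: diagonal k contains p - k cells.
    """
    return sum(p - k for k in range(lo, hi))


def split_chunk(p, start, end, num_ips):
    """Split diagonals [start, end) into num_ips contiguous sub-chunks.

    Boundaries are placed by accumulated cell count rather than by
    diagonal count, because early diagonals are longer than late ones.
    Returns a list of (lo, hi) half-open diagonal ranges.
    """
    total = work(p, start, end)
    bounds = [start]
    done = 0
    k = start
    for i in range(1, num_ips):
        target = total * i / num_ips  # cumulative work wanted after sub-chunk i
        while k < end and done < target:
            done += p - k
            k += 1
        bounds.append(k)
    bounds.append(end)
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    # Two hypothetical IPs sharing all 8 diagonals of an 8 x 8 triangle.
    print(split_chunk(8, 0, 8, 2))  # [(0, 3), (3, 8)]
```

In this toy case the split by accumulated work gives 21 and 15 cells, whereas splitting the 8 diagonals evenly (4 and 4) would give 26 and 10 cells.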


Bibliographic record

Source
Future generation computer systems | 2021, Issue 12 | pp. 10-23 | 14 pages
Authors
Jose Carlos Romero; Angeles Navarro; Antonio Vilches; Andres Rodriguez; Francisco Corbera; Rafael Asenjo
Author affiliations

Universidad de Malaga Spain;

Universidad de Malaga Spain;

Shapelets Puerto del Mar 18 2nd Floor 29005 Malaga Spain;

Universidad de Malaga Spain;

Universidad de Malaga Spain;

Universidad de Malaga Spain;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English (eng)
Chinese Library Classification
Keywords
High Performance FPGA; High Bandwidth Memory; Heterogeneous scheduler; Lightweight partitioner; Analytical model; Time series; Matrix profile;


