Journal: Future Generation Computer Systems

Efficient heterogeneous matrix profile on a CPU + High Performance FPGA with integrated HBM



Abstract

In this work, we study the problem of efficiently executing a state-of-the-art class of time series algorithms, SCAMP, on a heterogeneous platform comprising a CPU and a high-performance FPGA with integrated HBM (High Bandwidth Memory). The geometry of the algorithm (a triangular matrix walk) and the FPGA capabilities pose two challenges. First, several replicated IPs can be instantiated in the FPGA fabric, so load balance is an issue not only at system level (CPU + FPGA) but also at device level (FPGA IPs). Second, the data that each of these IPs accesses must be carefully placed among the HBM banks in order to exploit the memory bandwidth offered by the banks efficiently while optimizing power consumption. To tackle the first challenge we propose a novel hierarchical scheduler named Fastfit, which efficiently balances the workload in the heterogeneous system while ensuring near-optimal throughput. Our scheduler consists of a two-level scheduling engine: (1) the system-level scheduler, which leverages an analytical model of the FPGA pipeline IPs to find the near-optimal FPGA chunk size that guarantees optimal FPGA throughput; and (2) a geometry-aware device-level scheduler, which is responsible for effectively partitioning the FPGA chunk into sub-chunks assigned to each FPGA IP. To deal with the second challenge we propose a methodology based on a model of HBM bandwidth usage that allows us to set the minimum number of active banks that ensures the maximum aggregated memory bandwidth for a given number of IPs. Through exhaustive evaluation we validate the accuracy of our models, the efficiency of our intra-device partition strategies, and the performance and energy efficiency of our Fastfit heterogeneous scheduler, finding that it outperforms previous state-of-the-art schedulers by achieving up to 99.4% of ideal performance.
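To illustrate why a triangular matrix walk makes device-level load balance non-trivial, the following is a minimal sketch and not the authors' Fastfit scheduler. It assumes the work is traversed diagonal by diagonal over an n x n upper triangle, so diagonal d holds n - d cells, and it splits a chunk of diagonals into contiguous sub-chunks of near-equal cell count. The function name `split_diagonals` and its parameters are hypothetical.

```python
def split_diagonals(n, lo, hi, num_ips):
    """Split diagonals [lo, hi) of an n x n upper triangle into up to
    num_ips contiguous ranges with near-equal cell counts.

    Assumption (not from the paper): diagonal d contains n - d cells,
    and the cost of a diagonal is proportional to its cell count.
    """
    total = sum(n - d for d in range(lo, hi))
    ranges = []
    start = lo
    acc = 0  # cumulative cells over all diagonals visited so far
    for d in range(lo, hi):
        acc += n - d
        # Close the current range once cumulative work reaches the next
        # equal-share boundary; the last range is closed after the loop.
        if len(ranges) < num_ips - 1 and acc * num_ips >= total * (len(ranges) + 1):
            ranges.append((start, d + 1))
            start = d + 1
    ranges.append((start, hi))
    return ranges


def range_work(n, rng):
    """Total number of cells covered by a (start, end) diagonal range."""
    return sum(n - d for d in range(rng[0], rng[1]))


if __name__ == "__main__":
    n = 1000
    parts = split_diagonals(n, 0, n, 4)
    for p in parts:
        print(p, range_work(n, p))
```

Under these assumptions, ranges near the main diagonal span fewer diagonals than ranges near the corner, because early diagonals are longer. A split into equal numbers of diagonals would therefore leave some IPs with far more cells than others.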
