Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Mapping Data-Parallel Tasks Onto Partially Reconfigurable Hybrid Processor Architectures


Abstract

Reconfigurable hybrid processor systems provide a flexible platform for mapping data-parallel applications while offering considerable speedup over software implementations. However, reconfiguration overhead is a significant deterrent to mapping applications onto reconfigurable hardware. Partial runtime reconfiguration is one approach to reducing this overhead. In this paper, we present a methodology to map data-parallel tasks onto hardware that supports partial reconfiguration. The aim is to obtain the maximum possible speedup for a given reconfiguration time, bus speed, and computation speed. The proposed approach uses multiple identical but independent processing units in the reconfigurable hardware. Under nonzero reconfiguration overhead, we show that there is an upper limit on the number of processing units that can be employed, beyond which no further reduction in execution time is possible. We obtain solutions for the minimum processing time, the corresponding load distribution, and the schedule for data transfer. To demonstrate the applicability of the analysis, we present the following: 1) various plots showing the variation of processing time with different parameters; 2) hardware simulations for two examples, viz., a 1-D discrete wavelet transform and a finite impulse response filter, targeted to Xilinx field-programmable gate arrays (FPGAs); and 3) experimental results for a hardware prototype implemented on an FPGA board.
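The upper limit on the number of processing units can be illustrated with a deliberately simplified toy model. This is a sketch under stated assumptions, not the formulation used in the paper: units are assumed to be configured one after another, each taking `t_reconfig`; the load is assumed arbitrarily divisible; every unit is assumed to finish at the same instant; and bus transfer time is ignored. All function and parameter names are hypothetical.

```python
def processing_time(n, load, t_compute, t_reconfig):
    """Common finish time of n units in the toy model.

    Unit i (1-based) is assumed ready at i * t_reconfig and all units stop
    together, so n * T - t_reconfig * n * (n + 1) / 2 = load * t_compute.
    """
    return load * t_compute / n + t_reconfig * (n + 1) / 2


def load_fractions(n, load, t_compute, t_reconfig):
    """Share of the load given to each unit so that all finish together."""
    finish = processing_time(n, load, t_compute, t_reconfig)
    total_work = load * t_compute
    return [(finish - i * t_reconfig) / total_work for i in range(1, n + 1)]


def best_unit_count(load, t_compute, t_reconfig, max_units):
    """Smallest unit count beyond which adding a unit no longer helps.

    In this toy model, going from n to n + 1 units shortens the finish time
    only while n * (n + 1) * t_reconfig < 2 * load * t_compute.
    """
    n = 1
    while n < max_units and n * (n + 1) * t_reconfig < 2 * load * t_compute:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    load, t_compute, t_reconfig = 1000, 1.0, 50.0
    n_best = best_unit_count(load, t_compute, t_reconfig, max_units=64)
    print(n_best, processing_time(n_best, load, t_compute, t_reconfig))
    print(load_fractions(n_best, load, t_compute, t_reconfig))
```

With zero reconfiguration time the loop runs up to `max_units`, so the limit in this toy model comes entirely from the nonzero overhead. Later-configured units receive smaller shares of the load because they start later.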
