Journal: IEEE Transactions on Parallel and Distributed Systems

Computer Generation of High Throughput and Memory Efficient Sorting Designs on FPGA


Abstract

Accelerating sorting using dedicated hardware to fully utilize the memory bandwidth for Big Data applications has gained much interest in the research community. Recently, parallel sorting networks have been widely employed in hardware implementations due to their high data parallelism and low control overhead. In this paper, we propose a systematic methodology for mapping large-scale bitonic sorting networks onto FPGAs. To realize data permutations in the sorting network, we develop a novel RAM-based design by vertically "folding" the classic Clos network. By utilizing the proposed design for data permutation, we develop a hardware generator to automatically build bitonic sorting architectures on FPGAs. For a given input size, data width and data parallelism, the hardware generator specializes both the datapath and the control unit for sorting and generates a design in a high-level hardware description language. We demonstrate trade-offs among throughput, latency and area using two illustrative sorting designs: a high-throughput design and a resource-efficient design. With a data parallelism of […], the high-throughput design sorts an […]-key sequence with latency […], throughput […] results per cycle, and uses […] memory. This achieves optimal memory efficiency (defined as the ratio of throughput to the amount of on-chip memory used by the design) and outperforms the state of the art. Experimental results show that the designs obtained by our proposed hardware generator achieve a 49 to 112 percent improvement in energy efficiency and 56 to 430 percent higher memory efficiency compared with the state of the art.
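
For readers unfamiliar with bitonic sorting networks, the following is a minimal software sketch of the classic network and of the memory-efficiency ratio defined in the abstract. It is not the paper's hardware generator and does not model the folded Clos permutation design. The function names (`bitonic_network`, `apply_network`, `memory_efficiency`) and the numbers in the usage example are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

# One compare-exchange operation: (lower index, upper index, sort ascending?)
CompareExchange = Tuple[int, int, bool]


def bitonic_network(n: int) -> List[List[CompareExchange]]:
    """Return the stages of a classic bitonic sorting network for n keys.

    n must be a power of two. All compare-exchange operations within one
    stage touch disjoint index pairs, so they can run in parallel.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 2")
    stages: List[List[CompareExchange]] = []
    k = 2  # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2  # distance between the two keys of a comparator
        while j >= 1:
            stage = []
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # The direction alternates between blocks of size k.
                    stage.append((i, partner, (i & k) == 0))
            stages.append(stage)
            j //= 2
        k *= 2
    return stages


def apply_network(keys: Sequence[int],
                  stages: List[List[CompareExchange]]) -> List[int]:
    """Run the keys through the network stage by stage (software model)."""
    data = list(keys)
    for stage in stages:
        for i, j, ascending in stage:
            # Swap when the pair is out of order for its direction.
            if (data[i] > data[j]) == ascending:
                data[i], data[j] = data[j], data[i]
    return data


def memory_efficiency(throughput_per_cycle: float,
                      on_chip_memory: float) -> float:
    """Ratio of throughput to on-chip memory, as defined in the abstract."""
    return throughput_per_cycle / on_chip_memory


if __name__ == "__main__":
    stages = bitonic_network(8)
    print(len(stages))  # number of stages: log2(n) * (log2(n) + 1) / 2
    print(apply_network([5, 1, 7, 3, 2, 8, 6, 4], stages))
    # Hypothetical figures, not taken from the paper.
    print(memory_efficiency(throughput_per_cycle=4, on_chip_memory=64))
```

Between consecutive stages the comparator distance `j` changes, so keys must be rerouted to different comparators. That rerouting is the data permutation that the abstract says is realized with a RAM-based folded Clos network. In this sketch it is implicit in the array indexing.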
