Computer Generation of High Throughput and Memory Efficient Sorting Designs on FPGA

Ren Chen; Viktor K. Prasanna


Abstract

Accelerating sorting using dedicated hardware to fully utilize the memory bandwidth for Big Data applications has gained much interest in the research community. Recently, parallel sorting networks have been widely employed in hardware implementations due to their high data parallelism and low control overhead. In this paper, we propose a systematic methodology for mapping large-scale bitonic sorting networks onto FPGA. To realize data permutations in the sorting network, we develop a novel RAM-based design by vertically “folding” the classic Clos network. By utilizing the proposed design for data permutation, we develop a hardware generator to automatically build bitonic sorting architectures on FPGAs. For a given input size, data width, and data parallelism, the hardware generator specializes both the datapath and the control unit for sorting and generates a design in a high-level hardware description language. We demonstrate trade-offs among throughput, latency, and area using two illustrative sorting designs: a high-throughput design and a resource-efficient design. With a data parallelism of , the high throughput design sorts an -key sequence with latency , throughput results per cycle and uses memory. This achieves optimal memory efficiency (defined as the ratio of throughput to the amount of on-chip memory used by the design) and outperforms the state of the art. Experimental results show that the designs obtained by our proposed hardware generator achieve 49 to 112 percent improvement in energy efficiency and 56 to 430 percent higher memory efficiency compared with the state of the art.
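For readers unfamiliar with bitonic sorting networks, the following is a minimal software sketch of the fixed compare-exchange schedule that such a network applies. It is not the authors' hardware generator and does not model their RAM-based permutation design; the function names `bitonic_stages` and `bitonic_sort` are hypothetical and chosen only for illustration. The sketch assumes the input length is a power of two.

```python
def bitonic_stages(n):
    """Yield the compare-exchange pairs of a bitonic network, stage by stage.

    n must be a power of two. Each stage is a list of (lo, hi, ascending)
    tuples. Pairs within one stage touch disjoint positions, so a hardware
    design could evaluate them in parallel.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between the two compared positions
            stage = []
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks of size k alternate between ascending and
                    # descending order; the final merge (k == n) is ascending.
                    ascending = (i & k) == 0
                    stage.append((i, partner, ascending))
            yield stage
            j //= 2
        k *= 2


def bitonic_sort(keys):
    """Sort a power-of-two-length sequence by replaying the network."""
    data = list(keys)
    for stage in bitonic_stages(len(data)):
        for lo, hi, ascending in stage:
            # Swap when the pair is out of order for its direction.
            if (data[lo] > data[hi]) == ascending:
                data[lo], data[hi] = data[hi], data[lo]
    return data


if __name__ == "__main__":
    print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

Because the schedule depends only on the input length and never on the key values, the control logic is simple. The changing distance between compared positions from one stage to the next is what creates the need for data permutations between stages, which is the problem the abstract's permutation design addresses.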


Bibliographic details

Source
IEEE Transactions on Parallel and Distributed Systems, 2017, No. 11, pp. 3100-3113 (14 pages)
Authors
Ren Chen; Viktor K. Prasanna
Author affiliations

Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA

Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA

Original format: PDF
Language: English
Keywords
Sorting; Hardware; Throughput; Field programmable gate arrays; Generators; Parallel processing; Bandwidth

