Design and implementation of reconfigurable acceleration for in-memory distributed big data computing

Junjie Hou; Yongxin Zhu; Sen Du; Shijin Song

Abstract

Apache Spark is an efficient distributed computing framework for big data processing. It supports in-memory computation on RDDs (Resilient Distributed Datasets) and provides reusability, fault tolerance, and real-time stream processing. However, tasks in the Spark framework run only on CPUs. The low degree of parallelism and poor power efficiency of CPUs may restrict the performance and scalability of the cluster. Heterogeneous accelerators such as FPGAs, GPUs, and MICs (Many Integrated Core) are more efficient than general-purpose processors for big data processing and can improve the performance and power dissipation of the data center. In this work, we propose a framework that integrates FPGA accelerators into a Spark cluster, improving performance and reducing power dissipation for distributed applications. We propose a method for connecting Spark with OpenCL applications (OpenCL is a standard for cross-platform, parallel programming of diverse processors that is widely used in heterogeneous computing) and use FPGAs to accelerate Spark tasks developed in Python. We illustrate the performance and energy efficiency of the FPGA-based Spark framework with a case study of K-means algorithm acceleration. The results show that the FPGA-based Spark implementation achieves a 3.5x speedup and 4.06x energy efficiency over the original Spark framework.
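For readers unfamiliar with the case study, the following is a minimal sketch of one K-means iteration split into a per-partition step and a merge step, which is the general shape such a computation takes in a Spark job. It is not the authors' implementation: the function names `assign_partition` and `update_centroids` and the sample data are hypothetical, and the code runs on the CPU in plain Python. The nearest-centroid loop is the kind of data-parallel work that could plausibly be offloaded to an accelerator kernel, but the abstract does not say which part the paper offloads.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def assign_partition(points, centroids):
    """Per-partition step: return (cluster_index, (vector_sum, count)) pairs."""
    k = len(centroids)
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Find the index of the closest centroid for this point.
        nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    # Only emit clusters that received at least one point in this partition.
    return [(j, (sums[j], counts[j])) for j in range(k) if counts[j]]


def update_centroids(partials, centroids):
    """Merge step: combine partial sums from all partitions into new centroids."""
    dim = len(centroids[0])
    totals = {}
    for j, (s, c) in partials:
        acc_s, acc_c = totals.get(j, ([0.0] * dim, 0))
        totals[j] = ([a + b for a, b in zip(acc_s, s)], acc_c + c)
    new = list(centroids)  # clusters with no points keep their old centroid
    for j, (s, c) in totals.items():
        new[j] = tuple(v / c for v in s)
    return new


# Hypothetical data: two partitions of 2-D points and two initial centroids.
partitions = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
centroids = [(1.0, 1.0), (9.0, 9.0)]
partials = [pair for part in partitions for pair in assign_partition(part, centroids)]
centroids = update_centroids(partials, centroids)
print(centroids)
```

In a PySpark job, the per-partition function would typically be passed to `RDD.mapPartitions`, with the merge done on the collected or reduced partial sums; that wiring is an assumption here and is left out so the sketch runs without a cluster.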

Bibliographic record

Source: Future Generation Computer Systems, 2019, Issue 3, pp. 68-75 (8 pages)
Authors: Junjie Hou; Yongxin Zhu; Sen Du; Shijin Song
Affiliations:

School of Microelectronics, Shanghai Jiao Tong University;

Shanghai Advanced Research Institute, Chinese Academy of Sciences;

University of Chinese Academy of Sciences;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords:
Spark; Distributed computing; Python; FPGA; OpenCL; High performance; Energy efficiency;

Similar literature

1. The Design and Implementation of Combining the Standard of Data and Integrated Water Resource Data in Distributed Cloud Computing Environment [J]. Feng-Cheng Lin, Ting-Wu Ho, Chen-Yu Hao. Computer Science and Information Technology, 2015, Issue 5.

2. ClimateSpark: An in-memory distributed computing framework for big climate data analytics [J]. Hu Fei, Yang Chaowei, Schnase John L. Computers & Geosciences, 2018, Issue JUNa.

3. Comparing Implementations of Near-Data Computing with In-Memory MapReduce Workloads [J]. Pugsley Seth H., Jestes Jeffrey, Balasubramonian Rajeev. Micro, IEEE, 2014, Issue 4.

4. Distributed Mining of Spatial High Utility Itemsets in Very Large Spatiotemporal Databases using Spark In-Memory Computing Architecture [C]. R. Uday Kiran, Sadanori Ito, Minh-Son Dao. IEEE International Conference on Big Data, 2020.

5. Analysis and acceleration of data mining algorithms on high performance reconfigurable computing platforms [D]. Sun, Song. 2011.

6. Design and Implementation of a Secure Computing Environment for Analysis of Sensitive Data at an Academic Medical Center [O]. Peter R. Oxley, John Ruffing, Thomas R. Campion Jr. 2018.

7. Reconfigurable Computing Based on Commercial FPGAs. Solutions for the Design and Implementation of Partially Reconfigurable Systems = Computación reconfigurable basada en FPGAs comerciales. Soluciones para el diseño e implementación de sistemas parcialmente reconfigurables [O]. Esteves Krasteva Yana. 2009.
