Accelerating Mahout on heterogeneous clusters using HadoopCL.


Abstract

MapReduce is a programming model for processing massive data sets in parallel across hundreds of computing nodes in a cluster. It hides many of the complicated details of parallel computing and gives programmers a straightforward interface for adapting their algorithms, which improves productivity. Many applications, including machine learning applications, have been built on this model. MapReduce can meet the demands of processing the massive data generated by user-server interaction in applications such as web search, video viewing, and online product purchasing. The Mahout recommendation system is one of the most popular open source recommendation systems that employ machine learning techniques based on MapReduce. Mahout provides a parallel computing infrastructure that can be applied to a range of different types of datasets.

A complementary trend in cluster computing is the introduction of GPUs, which provide higher bandwidth and data-level parallelism. Several efforts have combined the simplicity of the MapReduce framework with the power of GPUs. HadoopCL is one framework that automatically generates OpenCL programs from Java to be executed on heterogeneous architectures in a cluster. It provides the infrastructure for utilizing GPUs in a cluster environment.

In this work, we present a detailed description of the Mahout recommender system and a profile of Mahout's performance running on multiple nodes in a cluster. We also present a performance evaluation of a Mahout job running on heterogeneous platforms using CPUs, AMD APUs, and NVIDIA discrete GPUs with HadoopCL. We choose a time-consuming job in Mahout and manually tune a GPU kernel for it. We also modify the HadoopCL pipeline from map->reduce to filter->map->reduce, which increases the flexibility of HadoopCL in task assignment. We analyze the performance issues of the automatically generated OpenCL GPU program and describe the optimizations we make to resolve them. Integrating the optimized GPU kernel into HadoopCL yields a speedup of around 1.5X to 2X on an APU cluster and 2X to 4X on a discrete GPU cluster.

Bibliographic record

  • Author

    Li, Xiangyu

  • Affiliation

    Northeastern University

  • Degree-granting institution: Northeastern University
  • Subject: Engineering, Computer
  • Degree: M.S.
  • Year: 2015
  • Pages: 79 p.
  • Total pages: 79
  • Original format: PDF
  • Language: English