Journal: Parallel and Distributed Systems, IEEE Transactions on

HadoopCL2: Motivating the Design of a Distributed, Heterogeneous Programming System With Machine-Learning Applications

Abstract

Machine learning (ML) algorithms have garnered increased interest as they demonstrate improved ability to extract meaningful trends from large, diverse, and noisy data sets. While research is advancing the state of the art in ML algorithms, it is difficult to drastically improve the real-world performance of these algorithms. Porting new and existing algorithms from single-node systems to multi-node clusters, or from architecturally homogeneous systems to heterogeneous systems, is a promising optimization technique. However, performing optimized ports is challenging for domain experts who may lack experience in distributed and heterogeneous software development. This work explores how challenges in ML application development on heterogeneous, distributed systems shaped the development of the HadoopCL2 (HCL2) programming system. ML applications guide this work because they exhibit features that make application development difficult: large and diverse data sets, complex algorithms, and the need for domain-specific knowledge. The goal of this work is a general MapReduce programming system that outperforms existing programming systems. This work evaluates the performance and portability of HCL2 using five ML applications from the Mahout ML framework on two hardware platforms. HCL2 demonstrates speedups of greater than 20x relative to Mahout for three computationally heavy algorithms and maintains minor performance improvements for two I/O-bound algorithms.
