
Optimizing the Hadoop MapReduce Framework with high-performance storage devices


Abstract

Solid-state drives (SSDs) are an attractive alternative to hard disk drives (HDDs) for accelerating the Hadoop MapReduce Framework. However, mismatches between SSD characteristics and today's Hadoop framework impede indiscriminate SSD integration. This paper explores how to optimize a Hadoop MapReduce Framework with SSDs in terms of performance, cost, and energy consumption. It identifies extensible best practices that can exploit SSD benefits within Hadoop when combined with high network bandwidth and increased parallel storage access. Our Terasort benchmark results demonstrate that Hadoop currently does not sufficiently exploit SSD throughput; hence, using faster SSDs in Hadoop does not enhance its performance. We show that SSDs currently deliver significant efficiency gains when used to store intermediate Hadoop data, leaving HDDs for the Hadoop Distributed File System (HDFS). The proposed configuration is further optimized with the JVM reuse option and a more frequent heartbeat interval. Moreover, we examine the performance of a state-of-the-art Non-Volatile Memory Express (NVMe) SSD within the Hadoop MapReduce Framework. While HDFS read and write throughput increases with high-performance SSDs, improving overall system performance requires carefully balancing CPU, network, and storage resource capabilities at the system level.
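The storage split described above can be expressed as Hadoop configuration properties. It places intermediate MapReduce data on SSDs, keeps HDFS blocks on HDDs and enables JVM reuse. The following is a minimal Python sketch that renders such properties in Hadoop's XML format. It assumes Hadoop 1.x property names (`mapred.local.dir`, `dfs.data.dir`, `mapred.job.reuse.jvm.num.tasks`), and the mount paths are hypothetical. The paper's exact settings are not reproduced here. The heartbeat option is left out because its property name differs across Hadoop versions.

```python
import xml.etree.ElementTree as ET

# Hypothetical mount points (assumption): SSDs hold intermediate map output
# and spill files, HDDs hold HDFS block storage.
SSD_DIRS = ["/mnt/ssd0/mapred/local", "/mnt/ssd1/mapred/local"]
HDD_DIRS = ["/mnt/hdd0/dfs/data", "/mnt/hdd1/dfs/data", "/mnt/hdd2/dfs/data"]


def build_properties(ssd_dirs, hdd_dirs, jvm_reuse=-1):
    """Map Hadoop 1.x-style property names (assumed) to string values.

    mapred.local.dir: comma-separated local dirs for intermediate data (SSD).
    dfs.data.dir: comma-separated DataNode block dirs (HDD).
    mapred.job.reuse.jvm.num.tasks: -1 lets one task JVM run an unlimited
    number of tasks from the same job (JVM reuse).
    """
    return {
        "mapred.local.dir": ",".join(ssd_dirs),
        "dfs.data.dir": ",".join(hdd_dirs),
        "mapred.job.reuse.jvm.num.tasks": str(jvm_reuse),
    }


def to_hadoop_xml(props):
    """Render a dict as a Hadoop <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_hadoop_xml(build_properties(SSD_DIRS, HDD_DIRS)))
```

In a real deployment, `dfs.data.dir` belongs in `hdfs-site.xml` and the `mapred.*` properties belong in `mapred-site.xml`. Hadoop 2.x/YARN uses different names, for example `dfs.datanode.data.dir` and `yarn.nodemanager.local-dirs`.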
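The abstract's conclusion is that faster SSDs help only when CPU and network keep pace. This is consistent with a simple bottleneck bound: a node's effective throughput is capped by its slowest resource. The sketch below is an illustrative model only, not the paper's methodology. `NodeCapacity`, `effective_throughput` and all MB/s figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeCapacity:
    """Per-node sustainable rates in MB/s (hypothetical, not measured)."""
    cpu_mb_s: float      # rate at which map/reduce code can process data
    network_mb_s: float  # shuffle / replication bandwidth
    storage_mb_s: float  # aggregate throughput of the node's local devices


def effective_throughput(node: NodeCapacity) -> tuple[float, str]:
    """Bottleneck bound: the slowest resource caps the node's throughput."""
    rates = {
        "cpu": node.cpu_mb_s,
        "network": node.network_mb_s,
        "storage": node.storage_mb_s,
    }
    bottleneck = min(rates, key=rates.get)
    return rates[bottleneck], bottleneck


if __name__ == "__main__":
    # Same CPU and network; only the storage device changes (made-up numbers).
    configs = [
        ("HDD", NodeCapacity(cpu_mb_s=400, network_mb_s=1100, storage_mb_s=150)),
        ("SATA SSD", NodeCapacity(cpu_mb_s=400, network_mb_s=1100, storage_mb_s=500)),
        ("NVMe SSD", NodeCapacity(cpu_mb_s=400, network_mb_s=1100, storage_mb_s=2500)),
    ]
    for label, node in configs:
        rate, limit = effective_throughput(node)
        print(f"{label}: {rate:.0f} MB/s (limited by {limit})")
```

With these made-up numbers, moving from an HDD to a SATA SSD shifts the bottleneck from storage to CPU. A further upgrade to NVMe therefore yields no gain. Real MapReduce jobs overlap phases and mix resource demands, so this bound is a rough upper limit rather than a prediction.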

Bibliographic Details

  • Source
    Journal of Supercomputing | 2015, Issue 9 | pp. 3525-3548 | 24 pages
  • Author affiliations

    Texas A&M Univ, Dept Elect & Comp Engn, College Stn, TX 77840 USA;

    Korea Aerosp Univ, Sch Elect & Informat Engn, Goyang Si, South Korea;

    Samsung Semicond Incorp, Adv Datactr Solut Grp, Milpitas, CA 95036 USA;

    Samsung Semicond Incorp, Adv Datactr Solut Grp, Milpitas, CA 95036 USA;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Keywords

    Performance; Storage; SSD; Hadoop; MapReduce; HDFS;
