IEEE Transactions on Parallel and Distributed Systems

Measuring Scale-Up and Scale-Out Hadoop with Remote and Local File Systems and Selecting the Best Platform



Abstract

MapReduce is a popular computing model for parallel data processing on large-scale datasets, which can range from gigabytes to terabytes and petabytes. Though Hadoop MapReduce normally uses the Hadoop Distributed File System (HDFS) as its local file system, it can also be configured to use a remote file system. This raises an interesting question: for a given application, which of the different combinations of scale-up and scale-out Hadoop with remote and local file systems is the best platform to run on? However, there has been no previous research on how different types of applications (e.g., CPU-intensive, data-intensive) with different characteristics (e.g., input data size) benefit from the different platforms. Thus, in this paper, we conduct a comprehensive performance measurement of different applications on scale-up and scale-out clusters configured with HDFS and a remote file system (i.e., OFS), respectively. We identify and study how different job characteristics (e.g., input data size, the number of file reads/writes, and the amount of computation) affect the performance of different applications on the different platforms. Based on the measurement results, we also propose a performance prediction model to help users select the platform that leads to the minimum latency. Our evaluation using a Facebook workload trace demonstrates the effectiveness of our prediction model. This study is expected to provide guidance for users in choosing the best platform to run applications with different characteristics in environments that provide both remote and local storage, such as HPC clusters and cloud environments.
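The abstract notes that Hadoop can be pointed at either its local HDFS deployment or a remote file system. As a minimal sketch (not taken from the paper), the snippet below shows how a Hadoop job's file system can be switched by overriding the standard `fs.defaultFS` property; the host names and the remote URI scheme are placeholders, and the actual scheme for OFS depends on the connector used.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class PlatformConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Default case: the job reads and writes through the local HDFS deployment.
        // (Placeholder host name.)
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:9000");

        // Alternative: direct the same job at a remote file system instead.
        // (Placeholder URI; the real scheme depends on the OFS connector in use.)
        // conf.set("fs.defaultFS", "ofs://remote-storage.example.com:3334");

        FileSystem fs = FileSystem.get(conf);
        System.out.println("Working file system: " + fs.getUri());
        fs.close();
    }
}
```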
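The paper's prediction model itself is not reproduced here, but the selection step it enables can be illustrated with a purely hypothetical sketch: given per-platform latency predictions for a job (which a real model would derive from characteristics such as input size, I/O counts, and computation amount), choose the platform with the minimum predicted latency. The platform names mirror the four combinations studied; the numbers are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

public class PlatformSelector {
    // Return the platform whose predicted latency is smallest.
    public static String selectBestPlatform(Map<String, Double> predictedLatencySeconds) {
        return predictedLatencySeconds.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalArgumentException("no platforms given"));
    }

    public static void main(String[] args) {
        Map<String, Double> predictions = new HashMap<>();
        predictions.put("scale-up + HDFS", 120.0);   // placeholder predictions
        predictions.put("scale-up + OFS", 95.0);
        predictions.put("scale-out + HDFS", 140.0);
        predictions.put("scale-out + OFS", 110.0);
        System.out.println("Best platform: " + selectBestPlatform(predictions));
    }
}
```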
