IEEE Transactions on Parallel and Distributed Systems

Measuring Scale-Up and Scale-Out Hadoop with Remote and Local File Systems and Selecting the Best Platform



Abstract

MapReduce is a popular computing model for parallel data processing on large-scale datasets, which can range from gigabytes to terabytes and petabytes. Though Hadoop MapReduce normally uses the Hadoop Distributed File System (HDFS) as its local file system, it can also be configured to use a remote file system. This raises an interesting question: for a given application, which of the different combinations of scale-up and scale-out Hadoop with remote and local file systems is the best platform to run on? However, there has been no previous research on how different types of applications (e.g., CPU-intensive, data-intensive) with different characteristics (e.g., input data size) benefit from the different platforms. Thus, in this paper, we conduct a comprehensive performance measurement of different applications on scale-up and scale-out clusters configured with HDFS and a remote file system (i.e., OFS), respectively. We identify and study how different job characteristics (e.g., input data size, the number of file reads/writes, and the amount of computation) affect the performance of different applications on the different platforms. Based on the measurement results, we also propose a performance prediction model to help users select the platform that leads to the minimum latency. Our evaluation using a Facebook workload trace demonstrates the effectiveness of our prediction model. This study is expected to provide guidance for users in choosing the best platform to run applications with different characteristics in environments that provide both remote and local storage, such as HPC clusters and cloud environments.
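The abstract notes that Hadoop can be pointed at either its local HDFS deployment or a remote file system. As a minimal sketch (not taken from the paper), the snippet below shows how a Hadoop job's file system can be switched by overriding the standard `fs.defaultFS` property; the host names and the remote URI scheme are placeholders, and the actual scheme for OFS depends on the connector used.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class PlatformConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Default case: the job reads and writes through the local HDFS deployment.
        // (Placeholder host name.)
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:9000");

        // Alternative: direct the same job at a remote file system instead.
        // (Placeholder URI; the real scheme depends on the OFS connector in use.)
        // conf.set("fs.defaultFS", "ofs://remote-storage.example.com:3334");

        FileSystem fs = FileSystem.get(conf);
        System.out.println("Working file system: " + fs.getUri());
        fs.close();
    }
}
```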
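The paper's prediction model itself is not reproduced here, but the selection step it enables can be illustrated with a purely hypothetical sketch: given per-platform latency predictions for a job (which a real model would derive from characteristics such as input size, I/O counts, and computation amount), choose the platform with the minimum predicted latency. The platform names mirror the four combinations studied; the numbers are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

public class PlatformSelector {
    // Return the platform whose predicted latency is smallest.
    public static String selectBestPlatform(Map<String, Double> predictedLatencySeconds) {
        return predictedLatencySeconds.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalArgumentException("no platforms given"));
    }

    public static void main(String[] args) {
        Map<String, Double> predictions = new HashMap<>();
        predictions.put("scale-up + HDFS", 120.0);   // placeholder predictions
        predictions.put("scale-up + OFS", 95.0);
        predictions.put("scale-out + HDFS", 140.0);
        predictions.put("scale-out + OFS", 110.0);
        System.out.println("Best platform: " + selectBestPlatform(predictions));
    }
}
```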
