IEEE Transactions on Parallel and Distributed Systems

An Exploration of Designing a Hybrid Scale-Up/Out Hadoop Architecture Based on Performance Measurements



Abstract

Scale-up machines perform better for jobs with small and medium (KB, MB) data sizes, while scale-out machines perform better for jobs with large (GB, TB) data sizes. Since a workload usually consists of jobs at different data size levels, we propose building a hybrid Hadoop architecture that includes both scale-up and scale-out machines, which, however, is not trivial. The first challenge is workload data storage. Thousands of small-data-size jobs in a workload may overload the limited local disks of the scale-up machines. Moreover, jobs on the scale-up and scale-out machines may request the same set of data, which leads to data transmission between the machines. The second challenge is automatically scheduling each job to either the scale-up or the scale-out cluster to achieve the best performance. We conduct a thorough performance measurement of different applications on scale-up and scale-out clusters configured with the Hadoop Distributed File System (HDFS) and with a remote file system (i.e., OFS), respectively. We find that using OFS rather than HDFS solves the data storage challenge. We also identify the factors that determine the performance differences between the scale-up and scale-out clusters, and their cross points, which guide the scheduling choice. Accordingly, we design and implement the hybrid scale-up/out Hadoop architecture. Our trace-driven experimental results show that our hybrid architecture outperforms the traditional Hadoop architecture with either HDFS or OFS in terms of job completion time, throughput, and job failure rate.
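The scheduling idea the abstract describes can be sketched as a simple threshold rule: route each job to the cluster expected to run it faster, based on its input data size relative to the measured cross point. The sketch below is illustrative only; the cross-point value and the function name `choose_cluster` are assumptions for the example, not details from the paper, which derives the actual cross points from its performance measurements.

```python
# Illustrative sketch of data-size-based job dispatch in a hybrid
# scale-up/out Hadoop architecture. Jobs with small/medium input
# (KB-MB) go to the scale-up cluster; jobs with large input (GB-TB)
# go to the scale-out cluster.

SCALE_UP = "scale-up"
SCALE_OUT = "scale-out"

# Assumed cross point of 1 GB, purely for illustration; the paper
# determines cross points empirically per application.
CROSS_POINT_BYTES = 1 << 30

def choose_cluster(input_size_bytes: int) -> str:
    """Route a job to the cluster expected to complete it faster."""
    if input_size_bytes < CROSS_POINT_BYTES:
        return SCALE_UP
    return SCALE_OUT

# Example: a 4 MB job is routed to scale-up, a 2 TB job to scale-out.
assert choose_cluster(4 << 20) == SCALE_UP
assert choose_cluster(2 << 40) == SCALE_OUT
```

In the actual system, the shared remote file system (OFS) lets both clusters read the same data without per-job transfers, so the dispatcher only has to pick the faster cluster rather than also managing data placement.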

