IEEE Transactions on Parallel and Distributed Systems

An Exploration of Designing a Hybrid Scale-Up/Out Hadoop Architecture Based on Performance Measurements



Abstract

Scale-up machines perform better for jobs with small and medium (KB, MB) data sizes, while scale-out machines perform better for jobs with large (GB, TB) data sizes. Since a workload usually consists of jobs at different data size levels, we propose building a hybrid Hadoop architecture that includes both scale-up and scale-out machines, which, however, is not trivial. The first challenge is workload data storage. Thousands of small-data-size jobs in a workload may overload the limited local disks of the scale-up machines. Moreover, jobs on the scale-up and scale-out machines may request the same set of data, which leads to data transmission between the machines. The second challenge is automatically scheduling each job to either the scale-up or the scale-out cluster to achieve the best performance. We conduct a thorough performance measurement of different applications on scale-up and scale-out clusters configured with the Hadoop Distributed File System (HDFS) and with a remote file system (i.e., OFS), respectively. We find that using OFS rather than HDFS solves the data storage challenge. We also identify the factors that determine the performance differences between the scale-up and scale-out clusters, and their cross points, which guide the scheduling choice. Accordingly, we design and implement the hybrid scale-up/out Hadoop architecture. Our trace-driven experimental results show that our hybrid architecture outperforms the traditional Hadoop architecture with either HDFS or OFS in terms of job completion time, throughput, and job failure rate.
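The scheduling idea the abstract describes can be sketched as a simple threshold rule: route each job to the cluster expected to run it faster, based on its input data size relative to the measured cross point. The sketch below is illustrative only; the cross-point value and the function name `choose_cluster` are assumptions for the example, not details from the paper, which derives the actual cross points from its performance measurements.

```python
# Illustrative sketch of data-size-based job dispatch in a hybrid
# scale-up/out Hadoop architecture. Jobs with small/medium input
# (KB-MB) go to the scale-up cluster; jobs with large input (GB-TB)
# go to the scale-out cluster.

SCALE_UP = "scale-up"
SCALE_OUT = "scale-out"

# Assumed cross point of 1 GB, purely for illustration; the paper
# determines cross points empirically per application.
CROSS_POINT_BYTES = 1 << 30

def choose_cluster(input_size_bytes: int) -> str:
    """Route a job to the cluster expected to complete it faster."""
    if input_size_bytes < CROSS_POINT_BYTES:
        return SCALE_UP
    return SCALE_OUT

# Example: a 4 MB job is routed to scale-up, a 2 TB job to scale-out.
assert choose_cluster(4 << 20) == SCALE_UP
assert choose_cluster(2 << 40) == SCALE_OUT
```

In the actual system, the shared remote file system (OFS) lets both clusters read the same data without per-job transfers, so the dispatcher only has to pick the faster cluster rather than also managing data placement.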

