Published in: International Conference on Information Technology - New Generations

On The Performance of Apache Hadoop in a Tiny Private IaaS Cloud

Abstract

High-performance and parallel computing have traditionally been implemented on very large dedicated compute clusters. However, as many organizations adopt service-oriented, cloud-based infrastructures, we can expect parallel computing to move into the cloud. The goal of a parallel compute cluster is to divide a large job into several small jobs, execute the small jobs in parallel on many compute nodes, and then combine the results in some coherent manner. The biggest hurdle in moving this type of service to a cloud-based infrastructure is that performance will undoubtedly be affected by many factors, particularly those related to hardware virtualization in clouds, such as memory and CPU overhead and limited resources. To understand how virtualization affects parallel computing in a tiny private cloud, we devised four case studies that examine the performance of Apache Hadoop in varying environments on our private cloud: a baseline bare-metal (non-virtualized) cluster of seven nodes, a seven-node virtual machine cluster, a twenty-node virtual machine cluster, and an optimized seven-node virtual machine cluster. Results show that, although small data sets yield comparable job completion times, as the data size increases the performance of Apache Hadoop is greatly affected by virtualization, even when we attempt to optimize the configuration of our cloud.
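
The divide, execute, and combine pattern described in the abstract can be illustrated with a minimal word-count sketch. This is not Hadoop's API and not the benchmark used in the paper; the function names (`split_job`, `map_task`, `combine`, `word_count`) are hypothetical, and local threads stand in for compute nodes under the assumption that the input fits in memory.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_job(lines, num_tasks):
    """Divide the large job (a list of lines) into at most num_tasks small jobs."""
    chunk = max(1, -(-len(lines) // num_tasks))  # ceiling division
    return [lines[i:i + chunk] for i in range(0, len(lines), chunk)]


def map_task(lines):
    """One small job: count the words in its share of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def combine(partials):
    """Merge the partial counts into a single result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, num_tasks=4):
    # Assumption: threads stand in for compute nodes. A real cluster would
    # distribute the small jobs across separate machines.
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        partials = list(pool.map(map_task, split_job(lines, num_tasks)))
    return combine(partials)
```

Calling `word_count(lines, num_tasks=7)` would loosely mirror a seven-node deployment. The sketch says nothing about virtualization overhead, which is the effect the case studies measure.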