Journal: IEEE Transactions on Magnetics

DRAW: A New Data-gRouping-AWare Data Placement Scheme for Data Intensive Applications With Interest Locality


Abstract

Recent years have seen an increasing number of scientists employ data-parallel computing frameworks such as MapReduce and Hadoop to run data-intensive applications and conduct analysis. In these frameworks, where compute and storage are co-located, a wise data placement scheme can significantly improve performance. Existing data-parallel frameworks, e.g., Hadoop or Hadoop-based clouds, distribute data using a random placement method for simplicity and load balance. However, we observe that many data-intensive applications exhibit interest locality: they sweep only part of a big data set. Data are often accessed together because of their grouping semantics. Because it does not take data grouping into consideration, random placement does not perform well and falls far below the efficiency of an optimal data distribution. In this paper, we develop a new Data-gRouping-AWare (DRAW) data placement scheme to address this problem. DRAW dynamically scrutinizes data accesses recorded in system log files. It extracts optimal data groupings and re-organizes data layouts to achieve the maximum parallelism per group subject to load balance. By running two real-world MapReduce applications with different data placement schemes on a 40-node test bed, we conclude that DRAW increases the total number of local map tasks executed by up to 59.8%, reduces the completion latency of the map phase by up to 41.7%, and improves the overall performance by 36.4%, in comparison with Hadoop's default random placement.
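The idea in the abstract (infer groupings from access logs, then spread each group across nodes while keeping load balanced) can be illustrated with a small sketch. This is not the paper's algorithm. The function names `group_blocks` and `place_groups`, the `min_coaccess` threshold, and the use of union-find clustering with a greedy least-loaded placement are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def group_blocks(access_log, min_coaccess=2):
    """Cluster blocks that are accessed together in at least
    `min_coaccess` log entries (hypothetical simplification)."""
    counts = defaultdict(int)
    blocks = set()
    for accessed in access_log:
        blocks.update(accessed)
        for a, b in combinations(sorted(set(accessed)), 2):
            counts[(a, b)] += 1

    # Union-find over blocks; frequently co-accessed pairs are merged.
    parent = {b: b for b in blocks}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in counts.items():
        if n >= min_coaccess:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for b in sorted(blocks):
        groups[find(b)].append(b)
    # Largest groups first, ties broken by block names, for determinism.
    return sorted(groups.values(), key=lambda g: (-len(g), g))


def place_groups(groups, num_nodes):
    """Assign each group's blocks to distinct nodes where possible,
    always preferring the currently least-loaded nodes."""
    load = [0] * num_nodes
    placement = {}
    for group in groups:
        for start in range(0, len(group), num_nodes):
            chunk = group[start:start + num_nodes]
            nodes = sorted(range(num_nodes), key=lambda n: (load[n], n))
            for block, node in zip(chunk, nodes):
                placement[block] = node
                load[node] += 1
    return placement


# Toy access log: each entry lists the blocks one job touched.
log = [["b1", "b2", "b3"], ["b1", "b2", "b3"],
       ["b4", "b5"], ["b4", "b5"], ["b6"]]
groups = group_blocks(log)
placement = place_groups(groups, num_nodes=3)
```

In this toy log, blocks that recur together end up in the same group, and each group's blocks land on different nodes, so a job that touches one group can run its map tasks in parallel on local data. A random placement gives no such guarantee for any particular group.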
