DRAW: A New Data-gRouping-AWare Data Placement Scheme for Data Intensive Applications With Interest Locality

Wang J.; Xiao Q.; Yin J.; Shang P.

Abstract

Recent years have seen an increasing number of scientists employ data-parallel computing frameworks such as MapReduce and Hadoop to run data-intensive applications and conduct analysis. In these frameworks, where compute and storage are co-located, a well-chosen data placement scheme can significantly improve performance. Existing data-parallel frameworks, e.g., Hadoop or Hadoop-based clouds, distribute data using a random placement method for simplicity and load balance. However, we observe that many data-intensive applications exhibit interest locality: they sweep only part of a big data set. Data are often accessed together because of their grouping semantics. Without taking data grouping into consideration, random placement does not perform well and falls far below the efficiency of an optimal data distribution. In this paper, we develop a new Data-gRouping-AWare (DRAW) data placement scheme to address this problem. DRAW dynamically scrutinizes data accesses from system log files. It extracts optimal data groupings and re-organizes data layouts to achieve the maximum parallelism per group subject to load balance. By experimenting with two real-world MapReduce applications under different data placement schemes on a 40-node test bed, we conclude that DRAW increases the total number of local map tasks executed by up to 59.8%, reduces the completion latency of the map phase by up to 41.7%, and improves the overall performance by 36.4%, in comparison with Hadoop's default random placement.
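
The idea in the abstract, grouping blocks that are accessed together and then spreading each group across nodes while keeping node loads even, can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the log format (one list of block IDs per task), the function names `group_blocks` and `place`, and the greedy least-loaded heuristic are all assumptions made for illustration only.

```python
from collections import defaultdict


def group_blocks(access_log):
    """Merge blocks that appear in the same logged task into one group.

    Hypothetical log format: each entry is a list of block IDs read by one task.
    Uses a small union-find; this is an illustrative stand-in, not DRAW's method.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for blocks in access_log:
        for b in blocks:
            find(b)  # register every block, even singletons
        for a, b in zip(blocks, blocks[1:]):
            parent[find(a)] = find(b)  # union neighbours in the same task

    groups = defaultdict(list)
    for b in list(parent):
        groups[find(b)].append(b)
    return [sorted(g) for g in groups.values()]


def place(groups, num_nodes):
    """Assign blocks to nodes so each group is spread over distinct nodes.

    Greedy heuristic (an assumption): within a group, pick the least-loaded
    node that the group has not used yet in the current round.
    """
    load = [0] * num_nodes
    placement = {}
    for group in sorted(groups, key=len, reverse=True):
        used = set()
        for block in group:
            if len(used) == num_nodes:
                used.clear()  # group is larger than the cluster: start a new round
            candidates = [n for n in range(num_nodes) if n not in used]
            node = min(candidates, key=lambda n: (load[n], n))
            placement[block] = node
            load[node] += 1
            used.add(node)
    return placement


if __name__ == "__main__":
    log = [["a1", "a2"], ["a2", "a3"], ["b1", "b2"]]
    groups = group_blocks(log)
    print(groups)
    print(place(groups, num_nodes=3))
```

With a random layout, the three blocks of the first group could land on a single node and serialize the map tasks that read them; the sketch instead puts them on three different nodes so that they can be processed in parallel.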


Bibliographic record

Source
IEEE Transactions on Magnetics | 2013, No. 6, Part 1 | pp. 2514-2520 | 7 pages
Authors
Wang J.; Xiao Q.; Yin J.; Shang P.
Affiliation
Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, USA
Original format: PDF
Language: English
Keywords
Data intensive; Hadoop; MapReduce; data layout

Similar literature

1. DPPACS: A Novel Data Partitioning and Placement Aware Computation Scheduling Scheme for Data-Intensive Cloud Applications [J]. K. Hemant Kumar Reddy, Diptendu Sinha Roy. The Computer Journal, 2016, No. 1.
2. A new data-grouping-aware dynamic data placement method that take into account jobs execute frequency for Hadoop [J]. Wu Jia-xuan, Zhang Chang-sheng, Zhang Bin. Microprocessors and Microsystems, 2016, No. nova.
3. Novel data-placement scheme for improving the data locality of Hadoop in heterogeneous environments [J]. Bae Minho, Yeo Sangho, Park Gyudong. Concurrency and Computation: Practice and Experience, 2021, No. 18.
4. DRAW: A new Data-gRouping-AWare data placement scheme for data intensive applications with interest locality [C]. Shang Pengju, Xiao Qiangju, Wang Jun. Asia-Pacific Magnetic Recording Conference 2012: Digest, 2012.
5. Adapting Data Representations for Optimizing Data-Intensive Applications [D]. Kusum, Amlan. 2016.
6. Impact study of data locality on task-based applications through the Heteroprio scheduler [O]. Bérenger Bramas. 2019.
7. Significance of Hierarchical and Markov Clustering in Grouping Aware Data Placement for Data Intensive Applications Having Interest Locality [O]. Vengadeswaran Shanmugasundaram, Balasundaram Sadhu Ramakrishnan. 2018.
