Journal: Cluster Computing

Scatter-Gather-Merge: An efficient star-join query processing algorithm for data-parallel frameworks


Abstract

A data-parallel framework is attractive for large-scale data processing because it lets an application process huge amounts of data on commodity machines with little effort. MapReduce, a popular data-parallel framework, is used in fields such as web search, data mining, and data warehousing, where it has proven very practical. Star-join queries are common in data warehouses, which are a current target domain of data-parallel frameworks. This article proposes a new algorithm that efficiently processes star-join queries in data-parallel frameworks such as MapReduce and Dryad. The algorithm, called Scatter-Gather-Merge, processes star-join queries in a constant number of computation steps even as the number of participating dimension tables increases. By adopting Bloom filters, Scatter-Gather-Merge avoids a non-trivial amount of I/O. We also show that Scatter-Gather-Merge can be easily applied to MapReduce. Our experimental results in both cluster and cloud environments show that Scatter-Gather-Merge outperforms existing approaches.
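
The abstract does not spell out how the Bloom filters are used, so the following is only a minimal single-machine sketch of the general idea, not the Scatter-Gather-Merge algorithm itself. It assumes one Bloom filter is built per dimension table and used to discard fact rows before the exact join. The names `BloomFilter`, `star_join`, and the sample tables are hypothetical and chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: never gives false negatives, may give false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive several bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def star_join(fact_rows, dimensions):
    """Join fact rows against every dimension table.

    dimensions maps a foreign-key column name to {key: attributes} for the
    dimension rows that survived that table's selection predicates.
    """
    # Assumption: one filter per dimension, built from its surviving keys.
    filters = {}
    for fk, rows in dimensions.items():
        bloom = BloomFilter()
        for key in rows:
            bloom.add(key)
        filters[fk] = bloom

    results = []
    for row in fact_rows:
        # Cheap pre-filter: drop rows that cannot match some dimension.
        if not all(row[fk] in bloom for fk, bloom in filters.items()):
            continue
        # Exact check removes any Bloom-filter false positives.
        if all(row[fk] in rows for fk, rows in dimensions.items()):
            joined = dict(row)
            for fk, rows in dimensions.items():
                joined.update(rows[row[fk]])
            results.append(joined)
    return results


# Hypothetical sample data: two dimension tables and one fact table.
dates = {1: {"year": 2020}, 2: {"year": 2021}}
products = {10: {"category": "book"}}
facts = [
    {"date_id": 1, "product_id": 10, "amount": 5},
    {"date_id": 3, "product_id": 10, "amount": 7},   # no matching date
    {"date_id": 2, "product_id": 11, "amount": 9},   # no matching product
]
result = star_join(facts, {"date_id": dates, "product_id": products})
print(result)
```

In a data-parallel setting the filters would be small enough to ship to every worker that scans the fact table, so non-matching fact rows could be dropped before they are written or shuffled. That is one plausible source of the I/O reduction the abstract mentions.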