IEEE Transactions on Parallel and Distributed Systems

A parallel sort merge join algorithm for managing data skew



Abstract

A parallel sort-merge-join algorithm that uses a divide-and-conquer approach to address the data skew problem is proposed. The proposed algorithm adds an extra, low-cost scheduling phase to the usual sort, transfer, and join phases. During the scheduling phase, a parallelizable optimization algorithm uses the output of the sort phase to balance the load across the processors in the subsequent join phase. The algorithm naturally identifies the largest skew elements and assigns each of them to an optimal number of processors. Assuming a Zipf-like distribution of data skew, the algorithm is demonstrated to achieve very good load balancing for the join phase, and is shown to be very robust with respect to, among other things, the degree of data skew and the total number of processors.
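The idea of a scheduling phase that spreads the largest skew elements over several processors can be illustrated with a minimal sketch. This is not the paper's optimization algorithm: it swaps in a simple longest-processing-time (LPT) greedy heuristic, and the function name `schedule_join`, the cost model (join output size per key), and the equal-split rule are all assumptions made for illustration.

```python
import heapq
import math
from collections import Counter


def schedule_join(r_keys, s_keys, num_procs):
    """Hypothetical scheduling sketch: returns (assignment, loads) per processor."""
    r_counts, s_counts = Counter(r_keys), Counter(s_keys)
    # Assumed cost model: the work for a key is the size of its join output.
    costs = {k: r_counts[k] * s_counts[k] for k in r_counts if k in s_counts}
    total = sum(costs.values())
    target = max(1, math.ceil(total / num_procs))  # ideal per-processor load

    # Split any key whose cost exceeds the target into near-equal pieces.
    # (In a real join this would mean partitioning one relation's tuples for
    # that key and replicating the other side; here only the cost is modeled.)
    tasks = []
    for key, cost in costs.items():
        pieces = min(num_procs, math.ceil(cost / target))
        base, extra = divmod(cost, pieces)
        for i in range(pieces):
            tasks.append((base + (1 if i < extra else 0), key))

    # LPT greedy: largest task first, onto the currently least-loaded processor.
    heap = [(0, p) for p in range(num_procs)]
    assignment = [[] for _ in range(num_procs)]
    for cost, key in sorted(tasks, key=lambda t: t[0], reverse=True):
        load, p = heapq.heappop(heap)
        assignment[p].append((key, cost))
        heapq.heappush(heap, (load + cost, p))

    loads = [sum(c for _, c in tasks_p) for tasks_p in assignment]
    return assignment, loads


# Example: key "a" is heavily skewed in both relations.
keys = ["a"] * 8 + ["b"] * 2 + ["c"] * 2
assignment, loads = schedule_join(keys, keys, num_procs=4)
print(loads)  # → [20, 20, 16, 16]
```

Under this assumed cost model, key "a" alone accounts for 64 of the 72 units of work, so leaving it on a single processor would cap the speedup; splitting it four ways brings the maximum load down to 20.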
