IEEE Transactions on Parallel and Distributed Systems

Memory-Efficient and Skew-Tolerant MapReduce Over MPI for Supercomputing Systems



Abstract

Data analytics has become an integral part of large-scale scientific computing. Among various data analytics frameworks, MapReduce has gained the most traction. Although some efforts have been made to enable efficient MapReduce for supercomputing systems, they are often limited to fairly homogeneous workloads, where partitioning the input data equally across tasks results in essentially equal amounts of output or temporary data generated on each task. For workloads that are more skewed, however, current implementations can result in imbalanced memory usage and, consequently, can cause a slowdown in execution time and a loss in data scalability. To tackle this problem, we enhance a previously published memory-conscious MapReduce over MPI framework called Mimir. Our enhancements to Mimir include combiner and dynamic repartition optimizations that minimize and balance memory usage. These optimizations achieve close to optimal balance of memory usage across processes and reduce the execution time by up to 12 times. Experimental results show that Mimir can scale to at least 3072 processes on the Tianhe-2 supercomputer on skewed datasets.
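
The abstract names two optimizations, a combiner and dynamic repartitioning, without describing how they work. The following is a minimal, single-process Python sketch of the general ideas only; it does not use MPI and is not Mimir's API or the paper's algorithm. The function names (`map_words`, `combine`, `greedy_repartition`) are hypothetical, and the greedy heaviest-key-first assignment is a stand-in heuristic chosen for illustration.

```python
from collections import Counter
import heapq


def map_words(chunk):
    """Map phase: emit one (word, 1) pair per word in the input chunk."""
    return [(word, 1) for word in chunk.split()]


def combine(pairs):
    """Combiner: merge pairs that share a key locally, before any shuffle.

    A skewed key that was emitted many times shrinks to a single pair,
    which reduces the temporary data held in memory.
    """
    merged = Counter()
    for key, value in pairs:
        merged[key] += value
    return sorted(merged.items())


def greedy_repartition(key_loads, nprocs):
    """Illustrative repartitioning heuristic (an assumption, not the paper's).

    Assigns keys to processes heaviest first, always choosing the process
    with the smallest load so far. Returns the key-to-rank map and the
    resulting per-process loads.
    """
    heap = [(0, rank) for rank in range(nprocs)]
    heapq.heapify(heap)
    owner = {}
    for key, load in sorted(key_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, rank = heapq.heappop(heap)
        owner[key] = rank
        heapq.heappush(heap, (total + load, rank))
    loads = [0] * nprocs
    for key, rank in owner.items():
        loads[rank] += key_loads[key]
    return owner, loads


# A skewed input: one word dominates the chunk.
chunk = "the the the the the the the the cat sat on mat"
pairs = map_words(chunk)
combined = combine(pairs)
print(len(pairs), "pairs before combining,", len(combined), "after")

# Per-key pair counts without a combiner, spread over two processes.
key_loads = dict(Counter(key for key, _ in pairs))
owner, loads = greedy_repartition(key_loads, 2)
print("per-process load without combiner:", loads)
```

In this toy input the combiner collapses the repeated key into one pair, while repartitioning alone cannot split a single heavy key across processes. That is one way to see why the two techniques are complementary.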
