首页> 外文期刊>IEEE Transactions on Parallel and Distributed Systems >Fast parallel sorting under LogP: experience with the CM-5
【24h】

Fast parallel sorting under LogP: experience with the CM-5

机译:LogP下的快速并行排序:CM-5的经验

获取原文
获取原文并翻译 | 示例
           

摘要

In this paper, we analyze four parallel sorting algorithms (bitonic, column, radix, and sample sort) with the LogP model. LogP characterizes the performance of modern parallel machines with a small set of parameters: the communication latency (L), overhead (o), bandwidth (g), and the number of processors (P). We develop implementations of these algorithms in Split-C, a parallel extension to C, and compare the performance predicted by LogP to actual performance on a CM-5 of 32 to 512 processors for a range of problem sizes. We evaluate the robustness of the algorithms by varying the distribution and ordering of the key values. We also briefly examine the sensitivity of the algorithms to the communication parameters. We show that the LogP model is a valuable guide in the development of parallel algorithms and a good predictor of implementation performance. The model encourages the use of data layouts which minimize communication and balanced communication schedules which avoid contention. With an empirical model of local processor performance, LogP predictions closely match observed execution times on uniformly distributed keys across a broad range of problem and machine sizes. We find that communication performance is oblivious to the distribution of the key values, whereas the local processor performance is not; some communication phases are sensitive to the ordering of keys due to contention. Finally, our analysis shows that overhead is the most critical communication parameter in the sorting algorithms.
机译:在本文中,我们使用LogP模型分析了四种并行的排序算法(双峰,列,基数和样本排序)。 LogP通过少量参数来表征现代并行机的性能:通信等待时间(L),开销(o),带宽(g)和处理器数量(P)。我们在Split-C(C的并行扩展)中开发了这些算法的实现,并将LogP预测的性能与32至512个处理器的CM-5上针对一系列问题大小的实际性能进行了比较。我们通过更改键值的分布和顺序来评估算法的鲁棒性。我们还简要检查了算法对通信参数的敏感性。我们证明,LogP模型是并行算法开发中的宝贵指南,也是实现性能的良好预测指标。该模型鼓励使用尽量减少通信的数据布局和避免争用的平衡通信计划。借助本地处理器性能的经验模型,LogP预测与在广泛的问题和机器尺寸范围内均匀分布的键上观察到的执行时间紧密匹配。我们发现,通信性能不会影响键值的分布,而本地处理器的性能则不会。由于争用,某些通信阶段对密钥的顺序很敏感。最后,我们的分析表明,开销是排序算法中最关键的通信参数。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号