Dynamic Reconfiguration of Data Parallel Programs

Abstract

Given the large amount of data from different sources that has become available to researchers in multiple fields, Data Science has emerged as a new paradigm for exploring and extracting value from that data. In that context, new parallel processing environments with abstract programming interfaces, such as Spark, have been proposed to simplify the development of distributed programs. Although such solutions are now widely used, achieving the best performance with them is still not always straightforward, despite the multiple run-time strategies they employ. In this work we analyze some of the causes of performance degradation in such systems and, based on that analysis, propose a tool that improves performance by dynamically adjusting data partitioning and the degree of parallelism in recurrent applications, using information from previous executions. Our results applying that methodology show consistent reductions in execution time for the applications considered, with gains of up to 50%.
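The idea of tuning a recurrent job from its previous executions can be illustrated with a minimal sketch. This is not the tool described in the abstract, whose algorithm is not given here. The `RunRecord` log format, the `next_partition_count` function, the default of 64 partitions, and the double-or-halve search are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunRecord:
    """One previous execution of a recurrent job (hypothetical log format)."""
    num_partitions: int
    runtime_seconds: float


def next_partition_count(history: List[RunRecord],
                         default: int = 64,
                         step: float = 2.0) -> int:
    """Choose the partition count for the next run from past runs.

    Illustrative heuristic only (an assumption, not the paper's method):
    find the fastest setting seen so far, then probe an untried neighbour
    (first a larger count, then a smaller one). If both neighbours have
    already been tried, keep the fastest setting.
    """
    if not history:
        return default

    # Keep the lowest runtime observed for each partition count.
    best: Dict[int, float] = {}
    for rec in history:
        prev = best.get(rec.num_partitions)
        if prev is None or rec.runtime_seconds < prev:
            best[rec.num_partitions] = rec.runtime_seconds

    fastest = min(best, key=best.get)

    # Probe untried neighbours of the fastest setting.
    for candidate in (int(fastest * step), max(1, int(fastest / step))):
        if candidate not in best:
            return candidate
    return fastest
```

In a Spark job, the returned value could be applied before the next run, for example through `DataFrame.repartition(n)` or the `spark.sql.shuffle.partitions` setting. Which of these knobs the proposed tool actually adjusts is not stated in the abstract.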