International Conference on Advanced Communication Technology

ASC: Improving spark driver performance with SPARK automatic checkpoint

Abstract

Big data processing platforms such as Hadoop MapReduce keep improving large-scale data processing performance, which has made big data processing a focus of the IT industry. Among them, Spark has become an increasingly popular big data processing framework since it was first presented in 2010. Spark uses the RDD (Resilient Distributed Dataset) as its data abstraction and targets large-scale, multi-iteration processing that reuses data; because RDDs are kept in memory, Spark is faster than many big data platforms that do not compute in memory. However, the in-memory design also makes data volatile: a failure or a missing RDD forces Spark to recompute every missing RDD along the lineage. A long lineage also increases the time and memory the driver spends analyzing it. A checkpoint cuts off the lineage and saves the data required by subsequent computation, but how often checkpoints are taken and which RDDs are selected for saving both significantly influence performance. In this paper, we present an automatic checkpoint algorithm for Spark that helps solve the long-lineage problem with less impact on performance. The automatic checkpoint selects the necessary RDDs to save, incurs an acceptable overhead, and improves the time performance of multi-iteration jobs.
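To make the lineage argument concrete, here is a minimal pure-Python sketch of why checkpointing bounds recomputation in an iterative job. It does not use Spark: `Step`, `simulate`, and the fixed-interval policy are hypothetical illustrations, and checkpointing every `interval` iterations is a naive baseline, not the ASC algorithm, which per the abstract decides which RDDs to save. In actual Spark code, checkpointing is requested with `SparkContext.setCheckpointDir` followed by `RDD.checkpoint()`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """Toy stand-in for an RDD: one transformation in a linear lineage chain."""
    name: str
    parent: Optional["Step"] = None
    checkpointed: bool = False  # True once the data sits on reliable storage

    def recompute_steps(self) -> int:
        """Steps that must be re-executed if this step's in-memory data is lost.

        The walk stops at a checkpointed ancestor, because its data can be
        reloaded from storage instead of recomputed; a checkpointed step
        itself costs zero recomputation.
        """
        steps = 0
        node = self
        while node is not None and not node.checkpointed:
            steps += 1
            node = node.parent
        return steps


def simulate(num_iters: int, interval: Optional[int]) -> tuple[int, int]:
    """Run a toy iterative job, checkpointing every `interval` iterations.

    Returns (worst-case recompute steps over all iterations, checkpoints taken).
    `interval=None` disables checkpointing entirely (hypothetical baseline).
    """
    node: Optional[Step] = None
    worst = 0
    checkpoints = 0
    for i in range(1, num_iters + 1):
        node = Step(f"iter{i}", parent=node)  # each iteration extends the lineage
        if interval is not None and i % interval == 0:
            node.checkpointed = True  # cut the lineage here
            checkpoints += 1
        worst = max(worst, node.recompute_steps())
    return worst, checkpoints


if __name__ == "__main__":
    print(simulate(100, None))  # → (100, 0): lineage grows with every iteration
    print(simulate(100, 10))    # → (9, 10): never more than 9 uncheckpointed steps
```

The toy model shows only the benefit side of the trade-off the abstract describes: each checkpoint also writes data to storage, so a smaller interval shortens the lineage at the price of more I/O, which is why the frequency and the choice of RDDs matter.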