Journal: 《计算机应用研究》 (Application Research of Computers)

An Efficient Mechanism for Processing Large-Scale Graph Data in the Spark Environment

Abstract

Existing graph processing and graph management frameworks suffer from low efficiency and from limitations in their data storage structures. To address these problems, this paper proposes a processing mechanism suited to large-scale graph data. It first reviews the strengths and weaknesses of current graph processing models and graph storage frameworks. Then, based on an analysis of the characteristics of distributed computing, it designs a graph data processing framework with three main parts: a partitioning algorithm suited to large-scale graphs, caching and optimization of data extraction, and a mechanism that combines the computation layer with the persistence layer. Finally, experiments using the PageRank and SSSP algorithms compare the performance of the proposed framework with that of the MapReduce framework and of Spark using HDFS as its persistence layer. The results show that the proposed framework is more than 90 times faster than MapReduce and 2 times faster than Spark with HDFS, and that it can meet the needs of high-performance graph data processing.
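For readers unfamiliar with the two benchmark workloads named in the abstract, the following is a minimal single-machine sketch of PageRank and SSSP (single-source shortest paths). It is not the framework proposed in the paper and does not use Spark. The function names `pagerank` and `sssp`, the damping factor of 0.85, the iteration count, and the choice of Dijkstra's algorithm for SSSP are all assumptions made for illustration.

```python
import heapq


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed edge list [(src, dst), ...].

    damping and iterations are illustrative defaults, not values from the paper.
    """
    nodes = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        rank = new_rank
    return rank


def sssp(weighted_edges, source):
    """Dijkstra's algorithm on [(src, dst, weight), ...]; weights must be non-negative."""
    adjacency = {}
    for src, dst, weight in weighted_edges:
        adjacency.setdefault(src, []).append((dst, weight))
        adjacency.setdefault(dst, [])
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for v, weight in adjacency.get(u, []):
            candidate = d + weight
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist
```

Both workloads repeatedly traverse the edge set: PageRank touches every edge on each iteration, while SSSP reads each reachable vertex's adjacency list. That access pattern is presumably why the abstract emphasizes partitioning, caching, and the coupling between the computation and persistence layers.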
