首页> 外文期刊>Parallel and Distributed Systems, IEEE Transactions on >Site-Based Partitioning and Repartitioning Techniques for Parallel PageRank Computation
【24h】

Site-Based Partitioning and Repartitioning Techniques for Parallel PageRank Computation

机译:并行PageRank计算的基于站点的分区和重新分区技术

获取原文
获取原文并翻译 | 示例
           

摘要

The PageRank algorithm is an important component in effective web search. At the core of this algorithm are repeated sparse matrix-vector multiplications where the involved web matrices grow in parallel with the growth of the web and are stored in a distributed manner due to space limitations. Hence, the PageRank computation, which is frequently repeated, must be performed in parallel with high-efficiency and low-preprocessing overhead while considering the initial distributed nature of the web matrices. Our contributions in this work are twofold. We first investigate the application of state-of-the-art sparse matrix partitioning models in order to attain high efficiency in parallel PageRank computations with a particular focus on reducing the preprocessing overhead they introduce. For this purpose, we evaluate two different compression schemes on the web matrix using the site information inherently available in links. Second, we consider the more realistic scenario of starting with an initially distributed data and extend our algorithms to cover the repartitioning of such data for efficient PageRank computation. We report performance results using our parallelization of a state-of-the-art PageRank algorithm on two different PC clusters with 40 and 64 processors. Experiments show that the proposed techniques achieve considerably high speedups while incurring a preprocessing overhead of several iterations (for some instances even less than a single iteration) of the underlying sequential PageRank algorithm.
机译:PageRank算法是有效Web搜索中的重要组成部分。该算法的核心是重复的稀疏矩阵矢量乘法,其中涉及的Web矩阵与Web的增长并行增长,并且由于空间限制而以分布式方式存储。因此,在考虑网络矩阵的初始分布式特性的同时,必须以高效和低预处理开销并行执行的频繁重复的PageRank计算。我们在这项工作中的贡献是双重的。我们首先研究最新的稀疏矩阵分区模型的应用,以便在并行PageRank计算中获得高效率,特别着重于减少它们引入的预处理开销。为此,我们使用链接中固有的站点信息在Web矩阵上评估两种不同的压缩方案。其次,我们考虑从初始分配的数据开始的更现实的情况,并扩展我们的算法以覆盖此类数据的重新分区以进行有效的PageRank计算。我们在两个具有40和64个处理器的不同PC群集上并行使用最新的PageRank算法,报告了性能结果。实验表明,所提出的技术可实现相当高的速度,同时又会产生底层顺序PageRank算法的多次迭代(对于某些情况甚至少于一次迭代)的预处理开销。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号