首页> 外文会议>International conference on very large data bases >An Experimental Evaluation of SimRank-based Similarity Search Algorithms
【24h】

An Experimental Evaluation of SimRank-based Similarity Search Algorithms

机译:基于SimRank的相似性搜索算法的实验评估

获取原文

摘要

Given a graph, SimRank is one of the most popular measures of the similarity between two vertices. We focus on efficiently calculating SimRank, which has been studied intensively over the last decade. This has led to many algorithms that efficiently calculate or approximate SimRank being proposed by researchers. Despite these abundant research efforts, there is no systematic comparison of these algorithms. In this paper, we conduct a study to compare these algorithms to understand their pros and cons. We first introduce a taxonomy for different algorithms that calculate SimRank and classify each algorithm into one of the following three classes, namely, iterative-, non-iterative-, and random walk-based method. We implement ten algorithms published from 2002 to 2015, and compare them using synthetic and real-world graphs. To ensure the fairness of our study, our implementations use the same data structure and execution framework, and we try our best to optimize each of these algorithms. Our study reveal-s that none of these algorithms dominates the others: algorithms based on iterative method often have higher accuracy while algorithms based on random walk can be more scalable. One non-iterative algorithm has good effectiveness and efficiency on graphs with medium size. Thus, depending on the requirements of different applications, the optimal choice of algorithms differs. This paper provides an empirical guideline for making such choices.
机译:给定一个图,SimRank是两个顶点之间相似度最受欢迎的度量之一。我们专注于有效地计算SimRank,在过去十年中对此进行了深入研究。这导致研究人员提出了许多有效计算或近似SimRank的算法。尽管进行了大量的研究工作,但这些算法尚无系统的比较。在本文中,我们进行了一项比较这些算法的研究,以了解它们的优缺点。我们首先介绍用于计算SimRank的不同算法的分类法,并将每种算法分为以下三类之一,即基于迭代,非迭代和基于随机游走的方法。我们实施了从2002年到2015年发布的十种算法,并使用合成图和真实图进行比较。为了确保我们研究的公平性,我们的实现使用相同的数据结构和执行框架,并且我们会尽力优化每种算法。我们的研究揭示了-这些算法中没有一个能在其他算法中占主导地位:基于迭代方法的算法通常具有更高的准确性,而基于随机游走的算法可以具有更高的可扩展性。一种非迭代算法对中等大小的图具有良好的有效性和效率。因此,根据不同应用程序的需求,算法的最佳选择会有所不同。本文提供了进行此类选择的经验指导。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号