Conference: IEEE International Conference on Data Engineering

Keyword-based correlated network computation over large social media


Abstract

Recent years have witnessed an unprecedented proliferation of social media, e.g., millions of blog posts, micro-blog posts, and social networks on the Internet. Such social media data can be modeled as a large graph in which nodes represent entities and edges represent the relationships between them. Discovering keyword-based correlated networks in these large graphs is an important primitive in data analysis, as it lets users focus on the information in the graph that concerns them. In this paper, we propose and define the problem of keyword-based correlated network computation over a massive graph. To address it, we first present a novel tree data structure that maintains only the shortest path between any two graph nodes, so that the massive graph can be equivalently transformed into a tree for solving the proposed problem. We then design efficient algorithms that build the transformed tree data structure from a graph offline and compute the γ-bounded keyword matched subgraphs on the fly from the pre-built tree. To further improve efficiency, we propose weighted shingle-based approximation approaches to measure the correlation among a large number of γ-bounded keyword matched subgraphs. Finally, we develop a merge-sort based approach to efficiently generate the correlated networks. Our extensive experiments demonstrate the efficiency of our algorithms in reducing time and space cost. The experimental results also demonstrate the effectiveness of our method in discovering correlated networks from three real datasets.
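
The abstract does not give formal definitions, so the following is only an illustrative sketch under stated assumptions, not the paper's algorithm. It assumes that a "γ-bounded keyword matched subgraph" is the set of nodes within γ hops of some root node that together cover every query keyword, and it uses exact weighted Jaccard similarity over shingle weights as a stand-in for the correlation measure (the paper describes an approximation whose details are not stated here). The function names, the adjacency-list layout, and the toy data are all hypothetical.

```python
from collections import deque


def bounded_reach(adj, source, gamma):
    """Return {node: hop distance} for nodes within gamma hops of source (plain BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == gamma:
            continue  # do not expand past the hop bound
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def keyword_matched_subgraphs(adj, keywords, query, gamma):
    """Assumed semantics: keep a root if the nodes within gamma hops cover all query keywords."""
    result = {}
    for root in adj:
        reach = bounded_reach(adj, root, gamma)
        covered = set()
        for node in reach:
            covered |= keywords.get(node, set()) & query
        if covered == query:
            result[root] = set(reach)
    return result


def weighted_jaccard(a, b):
    """Exact weighted Jaccard similarity of two {shingle: weight} maps (stand-in measure)."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


# Toy path graph a - b - c - d with hypothetical keyword labels.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
keywords = {"a": {"icde"}, "c": {"graph"}, "d": {"icde"}}
matches = keyword_matched_subgraphs(adj, keywords, {"icde", "graph"}, gamma=1)
```

This sketch runs a breadth-first search from every node at query time; the abstract's approach instead precomputes a shortest-path tree structure offline so that matching can be done on the fly.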
