Extracting Local Web Communities Using Lexical Similarity

Abstract

The World Wide Web contains rich textual content interconnected by complex hyperlinks. Most studies on web community extraction focus only on graph structure, so communities are discovered purely from explicit link information, without considering the textual properties of web pages. This paper proposes an improved algorithm based on Flake's method, which uses the maximum flow algorithm. The improved algorithm accounts for differences in importance between edges by assigning each edge a capacity derived from the lexical similarity of the web pages it connects. Given a specific query, it also lends itself to a new and efficient ranking scheme for the members of the extracted community. The experimental results indicate that our approach handles a variety of data sets efficiently through a novel optimization strategy for similarity computation.
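
The idea of similarity-weighted capacities can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the capacity formula, so the code assumes plain cosine similarity over bag-of-words term counts. The virtual source and sink follow the general shape of Flake's max-flow formulation in simplified form, and `sink_capacity`, the function names, and the toy pages are all hypothetical.

```python
import math
from collections import Counter, defaultdict, deque


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two bag-of-words term-frequency vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def extract_community(pages, links, seeds, sink_capacity=0.5, eps=1e-9):
    """Return the pages on the source side of a minimum cut (Edmonds-Karp)."""
    source, sink = "_source", "_sink"
    cap = defaultdict(lambda: defaultdict(float))
    # Assumption: edge capacity is the raw cosine similarity of its endpoints.
    for u, v in links:
        w = cosine_similarity(pages[u], pages[v])
        cap[u][v] += w
        cap[v][u] += w
    # Seeds hang off a virtual source; every other page drains to a virtual sink.
    for page in pages:
        if page in seeds:
            cap[source][page] = math.inf
        else:
            cap[page][sink] = sink_capacity

    def bfs_parents():
        """Breadth-first search over edges that still have residual capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > eps and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = bfs_parents()
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        path, node = [], sink
        while parent[node] is not None:
            path.append((parent[node], node))
            node = parent[node]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
    # Pages still reachable from the source form the extracted community.
    return set(parent) - {source}


pages = {
    "a": "graph community detection web",
    "b": "web community graph mining",
    "c": "community detection graph clustering",
    "d": "cooking pasta recipes",
    "e": "pasta sauce recipes cooking",
}
links = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
community = extract_community(pages, links, seeds={"a"})
print(sorted(community))  # ['a', 'b', 'c']
```

In this toy graph the hyperlink from `c` to `d` joins two pages with no shared terms, so it receives zero capacity and the cut separates the cooking pages from the seed's community even though they are linked. A purely structural method with uniform capacities would have no textual basis for treating that edge differently from the others.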