Research on Crawler Search Strategies in Topic-Specific Search Engines (主题搜索引擎中爬虫搜索策略的研究)

Journal: Computer Engineering and Applications (《计算机工程与应用》)

         

Abstract

To address the low efficiency of traditional focused crawlers, this paper examines how a crawler can keep its search focused on a given topic. A traditional focused crawler always visits the links it judges most valuable, but it computes the relevance of each link in isolation and ignores the relevance relationships among the unlabeled URLs still awaiting analysis, which lowers crawling efficiency. The paper proposes a relevance-discrimination algorithm based on a link model that combines labeled seed URLs with unlabeled candidate URLs to judge the relevance of the unlabeled URLs, and it derives that the result is insensitive to the choice of initial value for the iteration. Experimental results show that the proposed method is more efficient than the relevance-discrimination methods used by traditional crawler algorithms.
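The abstract does not give the algorithm's details, so the following is only a rough illustration of the general idea it describes, not the paper's method. It assumes a simple neighbor-averaging (label-propagation style) update over an undirected link graph: labeled seed URLs keep their scores fixed, and each unlabeled URL repeatedly takes the mean score of its neighbors. The function name `propagate_relevance`, the node names, and the update rule are all hypothetical choices made for this sketch.

```python
def propagate_relevance(links, seeds, init=0.5, tol=1e-9, max_iter=10_000):
    """Estimate relevance scores for unlabeled URLs (illustrative sketch only).

    links: dict mapping a URL to the URLs it links to (treated as undirected).
    seeds: dict mapping labeled seed URLs to a fixed relevance in [0, 1].
    init:  starting score assigned to every unlabeled URL.
    """
    # Build a symmetric neighbor map so relevance can flow in both directions.
    neighbors = {}
    for url, targets in links.items():
        neighbors.setdefault(url, set())
        for target in targets:
            neighbors.setdefault(target, set())
            neighbors[url].add(target)
            neighbors[target].add(url)

    # Seeds start at (and keep) their labels; everything else starts at init.
    scores = {url: seeds.get(url, init) for url in neighbors}

    for _ in range(max_iter):
        largest_change = 0.0
        for url, adjacent in neighbors.items():
            if url in seeds or not adjacent:
                continue  # seed scores stay clamped; isolated nodes keep init
            updated = sum(scores[n] for n in adjacent) / len(adjacent)
            largest_change = max(largest_change, abs(updated - scores[url]))
            scores[url] = updated
        if largest_change < tol:
            break
    return scores


# Hypothetical four-page chain: relevant seed -> a -> b -> irrelevant seed.
links = {"seed_relevant": ["a"], "a": ["b"], "b": ["seed_irrelevant"]}
seeds = {"seed_relevant": 1.0, "seed_irrelevant": 0.0}

from_low = propagate_relevance(links, seeds, init=0.0)
from_high = propagate_relevance(links, seeds, init=1.0)
```

In this toy graph every unlabeled page is connected to a seed, so both runs settle on the same scores for `a` and `b` regardless of the starting value. That mirrors the kind of insensitivity to the initial iterate that the abstract claims for its own algorithm, under the assumptions stated above.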
