Annual ACM Symposium on Applied Computing

Addressing the Limited Scope Problem of Focused Crawling using a Result Merging Approach


Abstract

Focused crawling is the process of fetching domain-specific pages from the Web. It is an important method for building domain-specific document collections, but it suffers from low recall because of the local nature of crawling algorithms combined with the community structure of the Web. In this study, we address the limited crawling scope of focused crawling using a result merging approach: we merged the results of crawls that used different start URL sets and different focused crawling methods. We found that merging considerably improves the effectiveness of focused crawling. The results reported here are based on 10 test topics and 140 crawls in the domains of genomics and genetics.
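The abstract does not say exactly how crawl results are combined, so the following is only a minimal sketch under stated assumptions: each crawl's result is treated as a set of fetched URLs, results are merged by plain set union after a simple URL normalization, and recall is measured against a known set of relevant pages. The function names (`normalize_url`, `merge_crawls`, `recall`) and the `example.org`-style URLs are hypothetical illustrations, not the authors' method or data. It shows why merging crawls seeded differently can raise recall: each crawl stays inside its own neighbourhood of the Web, and the union covers more of them.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical normalization: lower-case scheme and host, drop the
    fragment, so the same page found by different crawls counts once."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))


def merge_crawls(crawls):
    """Assumed merge strategy: set union of the URLs fetched by each crawl."""
    merged = set()
    for crawl in crawls:
        merged.update(normalize_url(u) for u in crawl)
    return merged


def recall(fetched, relevant):
    """Fraction of the known relevant pages that appear in the fetched set."""
    relevant_norm = {normalize_url(u) for u in relevant}
    if not relevant_norm:
        return 0.0
    return len(fetched & relevant_norm) / len(relevant_norm)


# Hypothetical crawls started from different seed URL sets.
crawl_a = ["http://example.org/gene1", "http://example.org/gene2"]
crawl_b = ["http://EXAMPLE.org/gene2#sec", "http://example.net/gene3"]
relevant = ["http://example.org/gene1", "http://example.org/gene2",
            "http://example.net/gene3", "http://example.com/gene4"]

print(recall(merge_crawls([crawl_a]), relevant))           # 0.5
print(recall(merge_crawls([crawl_a, crawl_b]), relevant))  # 0.75
```

A set union is the simplest merge and ignores any ranking or relevance scores the crawlers might produce; a score-based merge would be needed if precision of the top-ranked results mattered as much as recall.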
