Journal: IEEE Transactions on Knowledge and Data Engineering

Link contexts in classifier-guided topical crawlers



Abstract

The context of a hyperlink, or link context, is defined as the terms that appear in the text around a hyperlink within a Web page. Link contexts have been applied to a variety of Web information retrieval and categorization tasks. Topical, or focused, Web crawlers rely especially heavily on link contexts. These crawlers automatically navigate the hyperlinked structure of the Web, using link contexts to predict the benefit of following the corresponding hyperlinks with respect to some initiating topic or theme. Using topical crawlers guided by a support vector machine, we investigate the effects of various definitions of link context on crawling performance. We find that a crawler that exploits words both in the immediate vicinity of a hyperlink and in the entire parent page performs significantly better than a crawler that depends on just one of those cues. We also find that a crawler that uses the tag tree hierarchy within Web pages provides effective coverage. We analyze our results along various dimensions such as link context quality, topic difficulty, length of crawl, training data, and topic domain. The study used multiple crawls over 100 topics covering millions of pages, allowing us to derive statistically strong results.
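To make the two cues in the abstract concrete, the following is a minimal sketch of how a link context could be extracted: a fixed window of words around each hyperlink, plus the full text of the parent page. It is not the paper's implementation. The names `LinkContextParser` and `link_contexts`, the `window` parameter, and the whitespace tokenization are all assumptions introduced here for illustration, and the sketch does not cover the tag tree variant or the support vector machine that scores the contexts.

```python
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Hypothetical helper: records page tokens and where each anchor text sits."""

    def __init__(self):
        super().__init__()
        self.tokens = []  # every word on the page, in document order
        self.links = []   # (href, start, end): token span of each anchor text
        self._open = []   # stack of (href, start) for <a> tags not yet closed

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Push even when href is missing so that </a> pops the right entry.
            self._open.append((dict(attrs).get("href"), len(self.tokens)))

    def handle_endtag(self, tag):
        if tag == "a" and self._open:
            href, start = self._open.pop()
            if href:
                self.links.append((href, start, len(self.tokens)))

    def handle_data(self, data):
        # Naive tokenization (an assumption): lowercase, split on whitespace.
        # Text inside <script> and <style> is not filtered out here.
        self.tokens.extend(data.lower().split())


def link_contexts(html, window=3):
    """Map each href to its word window and to the whole parent page's words.

    `window` is the number of words kept on each side of the anchor text;
    the default is an arbitrary illustrative value, not one from the paper.
    If the same href occurs more than once, the last occurrence wins.
    """
    parser = LinkContextParser()
    parser.feed(html)
    parser.close()
    contexts = {}
    for href, start, end in parser.links:
        lo = max(0, start - window)
        hi = min(len(parser.tokens), end + window)
        contexts[href] = {
            "window": parser.tokens[lo:hi],  # immediate vicinity of the link
            "page": parser.tokens,           # entire parent page
        }
    return contexts
```

Calling `link_contexts(page_html, window=2)` returns, for every hyperlink with an `href`, both word lists. A crawler along the lines described in the abstract could turn these into features for a classifier that ranks unvisited URLs; how the two cues are weighted or combined is left open here because the abstract does not specify it.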
