A General Evaluation Framework for Topical Crawlers

P. SRINIVASAN; F. MENCZER; G. PANT


Abstract

Topical crawlers are becoming important tools to support applications such as specialized Web portals, online searching, and competitive intelligence. As the Web mining field matures, the disparate crawling strategies proposed in the literature will have to be evaluated and compared on common tasks through well-defined performance measures. This paper presents a general framework to evaluate topical crawlers. We identify a class of tasks that model crawling applications of different nature and difficulty. We then introduce a set of performance measures for fair comparative evaluations of crawlers along several dimensions, including generalized notions of precision, recall, and efficiency that are appropriate and practical for the Web. The framework relies on independent relevance judgements compiled by human editors and available from public directories. Two sources of evidence are proposed to assess crawled pages, capturing different relevance criteria. Finally, we introduce a set of topic characterizations to analyze the variability in crawling effectiveness across topics. The proposed evaluation framework synthesizes a number of methodologies in the topical crawlers literature and many lessons learned from several studies conducted by our group. The general framework is described in detail and then illustrated in practice by a case study that evaluates four public crawling algorithms. We found that the proposed framework is effective at evaluating, comparing, differentiating and interpreting the performance of the four crawlers. For example, we found the IS crawler to be most sensitive to the popularity of topics.
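
The precision and recall notions mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes a known set of relevant target URLs (for example, pages listed under a topic in a public directory) and uses plain set overlap. This is a simplification for illustration, not the paper's exact generalized measures; the function names and example URLs are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def precision_recall(crawled: Iterable[str], relevant: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall of a crawl against a known relevant set."""
    crawled_set = set(crawled)
    hits = crawled_set & relevant
    # Guard against empty sets so the ratios are always defined.
    precision = len(hits) / len(crawled_set) if crawled_set else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


def recall_trajectory(crawl_order: List[str], relevant: Set[str], step: int) -> List[Tuple[int, float]]:
    """Recall after every `step` crawled pages, so crawlers can be compared at equal cost."""
    points = []
    for n in range(step, len(crawl_order) + 1, step):
        _, recall = precision_recall(crawl_order[:n], relevant)
        points.append((n, recall))
    return points


if __name__ == "__main__":
    # Hypothetical target set and crawl order, for illustration only.
    relevant = {"a.example/1", "a.example/2", "b.example/9", "c.example/4"}
    crawl_order = ["a.example/1", "x.example/0", "b.example/9", "y.example/3"]
    print(precision_recall(crawl_order, relevant))
    print(recall_trajectory(crawl_order, relevant, step=2))
```

Tracking recall as a function of pages crawled, rather than only at the end, is one simple way to bring efficiency into the comparison: a crawler that reaches the same recall after fewer pages has used its budget better.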

Bibliographic details

Source
Information Retrieval, 2005, no. 3, pp. 417-447 (31 pages)
Authors
P. SRINIVASAN; F. MENCZER; G. PANT
Affiliation

School of Library & Information Science and Department of Management Sciences, The University of Iowa, Iowa City, IA 52242, USA

Original format: PDF
Language: English
Chinese Library Classification: Library science, librarianship
Keywords
web crawlers; evaluation; tasks; topics; precision; recall; efficiency

Similar literature

1. Topical Web Crawlers: Evaluating Adaptive Algorithms [J]. Filippo Menczer, Gautam Pant, Padmini Srinivasan. ACM Transactions on Internet Technology, 2004, no. 4.
2. Machine Learning-Based Topical Web Crawler: An Ensemble Approach Incorporating Meta-Features [J]. Tae Jun Kim, Han-Joon Kim. Journal of Engineering & Applied Sciences, 2017, no. 18.
3. LSI Based Relevance Computation for Topical Web Crawler [J]. Gurmeen Minhas, Mukesh Kumar. Journal of Emerging Technologies in Web Intelligence, 2013, no. 4.
4. Topical Crawler based on multi-level vector space model and optimized hyperlink chosen strategy [C]. Xu Yang, Ai-na Sui, Zhan-kun Tang. 9th IEEE International Conference on Cognitive Informatics, 2010.
5. Learning to crawl: Classifier-guided topical crawlers [D]. Pant, Gautam. 2004.
6. Quantitative evaluation of recall and precision of CAT Crawler a search engine specialized on retrieval of Critically Appraised Topics [O]. Peng Dong, Ling Ling Wong, Sarah Ng, 2004.
7. A General Evaluation Framework for Topical Crawlers [O]. P. Srinivasan, F. Menczer, G. Pant. 2003.
8. Tank 19F Folding Crawler Final Evaluation, Rev. O [R]. Nance, T. 2000.
