Journal: Journal of Intelligent Information Systems

Optimizing Recursive Information Gathering Plans in EMERAC

Abstract

In this paper we describe two optimization techniques that are specifically tailored to information gathering. The first is a greedy minimization algorithm that minimizes an information gathering plan by removing redundant and overlapping information sources without loss of completeness. We then discuss a set of heuristics that guide the greedy minimization algorithm so that costlier information sources are removed first. In contrast to previous work, our approach can handle the recursive query plans that commonly arise in the presence of constrained sources. Second, we present a method for ordering the access to sources so as to reduce the execution cost. This problem differs significantly from the traditional database query optimization problem, because sources on the Internet have a variety of access limitations and the execution cost of information gathering is affected both by network traffic and by connection setup costs. Furthermore, because of the autonomous and decentralized nature of the Web, very few cost statistics about the sources may be available. We propose a heuristic algorithm for ordering source calls that takes these constraints into account. Specifically, our algorithm considers both access costs and traffic costs, and can operate with very coarse statistics about sources (i.e., without depending on full source statistics). Finally, we discuss the implementation and empirical evaluation of these methods in Emerac, our prototype information gathering system.
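The idea of greedy minimization with a costliest-first heuristic can be illustrated with a toy sketch. This is not the algorithm from the paper, which reasons over source descriptions and recursive plans. Here we assume, purely for illustration, that the set of items each source can return is known in advance and that each source has a single scalar cost. The function name `greedy_minimize` and both parameters are hypothetical.

```python
from typing import Dict, FrozenSet, List


def greedy_minimize(
    sources: Dict[str, FrozenSet[str]],  # assumed: items each source can return
    cost: Dict[str, float],              # assumed: one scalar cost per source
) -> List[str]:
    """Drop sources, costliest first, while the rest still cover every item."""
    kept = dict(sources)
    # Completeness here means: every item some source offers stays reachable.
    required = frozenset().union(*sources.values())
    # Heuristic ordering: consider the most expensive sources for removal first.
    for name in sorted(sources, key=lambda s: cost[s], reverse=True):
        others = [items for other, items in kept.items() if other != name]
        covered = frozenset().union(*others)
        if covered >= required:
            # The source is redundant given what is still kept, so remove it.
            del kept[name]
    return sorted(kept)


if __name__ == "__main__":
    demo_sources = {
        "A": frozenset({"x", "y"}),
        "B": frozenset({"y", "z"}),
        "C": frozenset({"x", "y", "z"}),
    }
    print(greedy_minimize(demo_sources, {"A": 1.0, "B": 1.0, "C": 5.0}))
```

With the demo data, the expensive source `C` is redundant given `A` and `B`, so it is dropped; if `C` were the cheapest source instead, `A` and `B` would be dropped and `C` kept. The result is minimal in the sense that no remaining source can be removed, but a greedy pass of this kind does not guarantee the cheapest possible plan.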
