Towards Building a Comprehensive Data Mart

Abstract

To uncover new relationships or patterns, one must first build a corpus of data, or what some call a data mart. However, when we use the Internet to build this corpus, how do we make sure that we have collected all the pertinent data and maximized coverage? Hundreds of search engines are available on the Internet today. Which one is best? Is one better for one problem and another better for a different problem? Are meta-search engines better than individual search engines? In this paper we look at one possible approach to developing a methodology that maximizes coverage. Before presenting this methodology, we first motivate the need for increased coverage. We next investigate how ground truth can be obtained and what insight it can provide into the size of the Internet and the capabilities of search engines. We conclude by developing a methodology in which we compare a number of search engines and show how overall coverage can be increased, yielding a more inclusive data mart.