Journal: Data & Knowledge Engineering

QETL: An approach to on-demand ETL from non-owned data sources

Abstract

In traditional OLAP systems, the ETL process loads all available data into the data warehouse before users start querying them. In some cases, this may be either inconvenient (because the data are supplied by a provider for a fee) or infeasible (because of their size); on the other hand, launching each analysis query directly on the source data would not enable data reuse, leading to poor performance and high costs. The alternative investigated in this paper is to fetch and store data on demand, i.e., as they are needed during the analysis process. To this end, we propose the Query-Extract-Transform-Load (QETL) paradigm to feed a multidimensional cube; the idea is to fetch facts from the source data provider, load them into the cube only when they are needed to answer some OLAP query, and drop them when free space is needed to load other facts. Notably, QETL includes an optimization step that extracts the required data cheaply, based on the specific features of the data provider. Experimental tests on a real case study in genomics show that QETL effectively reuses data to cut extraction costs, leading to significant performance improvements.
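The fetch-on-demand, load, and drop cycle described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the class name `OnDemandCube`, the `fetch` callback standing in for the data provider, the fact-count `capacity`, and the least-recently-used eviction rule are all assumptions made for illustration, and the provider-specific optimization step is not modeled.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable, Iterable, List


class OnDemandCube:
    """Toy fact store: facts are fetched only when a query needs them.

    Hypothetical illustration; eviction is LRU by assumption, since the
    abstract only says facts are dropped when free space is needed.
    """

    def __init__(
        self,
        fetch: Callable[[List[Hashable]], Dict[Hashable, float]],
        capacity: int,
    ) -> None:
        self.fetch = fetch        # stand-in for a call to the external provider
        self.capacity = capacity  # maximum number of facts kept locally
        self.facts: "OrderedDict[Hashable, float]" = OrderedDict()
        self.fetched = 0          # facts extracted from the provider so far

    def query(self, cells: Iterable[Hashable]) -> Dict[Hashable, float]:
        cells = list(dict.fromkeys(cells))  # de-duplicate, keep order
        missing = [c for c in cells if c not in self.facts]
        # Extract only the facts that are not already stored locally.
        new = self.fetch(missing) if missing else {}
        self.fetched += len(new)

        answer: Dict[Hashable, float] = {}
        for c in cells:
            if c in self.facts:
                self.facts.move_to_end(c)  # mark as recently used
            else:
                self.facts[c] = new[c]     # load the newly fetched fact
            answer[c] = self.facts[c]

        # Drop least-recently-used facts once the store is over capacity.
        while len(self.facts) > self.capacity:
            self.facts.popitem(last=False)
        return answer


def toy_provider(cells: List[Hashable]) -> Dict[Hashable, float]:
    """Hypothetical provider: returns a made-up measure for each cell."""
    return {c: float(len(str(c))) for c in cells}


cube = OnDemandCube(toy_provider, capacity=3)
cube.query(["a", "b"])  # both facts are fetched
cube.query(["b", "c"])  # "b" is reused, only "c" is fetched
cube.query(["d"])       # "d" is fetched; the least-recently-used fact is dropped
```

The `fetched` counter makes the reuse visible: a fact already held locally is not extracted again, which is the cost saving the abstract attributes to QETL.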
