Design and implementation of ETL processes using BPMN and relational algebra

Abstract

Extraction, transformation, and loading (ETL) processes extract data from an organization's internal and external sources, transform these data, and load them into a data warehouse. The Business Process Model and Notation (BPMN) has been proposed for expressing ETL processes at a conceptual level. This paper studies a different approach, in which relational algebra (RA), extended with update operations, is used to specify ETL processes. In this approach, the data tasks in an ETL workflow can be automatically translated into SQL queries to be executed over a DBMS. To illustrate the study, the paper addresses the problem of updating Slowly Changing Dimensions (SCDs) with dependencies, that is, the case in which updating an SCD table affects associated SCD tables. Tackling this problem requires extending classic RA with update operations. The paper also implements a portion of the TPC-DI benchmark following both approaches. It thus presents three implementations: (a) an SQL implementation based on the extended-RA specification of an ETL process expressed in BPMN4ETL; and (b) two implementations of workflows that follow from BPMN4ETL, one using the Pentaho DI tool and the other using Talend Open Studio for DI. Experiments over these implementations of the TPC-DI benchmark were carried out for different scale factors and are described and discussed in the paper. They show that the extended RA approach results in more efficient processes than those produced by implementing the BPMN4ETL specification over the ETL tools mentioned. The reasons for this result are also discussed.

Bibliographic details

  • Source
    Data & Knowledge Engineering | 2020, Issue 9 | pp. 101837.1-101837.14 | 14 pages
  • Author affiliations

    Univ Libre Bruxelles Dept Comp & Decis Engn Ave Roosevelt 50 B-1050 Brussels Belgium;

    Inst Tecnol Buenos Aires Buenos Aires DF Argentina;

    Univ Libre Bruxelles Dept Comp & Decis Engn Ave Roosevelt 50 B-1050 Brussels Belgium;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language of text: English
  • Keywords

    Data Warehousing; OLAP; ETL; BPMN;
