Journal: 《计算机应用与软件》 (Computer Applications and Software)

A Study of Late-Arriving Data in Data Warehouses

Abstract

Over the past decade, more and more companies have built their own data warehouses. For various reasons on the source side, data may be loaded into the warehouse later than expected. Late-arriving data from source systems makes the reports and analyses that depend on it inaccurate, and processing it can significantly disrupt the warehouse's regular loads. This paper introduces the two kinds of late-arriving data in a data warehouse (late-arriving dimension table data and late-arriving fact table data) and the methods for handling each. For periodic snapshot fact tables in particular, it proposes two data-refresh methods, DELETE-THEN-INSERT and TRUNCATE-THEN-INSERT, and uses experiments to demonstrate their characteristics and scope of application. Finally, it presents a further scheme for the TRUNCATE-THEN-INSERT method to improve the efficiency and availability of data refresh.
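As a rough illustration of the two refresh methods named in the abstract, the sketch below uses Python's built-in sqlite3 module. The table balance_snapshot, its columns, and all function names are hypothetical and are not taken from the paper. SQLite has no TRUNCATE statement, so an unqualified DELETE stands in for it here; the paper's actual procedures may differ.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory periodic snapshot table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE balance_snapshot ("
        "snapshot_date TEXT, account_id INTEGER, balance REAL)"
    )
    conn.executemany(
        "INSERT INTO balance_snapshot VALUES (?, ?, ?)",
        [("2024-01-31", 1, 100.0), ("2024-02-29", 1, 120.0)],
    )
    conn.commit()
    return conn


def delete_then_insert(conn, snapshot_date, rows):
    """Refresh one snapshot period: delete its rows, then insert the recomputed ones."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "DELETE FROM balance_snapshot WHERE snapshot_date = ?",
            (snapshot_date,),
        )
        conn.executemany("INSERT INTO balance_snapshot VALUES (?, ?, ?)", rows)


def truncate_then_insert(conn, all_rows):
    """Refresh the whole table: empty it, then reload every period."""
    with conn:
        # Assumption: an unqualified DELETE emulates TRUNCATE in SQLite.
        conn.execute("DELETE FROM balance_snapshot")
        conn.executemany("INSERT INTO balance_snapshot VALUES (?, ?, ?)", all_rows)


def snapshot_rows(conn):
    """Return all snapshot rows in a stable order."""
    return conn.execute(
        "SELECT snapshot_date, account_id, balance FROM balance_snapshot "
        "ORDER BY snapshot_date, account_id"
    ).fetchall()
```

In this sketch, delete_then_insert rewrites only the snapshot period affected by the late data, whereas truncate_then_insert requires the full data set to be supplied again.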
