Journal of Intelligent Information Systems

Optimization in Data Cube System Design



Abstract

The design of an OLAP system that supports real-time queries is a major research issue. One approach is to use data cubes, which are materialized, precomputed multidimensional views of the data in a data warehouse. We can derive a set of data cubes that answers each frequently asked query directly. However, there are two practical problems: (1) the maintenance cost of the data cubes, and (2) the query cost of answering those queries. Maintaining a data cube requires disk storage and CPU computation, so the maintenance cost depends on both the total size and the total number of data cubes materialized. In most cases, materializing all data cubes is impractical. The maintenance cost may be reduced by merging some data cubes. However, the resulting larger data cubes increase the cost of answering some queries. If the bounds on the maintenance cost and the query cost are too strict, we help the user decide which queries to sacrifice, that is, to leave out of consideration. We define an optimization problem in data cube system design: given a maintenance-cost bound, a query-cost bound, and a set of frequently asked queries, determine a set of data cubes such that the system can answer the largest possible subset of the queries without violating the two bounds. This problem is NP-hard. We propose approximate greedy algorithms GR, 2GM, and 2GMM, which experiments on a census data set and a forest-cover-type data set show to be both effective and efficient.
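
To make the trade-off in the abstract concrete, the following is a minimal Python sketch of the optimization problem. It is not the paper's GR, 2GM, or 2GMM algorithm, which the abstract does not describe. Everything in it is an illustrative assumption: the attribute cardinalities in `CARD`, the size model (product of cardinalities), the maintenance cost (total size of materialized cubes), the query cost (each query scans the smallest cube that covers it), and the hypothetical `naive_design` heuristic, which merges cubes while that lowers maintenance cost and otherwise sacrifices a query.

```python
from itertools import combinations
from math import prod

# Hypothetical attribute cardinalities; not taken from the paper.
CARD = {"age": 10, "sex": 2, "state": 50, "year": 5}

def cube_size(cube):
    """Assumed size model: product of the cardinalities of the cube's attributes."""
    return prod(CARD[a] for a in cube)

def maintenance_cost(cubes):
    """Assumed model: total size of all materialized cubes."""
    return sum(cube_size(c) for c in cubes)

def query_cost(queries, cubes):
    """Assumed model: each query scans the smallest cube containing its attributes."""
    return sum(min(cube_size(c) for c in cubes if q <= c) for q in queries)

def naive_design(queries, maint_bound, query_bound):
    """Illustrative heuristic only: merge cubes or sacrifice queries until both bounds hold."""
    queries = set(queries)
    cubes = set(queries)  # start with one exact cube per query
    while queries:
        current = maintenance_cost(cubes)
        if current <= maint_bound and query_cost(queries, cubes) <= query_bound:
            return queries, cubes
        # Look for the merge that lowers maintenance cost the most
        # while keeping the query cost within its bound.
        best = None
        for a, b in combinations(cubes, 2):
            merged = (cubes - {a, b}) | {a | b}
            if query_cost(queries, merged) > query_bound:
                continue
            cost = maintenance_cost(merged)
            if cost < current and (best is None or cost < best[0]):
                best = (cost, merged)
        if best is not None:
            cubes = best[1]
        else:
            # No helpful merge: sacrifice the query whose exact cube is largest,
            # then restart from one exact cube per remaining query.
            victim = max(queries, key=lambda q: (cube_size(q), sorted(q)))
            queries.discard(victim)
            cubes = set(queries)
    return set(), set()

if __name__ == "__main__":
    qs = [frozenset({"age", "sex"}), frozenset({"sex"}), frozenset({"state", "year"})]
    answered, cubes = naive_design(qs, maint_bound=20, query_bound=1000)
    print(len(answered), "queries answered by", len(cubes), "cube(s)")
```

In this toy run the (state, year) query is sacrificed because no merge brings the maintenance cost under the bound, and the two remaining queries end up sharing one merged cube, which lowers maintenance cost at the price of a higher query cost for the (sex) query.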
