Journal: Intelligent Decision Technologies

An improved approach for automatic selection of multi-tables indexes in relational data warehouses using maximal frequent itemsets


Abstract

The performance of a data warehouse depends crucially on its physical design, in which one of the most challenging tasks is selecting an appropriate set of indexes for a representative workload under a storage constraint. The problem becomes even more complex for multi-table indexes such as bitmap join indexes, since it involves searching a vast space of possible configurations. The attributes that queries reference, and how frequently they reference them, play an important role in determining the efficiency of the selected indexes. In this paper, we treat index selection as a typical frequent itemset mining problem. The indexes are built from combinations of attributes, viewed as items. The queries in the workload, viewed as transactions, are described by the attributes they involve. The foundation of our approach is the concept of maximal frequent itemsets. This data mining technique helps to discover strong correlations among attributes, such that the presence of some attributes in a query implies the presence of certain other attributes. Moreover, by avoiding the generation of redundant indexes, the proposed approach leads to a solution that expresses the set of relevant indexes more succinctly. Consequently, it guarantees a reduction in storage space requirements. Unlike previous approaches, which focus on the configuration leading to the minimum workload cost, we suggest considering a set of optimized solutions, and we propose a profit-effectiveness metric that helps pick the most promising one. Through a set of experiments on the ABP-1 benchmark, we show that our approach achieves better performance than similar methods, with significant savings in index storage.
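
The mapping described in the abstract (queries as transactions, attributes as items, maximal frequent itemsets as candidate index attribute sets) can be illustrated with a minimal sketch. It is not the authors' algorithm: the attribute names, the five-query workload, and the support threshold are hypothetical, and the brute-force enumeration is only practical for toy inputs.

```python
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_support):
    """Return the maximal frequent itemsets as a list of frozensets.

    transactions: iterable of sets of attribute names, one set per query.
    min_support: minimum number of queries that must contain an itemset.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))

    # Brute-force enumeration of every non-empty attribute combination.
    # Exponential in the number of attributes, so for illustration only.
    frequent = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                frequent.append(candidate)

    # An itemset is maximal if no frequent itemset is a proper superset of it.
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical workload: each query is described by the attributes it involves.
workload = [
    {"time.year", "product.class"},
    {"time.year", "product.class", "customer.retailer"},
    {"time.year", "product.class"},
    {"channel.base"},
    {"time.year", "customer.retailer"},
]

# Each maximal frequent itemset is a candidate attribute set for one index.
candidates = maximal_frequent_itemsets(workload, min_support=2)
for itemset in candidates:
    print(sorted(itemset))
```

With this toy workload, the frequent single attributes `time.year`, `product.class`, and `customer.retailer` are all covered by two larger frequent itemsets, so only those two are kept. This is the sense in which maximal itemsets describe the relevant attribute combinations more succinctly than the full list of frequent itemsets.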
