Query-Log Aware Replicated Declustering


Abstract

Data declustering and replication can be used to reduce the I/O time associated with processing data-intensive queries. Declustering parallelizes query retrieval by distributing the data items requested by queries among several disks. Replication provides alternative disk choices for individual data items and thus offers better options for query parallelism. In general, existing replicated declustering schemes do not consider query-log information and instead try to optimize all possible queries of a specific query type, such as range or spatial queries. Such schemes assume that two or more copies of every data item are to be generated, and they address how these copies are scheduled onto disks. However, in some applications, generating even two copies of every data item is not feasible, since data items tend to be very large. In this work, we assume that there is a given limit on disk capacities and thus on the amount of replication. We utilize existing query-log information to propose a selective replicated declustering scheme, in which we select the data items to be replicated and decide on their scheduling onto disks while respecting disk capacities. We propose and implement an iterative improvement algorithm to obtain a two-way replicated declustering and use this algorithm in a recursive framework to generate a multiway replicated declustering. We then improve the resulting multiway replicated declustering with efficient refinement heuristics. Experiments conducted on realistic data sets show that the proposed scheme yields better performance than existing replicated declustering schemes.
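
To make the cost model behind these terms concrete, the sketch below computes the parallel retrieval cost of a query log under a given placement of data items on disks. It is only an illustration, not the algorithm proposed in the paper: it assumes that every item takes one unit of time to read, that the cost of a query is the largest number of items read from any single disk, and that the best replica choice is found by brute force, which is practical only for tiny queries. The names `query_cost`, `total_cost`, and the toy query log are hypothetical.

```python
from itertools import product


def query_cost(query, placement, num_disks):
    """Smallest achievable maximum disk load for one query.

    query: iterable of item ids requested together.
    placement: dict mapping item id -> set of disk indices holding a copy.
    Brute force over every replica choice (assumption: queries are tiny).
    """
    query = list(query)
    best = len(query)  # worst case: every item is read from the same disk
    for choice in product(*(sorted(placement[item]) for item in query)):
        load = [0] * num_disks
        for disk in choice:
            load[disk] += 1
        best = min(best, max(load))
    return best


def total_cost(query_log, placement, num_disks):
    """Frequency-weighted sum of query costs over the whole log."""
    return sum(freq * query_cost(q, placement, num_disks)
               for q, freq in query_log)


# Hypothetical toy instance: two disks, four items, two logged queries.
query_log = [(("a", "b", "c"), 5), (("c", "d"), 2)]

# One copy of each item: items c and d share disk 1.
single_copy = {"a": {0}, "b": {0}, "c": {1}, "d": {1}}

# Selective replication: only item d gets a second copy, on disk 0.
selective = {"a": {0}, "b": {0}, "c": {1}, "d": {0, 1}}

print(total_cost(query_log, single_copy, 2))  # 14
print(total_cost(query_log, selective, 2))    # 12
```

In this toy log, replicating the single item `d` is enough to bring the second query down to one read per disk, which illustrates why choosing what to replicate from the query log can pay off when disk capacity rules out copying everything.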
