Journal: Information Sciences: An International Journal

A parallel query processing system based on graph-based database partitioning

Abstract

Because parallel database systems must process large amounts of data, it is important to use a scalable and efficient horizontal database partitioning method. Existing partitioning methods have major drawbacks: they cause large amounts of data redundancy and, despite that redundancy, still require expensive shuffle operations for join queries in many cases. We elucidate the drawbacks that originate from tree-based partitioning schemes and propose a novel graph-based database partitioning method called GPT that both improves query performance and reduces data redundancy. We integrate the proposed GPT method into a parallel query processing system, Spark SQL, across all the relevant layers and modules, including the query plan generator and the scan operator. Through extensive experiments using three benchmarks (TPC-DS, IMDB, and BioWarehouse), we show that GPT significantly outperforms the state-of-the-art method in terms of both storage overhead and query performance. © 2018 Elsevier Inc. All rights reserved.