Distributed Query Processing Over Incomplete, Sampled, and Locality-Aware Data

Abstract

There are numerous challenges in distributed query processing. This thesis provides solutions in three problem areas: (a) querying incomplete data, (b) approximate query processing (AQP) over subsets of data, and (c) the high cost of shuffling data while processing distributed queries.

In distributed databases, large volumes of data are generally stored partitioned across multiple nodes, and a user query typically spans many nodes. As the number of nodes accessed by a query increases, the probability that some of those nodes are unavailable also increases; in addition, the amount of data shuffled across nodes grows, which increases communication costs.

AQP has been proposed to provide fast responses to queries over distributed databases. In AQP, queries are processed over a representative subset of the database, and estimates of the query result are returned along with confidence bounds. While AQP provides estimates of query results in a fraction of the time required to run the query over all data, quickly obtaining representative samples for a query in a distributed setting is challenging.

We first consider the problem of querying over incomplete data. In failure and straggler scenarios, the parts of the database that are still available form an incomplete database. We propose m-tables, a new representation system for representing and querying over incomplete databases.

Next, we consider the problem of AQP over subsets of data. We propose the ASAP (Approximation Strategies for Aggregate queries through Partitioning) framework, which provides estimates and confidence bounds for aggregate queries using any subset of a database when the database is co-hash partitioned. A database is co-hash partitioned when some tables are hash partitioned and the remaining tables are co-located through join predicates.

Finally, we study the high cost of shuffling data across nodes during distributed query processing. Ideally, given a query and a data distribution, we want to execute the query without any communication; in this case, the query is said to be parallel-correct with respect to the distribution. We again consider co-hash distribution schemes and, as our main result, determine the conditions under which a given query is parallel-correct for a given co-hash distribution scheme.
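To make the co-hash partitioning and parallel-correctness definitions above concrete, here is a small Python sketch on a hypothetical two-table schema (`orders` and `lineitem` are illustrative names, not taken from the thesis). `orders` is hash partitioned on `o_id`, and each `lineitem` row is co-located with the order it joins with. The hypothetical helper `agrees_on_instance` compares communication-free evaluation against centralized evaluation on a single instance only: a mismatch is a counterexample to parallel-correctness, but a match does not prove it. The general conditions are the thesis's result and are not reproduced here.

```python
from collections import defaultdict

NUM_NODES = 3

# Hypothetical toy schema (not from the thesis):
#   orders(o_id, cust) and lineitem(l_id, o_id, qty), joined on o_id.
# Assumes every lineitem row references an existing order (a foreign key).
orders = [(1, "a"), (2, "b"), (3, "a"), (4, "c")]
lineitem = [(10, 1, 5), (11, 1, 2), (12, 2, 7), (13, 4, 1), (14, 3, 3)]


def node_for(key):
    # Toy hash function: node index = key mod number of nodes.
    return key % NUM_NODES


def co_hash_partition(orders, lineitem):
    """Hash-partition orders on o_id, then place each lineitem row on the
    node holding the orders row it joins with (lineitem.o_id = orders.o_id)."""
    nodes = defaultdict(lambda: {"orders": [], "lineitem": []})
    home = {}  # o_id -> node that stores that order
    for o_id, cust in orders:
        n = node_for(o_id)
        nodes[n]["orders"].append((o_id, cust))
        home[o_id] = n
    for row in lineitem:
        nodes[home[row[1]]]["lineitem"].append(row)
    return nodes


def join_query(orders, lineitem):
    """Q(cust, l_id) :- orders(o_id, cust), lineitem(l_id, o_id, qty)."""
    custs = defaultdict(list)
    for o_id, cust in orders:
        custs[o_id].append(cust)
    return {(c, l_id) for l_id, o_id, _ in lineitem for c in custs[o_id]}


def same_customer_pairs(orders, lineitem):
    """Q(o1, o2) :- orders(o1, c), orders(o2, c), o1 < o2 (ignores lineitem)."""
    return {(o1, o2) for o1, c1 in orders for o2, c2 in orders
            if c1 == c2 and o1 < o2}


def evaluate_locally(query, nodes):
    """Run the query on each node's fragment with no communication and
    union the per-node answers."""
    result = set()
    for frag in nodes.values():
        result |= query(frag["orders"], frag["lineitem"])
    return result


def agrees_on_instance(query, orders, lineitem):
    """True if local evaluation matches centralized evaluation on this one
    instance. False refutes parallel-correctness; True does not prove it."""
    nodes = co_hash_partition(orders, lineitem)
    return evaluate_locally(query, nodes) == query(orders, lineitem)


print(agrees_on_instance(join_query, orders, lineitem))           # True
print(agrees_on_instance(same_customer_pairs, orders, lineitem))  # False
```

The join on `o_id` is answered correctly without communication because every pair of joining tuples lands on the same node. The self-join on `cust` misses the pair (1, 3) because orders 1 and 3 hash to different nodes.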

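The abstract describes AQP as returning an estimate together with confidence bounds computed from a subset of the data. As a generic illustration only, the sketch below uses a textbook cluster-sampling estimator: it reads `m` of `M` partitions chosen uniformly at random and scales up their SUM. This is an assumed stand-in, not the ASAP framework (which, per the abstract, works with any subset of a co-hash partitioned database); the function name and parameters are hypothetical.

```python
import math
import random
import statistics


def estimate_sum_from_partitions(partition_sums, m, z=1.96, seed=None):
    """Hypothetical helper: estimate the total SUM over all partitions by
    reading only m of them, chosen uniformly at random without replacement
    (simple random sampling of clusters).

    partition_sums: per-partition SUM values; only the sampled entries are
    'read'. Returns (estimate, lower, upper), where the bounds form an
    approximate normal-theory interval (z=1.96 is roughly 95%)."""
    M = len(partition_sums)
    if not 2 <= m <= M:
        raise ValueError("need 2 <= m <= number of partitions")
    rng = random.Random(seed)
    sample = rng.sample(partition_sums, m)
    estimate = M * statistics.mean(sample)      # expansion estimator of the total
    s2 = statistics.variance(sample)            # sample variance of partition sums
    se = M * math.sqrt((1 - m / M) * s2 / m)    # finite-population correction
    return estimate, estimate - z * se, estimate + z * se


# Toy data: per-partition SUM(price) on 8 hypothetical nodes.
sums = [120.0, 95.5, 130.2, 101.0, 88.4, 140.9, 99.9, 110.3]
est, lower, upper = estimate_sum_from_partitions(sums, m=4, seed=7)
print(f"estimate={est:.1f}, CI=[{lower:.1f}, {upper:.1f}], true total={sum(sums):.1f}")
```

The normal-approximation interval is rough when `m` is small. The factor `(1 - m / M)` shrinks the interval to zero width when every partition is read, in which case the estimate equals the exact total.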

Bibliographic record

Author: Sundarmurthy, Bruhathi
Author affiliation: The University of Wisconsin - Madison
Degree-granting institution: The University of Wisconsin - Madison
Discipline: Computer science
Degree: Ph.D.
Year: 2018
Pages: 154
Original format: PDF
Language: English

Similar literature

1. Data Locality-Aware Big Data Query Evaluation in Distributed Clouds [J]. Qiufen Xia, Weifa Liang, Zichuan Xu. The Computer Journal, 2017, No. 6.
2. Processing skyline queries in incomplete distributed databases [J]. Alwan Ali A., Ibrahim Hamidah, Udzir Nur Izura. Journal of Intelligent Information Systems, 2017, No. 2.
3. Query processing over incomplete autonomous databases: query rewriting using learned data dependencies [J]. Garrett Wolf, Aravind Kalavagattu, Hemal Khatri. VLDB Journal, 2009, No. 5.
4. Data Locality-Aware Query Evaluation for Big Data Analytics in Distributed Clouds [C]. Xia Qiufen, Liang Weifa, Xu Zichuan. International Conference on Advanced Cloud and Big Data, 2015.
5. Efficient Processing of Skyline Queries on Static Data Sources, Data Streams and Incomplete Datasets [D]. Nagendra, Mithila. 2014.
6. A Query Workflow Design to Perform Automatable Distributed Regression Analysis in Large Distributed Data Networks [O]. Qoua L. Her, Jessica M. Malenfant, Sarah Malek.
7. Locality-Aware Fair Scheduling in the Distributed Query Processing Framework [O]. Eom Youngmoon. 2015.
