GeoSpark SQL: An Effective Framework Enabling Spatial Queries on Spark

Zhou Huang; Yiran Chen; Lin Wan; Xia Peng


Abstract

In the era of big data, Internet-based geospatial information services such as location-based service (LBS) apps are deployed everywhere, and the number of queries against massive spatial datasets keeps growing. As a result, traditional relational spatial databases (e.g., PostgreSQL with PostGIS, and Oracle Spatial) cannot adapt well to the needs of large-scale spatial query processing. Spark is an emerging distributed computing framework in the Hadoop ecosystem. This paper addresses the growing demand for large-scale spatial query processing and proposes GeoSpark SQL, an effective framework that enables spatial queries on Spark. On the one hand, GeoSpark SQL provides a convenient SQL interface; on the other hand, it achieves both efficient storage management and high-performance parallel computing by integrating Hive and Spark. The study discusses and addresses the following key issues: (1) storage management methods under the GeoSpark SQL framework, (2) the implementation of spatial operators in the Spark environment, and (3) spatial query optimization methods under Spark. An experimental evaluation shows that GeoSpark SQL is able to achieve real-time query processing. Spark is not a panacea, however. The traditional spatial database PostGIS/PostgreSQL performs better than GeoSpark SQL in some query scenarios, especially for spatial queries with high selectivity, such as the point query and the window query. In general, GeoSpark SQL performs better on compute-intensive spatial queries such as the kNN query and the spatial join query.
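The abstract contrasts the window query with the kNN query. The following is a minimal, single-machine sketch in plain Python of what these two query types compute. It is not the GeoSpark SQL interface or the authors' implementation. The function names `window_query` and `knn_query` and the sample points are hypothetical, chosen only for illustration.

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float]


def window_query(points: List[Point], xmin: float, ymin: float,
                 xmax: float, ymax: float) -> List[Point]:
    """Return the points that fall inside an axis-aligned rectangle."""
    return [p for p in points
            if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]


def knn_query(points: List[Point], q: Point, k: int) -> List[Point]:
    """Return the k points closest to q, nearest first.

    This brute-force version computes a distance for every point,
    which is why kNN is treated here as a compute-heavy query.
    """
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))


# Hypothetical sample data, for illustration only.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0), (9.0, 9.0)]

in_window = window_query(points, 0.5, 0.5, 2.5, 2.5)
nearest = knn_query(points, (4.0, 4.0), 2)

print(in_window)  # [(1.0, 1.0), (2.0, 2.0)]
print(nearest)    # [(5.0, 5.0), (2.0, 2.0)]
```

In this sketch the window query is a simple per-point filter that keeps only a small part of the data, while the kNN query has to rank every point by distance. A system with a spatial index could answer the former without a full scan, which is one plausible reading of why the abstract reports PostGIS/PostgreSQL doing well on point and window queries.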


Bibliographic details

Source
ISPRS International Journal of Geo-Information, 2017, Issue 9
Authors
Zhou Huang; Yiran Chen; Lin Wan; Xia Peng
Original format: PDF
Chinese Library Classification: F9

