Journal: ISPRS International Journal of Geo-Information

GeoSpark SQL: An Effective Framework Enabling Spatial Queries on Spark

Abstract

In the era of big data, Internet-based geospatial information services such as location-based service (LBS) apps are deployed everywhere, and the number of queries against massive spatial data keeps growing. As a result, traditional relational spatial databases (e.g., PostgreSQL with PostGIS, and Oracle Spatial) cannot adapt well to the needs of large-scale spatial query processing. Spark is an emerging distributed computing framework in the Hadoop ecosystem. This paper addresses the increasingly large-scale spatial query-processing requirement and proposes GeoSpark SQL, an effective framework that enables spatial queries on Spark. On the one hand, GeoSpark SQL provides a convenient SQL interface; on the other hand, it achieves both efficient storage management and high-performance parallel computing by integrating Hive and Spark. The study discusses and addresses the following key issues: (1) storage management methods under the GeoSpark SQL framework, (2) the spatial operator implementation approach in the Spark environment, and (3) spatial query optimization methods under Spark. An experimental evaluation shows that GeoSpark SQL is able to achieve real-time query processing. Spark is not a panacea, however: the traditional spatial database PostGIS/PostgreSQL performs better than GeoSpark SQL in some query scenarios, especially for spatial queries with high selectivity, such as the point query and the window query. In general, GeoSpark SQL performs better on compute-intensive spatial queries such as the kNN query and the spatial join query.
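To make the four query types named in the abstract concrete, the following is a minimal pure-Python sketch of each one over a small in-memory point set. It is not GeoSpark SQL's implementation and uses neither Spark nor Hive. The function names (`point_query`, `window_query`, `knn_query`, `spatial_join`) and the sample data are hypothetical, and every query is answered by a brute-force scan with no spatial index.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def in_box(p: Point, box: Box) -> bool:
    """True if point p lies inside the box, borders included."""
    min_x, min_y, max_x, max_y = box
    return min_x <= p[0] <= max_x and min_y <= p[1] <= max_y


def point_query(points: List[Point], target: Point) -> List[Point]:
    """Point query: records located exactly at the target coordinates."""
    return [p for p in points if p == target]


def window_query(points: List[Point], box: Box) -> List[Point]:
    """Window query: records that fall inside a rectangular window."""
    return [p for p in points if in_box(p, box)]


def knn_query(points: List[Point], center: Point, k: int) -> List[Point]:
    """kNN query: the k records closest to center (Euclidean distance).

    This brute-force version computes a distance for every record
    and then sorts all of them.
    """
    def dist(p: Point) -> float:
        return hypot(p[0] - center[0], p[1] - center[1])

    return sorted(points, key=dist)[:k]


def spatial_join(points: List[Point], boxes: List[Box]) -> List[Tuple[Point, Box]]:
    """Spatial join: every (point, box) pair where the box contains the point.

    This nested-loop version tests each point against each box.
    """
    return [(p, b) for b in boxes for p in points if in_box(p, b)]


# Hypothetical sample data, for illustration only.
sample_points: List[Point] = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0), (9.0, 9.0)]
sample_boxes: List[Box] = [(0.0, 0.0, 1.0, 1.0), (4.0, 4.0, 10.0, 10.0)]

print(point_query(sample_points, (5.0, 5.0)))
print(window_query(sample_points, (0.5, 0.5, 2.5, 2.5)))
print(knn_query(sample_points, (0.0, 0.0), 2))
print(spatial_join(sample_points, sample_boxes))
```

In this sketch the point and window queries apply one cheap predicate per record and return only a few rows, whereas the kNN query computes and sorts a distance for every record and the join tests every point against every box. That difference in work per query is one way to read the abstract's distinction between high-selectivity and compute-intensive queries.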