Conference: IEEE International Symposium on Computer Architecture and High Performance Computing

Planning Your SQL-on-Hadoop Deployment Using a Low-Cost Simulation-Based Approach

Abstract

The term "SQL-on-Hadoop" has recently gained significant traction [19]. Impala represents an emerging class of SQL-on-Hadoop systems that exploit a shared-nothing parallel database architecture over Hadoop. Impala was designed to close the gap in near-real-time data analytics on the Hadoop stack, and it has been shown to be significantly more efficient than other SQL-on-Hadoop solutions [13]. However, leveraging Impala to handle queries with different business demands is not a trivial task [12], and an improperly deployed Impala cluster may not deliver the expected performance. In this paper, we propose a novel Impala simulation framework to help IT professionals understand Impala's performance behavior, which simplifies the deployment planning required to enable big data analytics on SQL-on-Hadoop systems. The Impala simulator models the behavior of the complete software stack and simulates the activities of cluster components such as storage, network, processors, and memory. Moreover, the accuracy of the simulation remains high under both software configuration and hardware changes, and it reflects the expected scaling trend with low cost overhead and fast simulation speed. The simulator has been validated against various software and hardware configurations using the well-known TPC-DS benchmark [15], and the simulation results are valid and as expected. A use case shows how the simulator can be used to solve performance and deployment issues.
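As a rough illustration of the kind of question such a simulator is meant to answer, and not the framework proposed in the paper, the following Python sketch estimates how the time of a single scan-heavy query might scale with cluster size. It assumes a deliberately simple bottleneck model in which storage, CPU, and network work overlap fully, so the slowest resource on a node sets the elapsed time. The names `Node` and `estimate_query_seconds`, the `shuffle_fraction` parameter, and all throughput figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical per-node throughput figures, all in MB/s."""
    disk_mb_per_s: float  # sequential read rate of local storage
    net_mb_per_s: float   # usable network bandwidth
    cpu_mb_per_s: float   # rate at which the cores can process scanned data


def estimate_query_seconds(data_mb: float, shuffle_fraction: float,
                           node: Node, num_nodes: int) -> float:
    """Bottleneck estimate for one query on a shared-nothing cluster.

    Assumptions (illustrative only): data is spread evenly over the nodes,
    a fixed fraction of each node's share is exchanged between nodes, and
    disk, CPU, and network activity overlap completely.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    share_mb = data_mb / num_nodes
    disk_s = share_mb / node.disk_mb_per_s
    cpu_s = share_mb / node.cpu_mb_per_s
    # Only data destined for *other* nodes crosses the network.
    remote_mb = share_mb * shuffle_fraction * (num_nodes - 1) / num_nodes
    net_s = remote_mb / node.net_mb_per_s
    return max(disk_s, cpu_s, net_s)


if __name__ == "__main__":
    node = Node(disk_mb_per_s=100.0, net_mb_per_s=100.0, cpu_mb_per_s=200.0)
    for n in (1, 2, 4, 8):
        seconds = estimate_query_seconds(10_000.0, 0.5, node, n)
        print(f"{n} node(s): about {seconds:.1f} s")
```

A model this coarse ignores query plans, memory pressure, and contention, which is presumably why the paper simulates the full software stack and each cluster component instead.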
