IEEE Transactions on Parallel and Distributed Systems

Traffic-Aware Geo-Distributed Big Data Analytics with Predictable Job Completion Time

Abstract

Big data analytics has attracted close attention from both industry and academia because of its great benefits in cost reduction and better decision making. With the fast growth of various global services, there is an increasing need for big data analytics across multiple data centers (DCs) located in different countries or regions. This calls for a cross-DC data processing platform optimized for the geo-distributed computing environment. Although some recent efforts have been made toward geo-distributed big data analytics, they cannot guarantee predictable job completion time, and they may incur excessive traffic over the inter-DC network, which is a scarce resource shared by many applications. In this paper, we study how to minimize the inter-DC traffic generated by MapReduce jobs that target geo-distributed big data, while providing predictable job completion time. To achieve this goal, we formulate an optimization problem that jointly considers input data movement and task placement. Furthermore, we guarantee predictable job completion time by applying the chance-constrained optimization technique, so that a MapReduce job finishes within a predefined completion time with high probability. To evaluate the performance of our proposal, we conduct extensive simulations using real traces generated by a set of queries on Hive. The results show that our proposal can reduce inter-DC traffic by 55 percent compared with centralized processing, which aggregates all data into a single data center.
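
The abstract does not spell out the paper's formulation, so the following is only a toy Python sketch, under assumptions of our own, of what minimizing inter-DC traffic over input movement and task placement under a chance constraint can look like. None of these assumptions come from the paper: each DC stores some input; each DC's input is mapped at one chosen DC and all reducing happens at one chosen DC (whose compute time is ignored); map output is alpha times the input size; every inter-DC link has the same fixed bandwidth; the per-GB map time at DC k is an independent Gaussian N(mu_k, sigma_k^2); and brute-force enumeration stands in for whatever solver the authors use. Because the per-DC times are independent, the probability that the slowest DC finishes by the deadline is a product of normal CDFs, and a plan is accepted when that product is at least 1 - eps. The names evaluate and best_plan, and all numbers, are hypothetical.

```python
import itertools
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def evaluate(plan, data_gb, alpha, bw_gbps, mu, sigma, deadline):
    """Return (inter-DC traffic in GB, P[job done by deadline]) for one plan.

    plan = (map_site, reduce_site); map_site[i] is the DC that maps DC i's input.
    Toy model (assumption): map time at DC k is load_k * X_k with independent
    X_k ~ N(mu[k], sigma[k]**2) s/GB; transfers are deterministic and run in
    parallel; reduce compute time is ignored.
    """
    map_site, reduce_site = plan
    n = len(data_gb)
    load = [0.0] * n     # GB of input mapped at each DC
    arrival = [0.0] * n  # time until the last moved raw input reaches each DC
    traffic = 0.0
    for i, k in enumerate(map_site):
        load[k] += data_gb[i]
        if k != i:  # raw input crosses the inter-DC network
            traffic += data_gb[i]
            arrival[k] = max(arrival[k], data_gb[i] / bw_gbps)
    prob = 1.0  # P[max over DCs of finish time <= deadline], by independence
    for k in range(n):
        if load[k] == 0:
            continue  # an idle DC never delays the job
        fixed = arrival[k]
        if k != reduce_site:  # map output is shuffled to the reduce DC
            shuffle_gb = alpha * load[k]
            traffic += shuffle_gb
            fixed += shuffle_gb / bw_gbps
        mean = fixed + load[k] * mu[k]
        std = load[k] * sigma[k]
        if std == 0:
            prob *= 1.0 if mean <= deadline else 0.0
        else:
            prob *= normal_cdf((deadline - mean) / std)
    return traffic, prob


def best_plan(data_gb, alpha, bw_gbps, mu, sigma, deadline, eps):
    """Min-traffic plan with P[T <= deadline] >= 1 - eps, or None if infeasible."""
    n = len(data_gb)
    best = None
    for map_site in itertools.product(range(n), repeat=n):
        for reduce_site in range(n):
            plan = (map_site, reduce_site)
            traffic, prob = evaluate(plan, data_gb, alpha, bw_gbps, mu, sigma, deadline)
            if prob >= 1.0 - eps and (best is None or traffic < best[0]):
                best = (traffic, plan, prob)
    return best


if __name__ == "__main__":
    data_gb = [30.0, 10.0]  # input stored at DC0 and DC1 (made-up numbers)
    mu = [1.0, 5.0]         # mean map time in s/GB; DC1 is slow
    sigma = [0.1, 1.5]      # std of map time in s/GB; DC1 is noisy
    for deadline in (100.0, 70.0):
        result = best_plan(data_gb, alpha=0.5, bw_gbps=1.0, mu=mu, sigma=sigma,
                           deadline=deadline, eps=0.05)
        print(deadline, result)
```

With these made-up numbers, a 100 s deadline lets both DCs map their data locally and ship only DC1's 5 GB of map output to DC0. Tightening the deadline to 70 s makes the slow, noisy DC1 too risky, so the cheapest plan that still meets the 95 percent target moves DC1's 10 GB of raw input to DC0 instead. Enumeration grows as n^(n+1) in the number of DCs, so it only serves to make the constraint concrete.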

Bibliographic details

  • Author affiliations

    School of Computer Science and Engineering, University of Aizu, Fukushima-ken, Japan;

    Department of Computing, The Hong Kong Polytechnic University, Hong Kong;

    School of Computer Science and Engineering, University of Aizu, Fukushima-ken, Japan;

    Service Computing Technology and System Lab, School of Computer Science and Technology, Cluster and Grid Computing Lab, Huazhong University of Science and Technology, Wuhan, China;

    Service Computing Technology and System Lab, School of Computer Science and Technology, Cluster and Grid Computing Lab, Huazhong University of Science and Technology, Wuhan, China;

    School of Information Technologies, University of Sydney, Camperdown, NSW, Australia;

    Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing University of Posts and Telecommunications, Nanjing, China;

  • Original format: PDF
  • Full-text language: English
  • Keywords

    Big data; Optimization; Distributed databases; Scheduling; Data models; Proposals;
