IEEE Transactions on Computers

A General Communication Cost Optimization Framework for Big Data Stream Processing in Geo-Distributed Data Centers

Abstract

With the explosion of big data, processing large numbers of continuous data streams, i.e., big data stream processing (BDSP), has become a crucial requirement for many scientific and industrial applications in recent years. By offering a pool of computation, communication, and storage resources, public clouds such as Amazon's EC2 are undoubtedly the most efficient platforms for meeting the ever-growing needs of BDSP. Public cloud service providers usually operate a number of geo-distributed datacenters across the globe, and Internet Service Providers (ISPs) charge different inter-datacenter network costs for different datacenter pairs. Meanwhile, inter-datacenter traffic in BDSP constitutes a large portion of a cloud provider's traffic demand over the Internet and incurs substantial communication cost, which may even become the dominant operational expenditure. Because datacenter resources are provided in a virtualized way, the virtual machines (VMs) for stream processing tasks can be freely deployed onto any datacenter, provided that the Service Level Agreement (SLA, e.g., quality-of-information) is met. This creates both an opportunity and a challenge: exploiting the diversity of inter-datacenter network costs to jointly optimize VM placement and load balancing so as to minimize network cost while guaranteeing the SLA. In this paper, we first propose a general modeling framework that describes all representative inter-task relationship semantics in BDSP. Based on this framework, we formulate the communication cost minimization problem for BDSP as a mixed-integer linear programming (MILP) problem and prove it to be NP-hard. We then propose a computationally efficient solution based on MILP. Extensive simulation-based studies validate the high efficiency of our proposal.
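As a concrete illustration of the placement objective described above, the following is a minimal Python sketch under simplifying assumptions that are ours, not the paper's: each stream task runs in one VM placed on exactly one datacenter, traffic between two tasks costs its rate times a per-unit price for the datacenter pair (zero inside a datacenter), and each datacenter can host a fixed number of VMs. The function name `min_cost_placement`, the task names, and all numbers are hypothetical. Exhaustive search stands in for the paper's MILP-based solution, and SLA constraints and load balancing (splitting a flow across task replicas) are omitted.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple


def min_cost_placement(
    tasks: List[str],
    flows: Dict[Tuple[str, str], float],
    cost: List[List[float]],
    capacity: List[int],
    pinned: Dict[str, int],
) -> Tuple[float, Optional[Dict[str, int]]]:
    """Return (minimum communication cost, task -> datacenter map) by brute force.

    Returns (inf, None) if no assignment satisfies the pinning and capacity limits.
    """
    n_dc = len(capacity)
    best_cost: float = float("inf")
    best: Optional[Dict[str, int]] = None
    # Enumerate every assignment of tasks to datacenters: n_dc ** len(tasks) cases.
    for assign in product(range(n_dc), repeat=len(tasks)):
        placement = dict(zip(tasks, assign))
        # Tasks pinned to a datacenter (e.g., where a stream source lives) must stay there.
        if any(placement[t] != dc for t, dc in pinned.items()):
            continue
        # Each datacenter may host at most capacity[dc] task VMs.
        if any(assign.count(dc) > capacity[dc] for dc in range(n_dc)):
            continue
        # Communication cost: traffic rate times per-unit price of the datacenter pair.
        total = sum(rate * cost[placement[u]][placement[v]] for (u, v), rate in flows.items())
        if total < best_cost:
            best_cost, best = total, placement
    return best_cost, best


# Hypothetical toy instance: 3 datacenters with a symmetric per-unit transfer price
# (zero on the diagonal, i.e., intra-datacenter traffic is free).
COST = [[0, 2, 5],
        [2, 0, 3],
        [5, 3, 0]]
TASKS = ["src", "parse", "join", "sink"]
# Traffic rate on each task-to-task edge of the stream graph.
FLOWS = {("src", "parse"): 10, ("parse", "join"): 4, ("join", "sink"): 1}
PINNED = {"src": 0, "sink": 2}

if __name__ == "__main__":
    print(min_cost_placement(TASKS, FLOWS, COST, [2, 2, 2], PINNED))
    print(min_cost_placement(TASKS, FLOWS, COST, [4, 4, 4], PINNED))
```

On this toy instance, a capacity of 2 VMs per datacenter forces `join` into datacenter 1, for a total cost of 11. With a capacity of 4, `join` can co-locate with `parse` and the cost drops to 5. The search enumerates (number of datacenters)^(number of tasks) assignments, so it only scales to toy instances.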