Cost-Aware Big Data Processing Across Geo-Distributed Datacenters

Wenhua Xiao; Weidong Bao; Xiaomin Zhu; Ling Liu


Abstract

With the globalization of services, organizations continuously produce large volumes of data that need to be analysed across geo-dispersed locations. The traditional centralized approach of moving all data to a single cluster is inefficient or infeasible because of limitations such as the scarcity of wide-area bandwidth and the low-latency requirements of data processing. Processing big data across geo-distributed datacenters has continued to gain popularity in recent years. However, managing distributed MapReduce computations across geo-distributed datacenters poses a number of technical challenges: how to allocate data among a selection of geo-distributed datacenters to reduce the communication cost, how to determine the Virtual Machine (VM) provisioning strategy that offers high performance and low cost, and what criteria should be used to select a datacenter as the final reducer for big data analytics jobs. In this paper, these challenges are addressed by balancing bandwidth cost, storage cost, computing cost, migration cost, and latency cost between the two MapReduce phases across datacenters. We formulate this complex cost optimization problem for data movement, resource provisioning, and reducer selection as a joint stochastic integer nonlinear optimization problem that minimizes the five cost factors simultaneously. We integrate the Lyapunov framework into our study and design an efficient online algorithm that minimizes the long-term time-averaged operation cost. Theoretical analysis shows that our online algorithm provides a near-optimal solution with a provable gap and guarantees that data processing completes within pre-defined bounded delays. Experiments on the WorldCup98 website trace validate the theoretical analysis and demonstrate that our approach is close to the offline-optimum performance and superior to some representative approaches.
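
The abstract does not describe the online algorithm in detail. As a rough illustration of the general Lyapunov drift-plus-penalty idea it refers to, and not the authors' algorithm, the Python sketch below handles a single data queue whose per-unit processing price changes every time slot. The function name `drift_plus_penalty`, the parameters `capacity` and `V`, and the single-queue cost model are all assumptions made for this example.

```python
def drift_plus_penalty(arrivals, prices, capacity, V):
    """Generic drift-plus-penalty sketch for one queue (illustrative only).

    arrivals[t]: data arriving in slot t (hypothetical workload)
    prices[t]:   per-unit processing price in slot t (hypothetical)
    capacity:    maximum amount that can be processed per slot
    V:           trade-off weight; larger V favors cost over backlog
    Returns (time-averaged cost, maximum backlog observed).
    """
    Q = 0.0            # current queue backlog
    total_cost = 0.0
    max_backlog = 0.0
    for a, p in zip(arrivals, prices):
        # Per-slot objective V*p*b - Q*b is linear in the service amount b,
        # so the minimizer is an endpoint: full capacity if Q > V*p, else 0.
        b = capacity if Q > V * p else 0.0
        served = min(b, Q)          # cannot process more than is queued
        total_cost += p * served
        Q = Q - served + a          # queue update for the next slot
        max_backlog = max(max_backlog, Q)
    return total_cost / len(arrivals), max_backlog


if __name__ == "__main__":
    arrivals = [1.0] * 1000
    prices = [1.0 if t % 2 == 0 else 2.0 for t in range(1000)]
    for V in (0.0, 10.0):
        cost, backlog = drift_plus_penalty(arrivals, prices, 3.0, V)
        print(f"V={V}: average cost={cost:.3f}, max backlog={backlog:.1f}")
```

With the alternating prices used here, the larger `V` defers work to the cheaper slots, which lowers the average cost while letting the backlog grow. This is the cost-versus-delay trade-off that the parameter controls in this kind of scheme.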


Bibliographic details

Source
IEEE Transactions on Parallel and Distributed Systems | 2017, Issue 11 | pp. 3114-3127 | 14 pages
Authors
Wenhua Xiao; Weidong Bao; Xiaomin Zhu; Ling Liu
Author affiliations

Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha, Hunan, P.R. China;

Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha, Hunan, P.R. China;

Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha, Hunan, P.R. China;

College of Computing, Georgia Institute of Technology, 266 Ferst Drive, Atlanta, GA;

Original format: PDF
Language: English
Keywords
Distributed databases; Cloud computing; Bandwidth; Big Data; Algorithm design and analysis; Optimization;


