Investigating MapReduce framework extensions for efficient processing of geographically scattered datasets.

Abstract

We observe two important trends brought about by the evolution of the Internet in recent years. First, to improve end-to-end application performance in the presence of bottlenecks in wide-area Internet communication, modern Internet services are designed in a decentralized fashion involving geographically distributed datacenters connected through the Internet. Second, the pervasive nature of Internet services has resulted in exponential growth in the volume of digital information created, captured, or replicated. Organizations are keenly interested in mining this information to uncover trends, statistics, and other actionable information that can give them a competitive advantage. These two trends necessitate the design of a large-scale data processing system that can operate efficiently in a distributed environment involving multiple datacenters connected through the Internet.

In recent years, the MapReduce programming model, and specifically its open-source implementation Hadoop, has gained a lot of traction for large-scale data processing in a centralized environment. Our evaluation of different real-world usage scenarios of Hadoop deployments revealed that organizations with distributed datasets are required to copy the entire dataset to a centralized location so that it can be processed efficiently by the Hadoop MapReduce framework. As the Internet evolves, growth in the size of distributed datasets will outpace improvements in the network bandwidth available on the Internet. At that point, copying the entire dataset to a single location over the Internet will become infeasible.

In this thesis, we investigate the possibility of extending MapReduce, and specifically the Hadoop framework, to operate in a distributed environment involving multiple datacenters connected through the Internet. We also propose policies to improve the performance of the Hadoop MapReduce framework in a distributed environment. We observed that our policies improve the performance of the Hadoop framework substantially.

Bibliographic record

  • Author: Gadre, Hrishikesh
  • Author affiliation: Rutgers, The State University of New Jersey - New Brunswick
  • Degree-granting institution: Rutgers, The State University of New Jersey - New Brunswick
  • Subject: Computer Engineering
  • Degree: M.S.
  • Year: 2011
  • Pagination: 92 p.
  • Total pages: 92
  • Original format: PDF
  • Language of text: English