Symposium on Mass Storage Systems and Technologies

Pfimbi: Accelerating big data jobs through flow-controlled data replication

Abstract

The performance of HDFS is critical to big data software stacks and has been at the forefront of recent efforts from the industry and the open source community. A key problem is the lack of flexibility in how data replication is performed. To address this problem, this paper presents Pfimbi, the first alternative to HDFS that supports both synchronous and flow-controlled asynchronous data replication. Pfimbi has numerous benefits: it accelerates jobs, exploits under-utilized storage I/O bandwidth, and supports hierarchical storage I/O bandwidth allocation policies. We demonstrate that for a job trace derived from a Facebook workload, Pfimbi reduces the average job runtime by 18%, and by up to 46% in the best case. We also demonstrate that flow control is crucial to fully exploiting the benefits of asynchronous replication; removing Pfimbi's flow control mechanisms resulted in a 2.7× increase in job runtime.
