Symposium on Mass Storage Systems and Technologies

Impact of data placement on resilience in large-scale object storage systems

Abstract

Distributed object storage architectures have become the de facto standard for high-performance storage in big data, cloud, and HPC environments. Object storage deployments that use commodity hardware to reduce costs often rely on object replication to achieve data resilience. However, repairing object replicas after a failure is a daunting task in systems with thousands of servers and billions of objects, and such scenarios are increasingly difficult to evaluate at scale on real-world systems. If objects are not repaired in a timely manner, both resilience and availability are compromised. In this work, we use a high-fidelity discrete-event simulation model to investigate replica reconstruction in large-scale object storage systems with thousands of servers, billions of objects, and petabytes of data. We evaluate the behavior of CRUSH, a well-known object placement algorithm, and identify configuration scenarios in which aggregate rebuild performance is constrained by object placement policies. After determining the root cause of this bottleneck, we propose enhancements to CRUSH and to the usage policies built on top of it that enable scalable replica reconstruction. Using these methods, we demonstrate a simulated aggregate rebuild rate of 410 GiB/s (within 5% of projected ideal linear scaling) on a 1,024-node commodity storage system. We also uncover an unexpected phenomenon in rebuild performance that depends on the characteristics of the data stored on the system.
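The sketch below is not the authors' simulator or the real CRUSH code. It assumes a Ceph-style two-level scheme in which objects are grouped into placement groups (PGs). Each PG is then mapped to `replicas` nodes by a flat, weighted selection loosely modeled on CRUSH's straw2 bucket. Real CRUSH adds a device hierarchy, failure-domain rules, and its own hash function, all of which are omitted here. The names `hash_unit`, `straw2_select`, and `recovery_peers` are hypothetical. The abstract does not state which placement property causes the rebuild bottleneck it identifies. This sketch therefore only illustrates one general factor: when a node fails, the set of distinct peers holding surviving replicas of its data bounds how many nodes can act as rebuild sources at once.

```python
import hashlib
import math


def hash_unit(*parts: object) -> float:
    """Deterministically map the given parts to a float in (0, 1]."""
    key = "/".join(str(p) for p in parts).encode()
    n = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return (n + 1) / 2**64  # +1 keeps the value strictly positive for log()


def straw2_select(pg: int, nodes: list[str], weights: dict[str, float],
                  replicas: int) -> list[str]:
    """Pick `replicas` distinct nodes for one PG (simplified straw2-style draw).

    Each node draws ln(u) / weight from a per-(pg, node, attempt) hash and the
    largest draw wins, so a single draw favors nodes in proportion to weight.
    """
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    chosen: list[str] = []
    attempt = 0
    while len(chosen) < replicas:
        winner = max(
            nodes,
            key=lambda n: math.log(hash_unit(pg, n, attempt)) / weights[n],
        )
        if winner not in chosen:  # on a collision, retry with a new attempt number
            chosen.append(winner)
        attempt += 1
    return chosen


def recovery_peers(failed: str, pg_count: int, nodes: list[str],
                   replicas: int = 3) -> set[str]:
    """Nodes holding surviving replicas of any PG that had a copy on `failed`.

    Each such peer can serve as a source during re-replication, so a larger
    set can let more of the cluster take part in the rebuild in parallel.
    """
    weights = {n: 1.0 for n in nodes}  # hypothetical: uniform node capacity
    peers: set[str] = set()
    for pg in range(pg_count):
        acting = straw2_select(pg, nodes, weights, replicas)
        if failed in acting:
            peers.update(n for n in acting if n != failed)
    return peers


if __name__ == "__main__":
    cluster = [f"node{i}" for i in range(32)]
    for pgs in (8, 64, 1024):
        peers = recovery_peers("node0", pgs, cluster)
        print(f"{pgs:5d} PGs -> {len(peers):2d} of {len(cluster) - 1} "
              f"peers share data with node0")
```

Running the script prints, for increasing PG counts, how many of the other 31 nodes share at least one PG with `node0`. With only 8 PGs and 3 replicas, at most 16 peers can appear, because each PG contributes at most two surviving copies. For scale, the abstract's 410 GiB/s aggregate rebuild rate across 1,024 nodes corresponds to 410 MiB/s per node (410 × 1,024 MiB/s divided by 1,024 nodes).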