Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST '08)

Are Disks the Dominant Contributor for Storage Failures? A Comprehensive Study of Storage Subsystem Failure Characteristics


Abstract

Building reliable storage systems becomes increasingly challenging as the complexity of modern storage systems continues to grow. Understanding storage failure characteristics is crucial for designing and building a reliable storage system. While several recent studies have examined storage failures, almost all of them focus on the failure characteristics of one component (disks) and do not study failures of other storage components. This paper analyzes the failure characteristics of storage subsystems. More specifically, we analyzed the storage logs collected from about 39,000 storage systems commercially deployed at various customer sites. The data set covers a period of 44 months and includes about 1,800,000 disks hosted in about 155,000 storage shelf enclosures. Our study reveals many interesting findings that provide useful guidelines for designing reliable storage systems. Some of our major findings include: (1) In addition to disk failures, which contribute 20-55% of storage subsystem failures, other components such as physical interconnects and protocol stacks also account for significant percentages of storage subsystem failures. (2) Each individual storage subsystem failure type, and storage subsystem failure as a whole, exhibits strong self-correlation. In addition, these failures exhibit "bursty" patterns. (3) Storage subsystems configured with redundant interconnects experience 30-40% lower failure rates than those with a single interconnect. (4) Spanning the disks of a RAID group across multiple shelves provides a more resilient solution for storage subsystems than placing them within a single shelf.
