Journal: IEEE Transactions on Parallel and Distributed Systems

A Highly Reliable Metadata Service for Large-Scale Distributed File Systems


Abstract

Many massive data processing applications need long, continuous, and uninterrupted data access. Distributed file systems serve as their back-end storage, providing global namespace management and reliability guarantees. As system scale grows, hardware failures and software issues increase, and metadata service reliability has become critical because it directly affects file and directory operations. Existing metadata management mechanisms provide some fault tolerance but are inadequate: they are often limited in system availability, state consistency, and performance overhead, and they lack an effective mechanism for ensuring metadata reliability. This paper introduces a novel, highly reliable metadata service that addresses these issues in large-scale file systems. Unlike traditional strategies, the proposed service adopts a new active-standby architecture for fault tolerance and takes a holistic approach to improving file system availability. A new shared storage pool (SSP) is designed for transparent metadata synchronization and replication between active and standby servers. Based on the SSP, a new policy called multiple actives multiple standbys (MAMS) performs metadata service recovery in case of failures. A new global state recovery strategy and a smart client fault tolerance mechanism maintain the continuity of the metadata service. We have implemented this highly reliable metadata service in a prototype file system, CFS (Clover file system), and conducted extensive tests to evaluate it. Experimental results confirm that it significantly improves file system reliability, with fast failover under different failure scenarios and negligible influence on performance. Compared with typical reliability designs in the Hadoop Avatar, Hadoop HA, and Boom-FS file systems, the mean time to recovery (MTTR) with the highly reliable metadata service was reduced by 80.23, 65.46, and 28.13 percent, respectively.
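The abstract reports only relative MTTR reductions, not absolute recovery times. As a rough aid to reading those figures, the sketch below converts each reported percentage into the fraction of the baseline MTTR that remains and the equivalent speedup factor. The function names (`remaining_fraction`, `speedup`, `summarize`) are illustrative assumptions and do not come from the paper; the three percentages are the only values taken from the abstract.

```python
# Illustrative arithmetic only: the helper names are hypothetical, and the
# abstract gives no absolute MTTR values, so everything here is relative.

def remaining_fraction(reduction_percent: float) -> float:
    """Fraction of the baseline MTTR left after a given percent reduction."""
    if not 0.0 <= reduction_percent < 100.0:
        raise ValueError("reduction_percent must be in [0, 100)")
    return 1.0 - reduction_percent / 100.0


def speedup(reduction_percent: float) -> float:
    """How many times faster recovery is, relative to the baseline."""
    return 1.0 / remaining_fraction(reduction_percent)


# MTTR reductions reported in the abstract, keyed by the compared design.
REPORTED_REDUCTIONS = {
    "Hadoop Avatar": 80.23,
    "Hadoop HA": 65.46,
    "Boom-FS": 28.13,
}


def summarize(reductions: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Map each baseline to (remaining MTTR fraction, speedup factor)."""
    return {
        name: (round(remaining_fraction(pct), 4), round(speedup(pct), 2))
        for name, pct in reductions.items()
    }


if __name__ == "__main__":
    for name, (fraction, factor) in summarize(REPORTED_REDUCTIONS).items():
        print(f"{name}: {fraction:.4f} of baseline MTTR, about {factor:.2f}x faster")
```

Read this way, an 80.23 percent reduction means recovery takes roughly one fifth of the baseline time, whereas a 28.13 percent reduction leaves about 72 percent of it.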
