Symposium on Mass Storage Systems and Technologies

Sorted deduplication: How to process thousands of backup streams



Abstract

The requirements of deduplication systems have changed in recent years. Early deduplication systems had to process dozens to hundreds of backup streams at the same time, while today they are able to process hundreds to thousands of them. Traditional approaches rely on stream locality, which supports parallelism but easily leads to many non-contiguous disk accesses, as each stream competes with all other streams for the available resources. This paper presents a new exact deduplication approach designed for processing thousands of backup streams at the same time on the same fingerprint index. The underlying approach destroys the traditionally exploited temporal chunk locality and creates a new one by sorting fingerprints. The sorting leads to perfectly sequential disk access patterns on the backup servers, while only slightly increasing the load on the clients. In our experiments, the new approach generates up to 113 times fewer I/Os than the exact Data Domain deduplication file system and up to 12 times fewer I/Os than the approximate Sparse Indexing, while consuming less memory at the same time.
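The abstract only sketches the mechanism. As a rough illustration (not the authors' implementation; all identifiers below are hypothetical), the core idea of replacing per-stream random index probes with one sequential merge over a sorted fingerprint index can be expressed as follows, assuming the on-disk index is kept in sorted order:

import hashlib
from typing import Dict, Iterable, List

def chunk_fingerprints(chunks: Iterable[bytes]) -> List[bytes]:
    # Fingerprint each chunk; SHA-1 is used here only as a stand-in hash.
    return [hashlib.sha1(c).digest() for c in chunks]

def sorted_dedup_lookup(batch: List[bytes], sorted_index: List[bytes]) -> Dict[bytes, bool]:
    # Sketch: instead of probing the fingerprint index in arrival order
    # (random I/O), sort the batched fingerprints collected from many
    # backup streams and merge them against the sorted on-disk index in
    # a single sequential pass. Returns fingerprint -> "already stored".
    batch_sorted = sorted(set(batch))
    known: Dict[bytes, bool] = {}
    i = j = 0
    while i < len(batch_sorted) and j < len(sorted_index):
        if batch_sorted[i] == sorted_index[j]:
            known[batch_sorted[i]] = True   # duplicate chunk, already stored
            i += 1
            j += 1
        elif batch_sorted[i] < sorted_index[j]:
            known[batch_sorted[i]] = False  # new chunk, must be written
            i += 1
        else:
            j += 1                          # advance sequentially through the index
    for fp in batch_sorted[i:]:
        known[fp] = False                   # rest of the batch is unseen
    return known

Because each batch touches the index in ascending fingerprint order, index pages are read once per merge pass regardless of how many backup streams contributed chunks, which is what makes the server-side I/O sequential in this sketch.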

