Symposium on Mass Storage Systems and Technologies

Sorted deduplication: How to process thousands of backup streams



Abstract

The requirements of deduplication systems have changed in recent years. Early deduplication systems had to process dozens to hundreds of backup streams at the same time, while today they are able to process hundreds to thousands of them. Traditional approaches rely on stream locality, which supports parallelism but easily leads to many non-contiguous disk accesses, as each stream competes with all other streams for the available resources. This paper presents a new exact deduplication approach designed for processing thousands of backup streams at the same time on the same fingerprint index. The underlying approach destroys the traditionally exploited temporal chunk locality and creates a new one by sorting fingerprints. The sorting leads to perfectly sequential disk access patterns on the backup servers, while only slightly increasing the load on the clients. In our experiments, the new approach generates up to 113 times fewer I/Os than the exact Data Domain deduplication file system and up to 12 times fewer I/Os than the approximate Sparse Indexing, while consuming less memory at the same time.
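The abstract only sketches the mechanism. As a rough illustration (not the authors' implementation; all identifiers below are hypothetical), the core idea of replacing per-stream random index probes with one sequential merge over a sorted fingerprint index can be expressed as follows, assuming the on-disk index is kept in sorted order:

import hashlib
from typing import Dict, Iterable, List

def chunk_fingerprints(chunks: Iterable[bytes]) -> List[bytes]:
    # Fingerprint each chunk; SHA-1 is used here only as a stand-in hash.
    return [hashlib.sha1(c).digest() for c in chunks]

def sorted_dedup_lookup(batch: List[bytes], sorted_index: List[bytes]) -> Dict[bytes, bool]:
    # Sketch: instead of probing the fingerprint index in arrival order
    # (random I/O), sort the batched fingerprints collected from many
    # backup streams and merge them against the sorted on-disk index in
    # a single sequential pass. Returns fingerprint -> "already stored".
    batch_sorted = sorted(set(batch))
    known: Dict[bytes, bool] = {}
    i = j = 0
    while i < len(batch_sorted) and j < len(sorted_index):
        if batch_sorted[i] == sorted_index[j]:
            known[batch_sorted[i]] = True   # duplicate chunk, already stored
            i += 1
            j += 1
        elif batch_sorted[i] < sorted_index[j]:
            known[batch_sorted[i]] = False  # new chunk, must be written
            i += 1
        else:
            j += 1                          # advance sequentially through the index
    for fp in batch_sorted[i:]:
        known[fp] = False                   # rest of the batch is unseen
    return known

Because each batch touches the index in ascending fingerprint order, index pages are read once per merge pass regardless of how many backup streams contributed chunks, which is what makes the server-side I/O sequential in this sketch.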

