International Conference on Advanced Computing and Intelligent Engineering

Efficient Data Deduplication for Big Data Storage Systems

Abstract

For efficient chunking, we propose a Differential Evolution (DE) based approach that optimizes Two Thresholds Two Divisors (TTTD-P) Content Defined Chunking (CDC) to reduce the number of computing operations by using a single dynamic optimal divisor D with an optimal threshold value, exploiting the multi-operation nature of TTTD. To reduce chunk-size variance, the TTTD algorithm introduces an additional backup divisor D' that has a higher probability of finding cut points; however, the additional divisor decreases chunking throughput. To this end, Asymmetric Extremum (AE) significantly improves chunking throughput by using the local extreme value in a variable-sized asymmetric window to overcome the boundary-shift problem of Rabin and TTTD, while achieving nearly the same deduplication ratio (DR). Therefore, we propose DE-based TTTD-P optimized chunking to maximize chunking throughput with an increased DR, together with a scalable bucket indexing approach that reduces the hash-value judgment time needed to identify and declare redundant chunks by about 16 times compared with Rabin CDC, 5 times compared with AE CDC, and 1.6 times compared with FastCDC on the Hadoop Distributed File System (HDFS).
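
As background for the abstract above, the sketch below illustrates the general TTTD idea of a main divisor D, a backup divisor D', and minimum/maximum size thresholds. It is a minimal illustration only, assuming a toy rolling hash and placeholder parameter values (t_min, t_max, d_main, d_backup); these are not the DE-optimized divisor and threshold proposed in the paper.

```python
def tttd_chunks(data: bytes,
                t_min: int = 2048,
                t_max: int = 8192,
                d_main: int = 540,
                d_backup: int = 270) -> list:
    """Split data into variable-sized chunks using two size thresholds
    (t_min, t_max) and two divisors (d_main, d_backup), TTTD-style."""
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + t_max, len(data))
        cut = end                                  # default: forced cut at t_max (or end of data)
        backup_cut = -1
        h = 0
        for i in range(start, end):
            h = (h * 31 + data[i]) & 0xFFFFFFFF    # toy rolling-hash stand-in
            if i - start + 1 < t_min:
                continue                           # no cut points below the minimum size
            if h % d_backup == d_backup - 1:
                backup_cut = i + 1                 # backup divisor D' fires more often
            if h % d_main == d_main - 1:
                cut = i + 1                        # main divisor D found a natural boundary
                break
        else:
            if backup_cut > 0:
                cut = backup_cut                   # no D boundary found: fall back to D'
        chunks.append(data[start:cut])
        start = cut
    return chunks


if __name__ == "__main__":
    import os
    blob = os.urandom(1 << 16)                     # 64 KiB of random test data
    parts = tttd_chunks(blob)
    assert b"".join(parts) == blob                 # chunking preserves the data
    print(len(parts), "chunks; first sizes:", [len(p) for p in parts[:5]])
```

In a full deduplication pipeline, each chunk produced this way would then be fingerprinted (e.g. with SHA-1 or SHA-256) and looked up in an index, such as the bucket index described in the abstract, so that only previously unseen chunks are written to HDFS.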