International Conference on Advanced Computing and Intelligent Engineering

Efficient Data Deduplication for Big Data Storage Systems

Abstract

For efficient chunking, we propose a Differential Evolution (DE) based approach that optimizes Two Thresholds Two Divisors (TTTD-P) Content Defined Chunking (CDC) to reduce the number of computing operations by using a single dynamic optimal divisor D with an optimal threshold value, exploiting the multi-operation nature of TTTD. To reduce chunk-size variance, the TTTD algorithm introduces an additional backup divisor D' that has a higher probability of finding cut points; however, the additional divisor decreases chunking throughput. To this end, Asymmetric Extremum (AE) significantly improves chunking throughput by using the local extreme value in a variable-sized asymmetric window to overcome the boundary-shift problem of Rabin and TTTD, while achieving nearly the same deduplication ratio (DR). Therefore, we propose DE-based TTTD-P optimized chunking to maximize chunking throughput with an increased DR, together with a scalable bucket indexing approach that reduces the hash-value judgment time needed to identify and declare redundant chunks by about 16 times compared with Rabin CDC, 5 times compared with AE CDC, and 1.6 times compared with FastCDC on the Hadoop Distributed File System (HDFS).
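
As background for the abstract above, the sketch below illustrates the general TTTD idea of a main divisor D, a backup divisor D', and minimum/maximum size thresholds. It is a minimal illustration only, assuming a toy rolling hash and placeholder parameter values (t_min, t_max, d_main, d_backup); these are not the DE-optimized divisor and threshold proposed in the paper.

```python
def tttd_chunks(data: bytes,
                t_min: int = 2048,
                t_max: int = 8192,
                d_main: int = 540,
                d_backup: int = 270) -> list:
    """Split data into variable-sized chunks using two size thresholds
    (t_min, t_max) and two divisors (d_main, d_backup), TTTD-style."""
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + t_max, len(data))
        cut = end                                  # default: forced cut at t_max (or end of data)
        backup_cut = -1
        h = 0
        for i in range(start, end):
            h = (h * 31 + data[i]) & 0xFFFFFFFF    # toy rolling-hash stand-in
            if i - start + 1 < t_min:
                continue                           # no cut points below the minimum size
            if h % d_backup == d_backup - 1:
                backup_cut = i + 1                 # backup divisor D' fires more often
            if h % d_main == d_main - 1:
                cut = i + 1                        # main divisor D found a natural boundary
                break
        else:
            if backup_cut > 0:
                cut = backup_cut                   # no D boundary found: fall back to D'
        chunks.append(data[start:cut])
        start = cut
    return chunks


if __name__ == "__main__":
    import os
    blob = os.urandom(1 << 16)                     # 64 KiB of random test data
    parts = tttd_chunks(blob)
    assert b"".join(parts) == blob                 # chunking preserves the data
    print(len(parts), "chunks; first sizes:", [len(p) for p in parts[:5]])
```

In a full deduplication pipeline, each chunk produced this way would then be fingerprinted (e.g. with SHA-1 or SHA-256) and looked up in an index, such as the bucket index described in the abstract, so that only previously unseen chunks are written to HDFS.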