Journal: International Journal of Applied Engineering Research

COVERED: Content-Version based Removal of Duplicates


Abstract

Deduplication is a promising way to reclaim storage space by eliminating unwanted data, particularly duplicate and near-duplicate copies. Similar copies arise everywhere as an integral part of data versioning. This paper presents a post-process approach that integrates data versioning, deduplication, and data archiving. Version-based and content-based similarity are detected by computing similarity scores with shingles and cosine similarity. Primary storage is relieved by devolving older versions to the archive as ghost entries. A novel probability model decides when ghost entries are permanently removed, based on their access probabilities. The work is evaluated on real and synthetic datasets, and active storage space was successfully released by diverting unwanted data to the archive.
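
The abstract names shingles and cosine similarity as the basis for similarity scoring but gives no parameters. The following is a minimal sketch of that general technique, not the paper's implementation: word-level shingles of size k = 3, raw shingle counts as vector weights, and the function names `shingles`, `cosine_similarity`, and `similarity_score` are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def shingles(text: str, k: int = 3) -> Counter:
    """Count the overlapping k-word shingles in text (k = 3 is an assumed default)."""
    words = text.lower().split()
    if not words:
        return Counter()
    if len(words) < k:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return Counter([tuple(words)])
    return Counter(tuple(words[i:i + k]) for i in range(len(words) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse shingle-count vectors."""
    dot = sum(count * b[shingle] for shingle, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_score(x: str, y: str, k: int = 3) -> float:
    """Similarity in [0, 1] between two texts, e.g. two versions of a file."""
    return cosine_similarity(shingles(x, k), shingles(y, k))


if __name__ == "__main__":
    v1 = "the quick brown fox jumps over the lazy dog"
    v2 = "the quick brown fox jumps over the sleepy dog"
    print(similarity_score(v1, v2))
```

A score near 1 suggests that two files are versions of the same content, while a score near 0 suggests unrelated content. The threshold at which an older version would be moved to the archive is not stated in the abstract, so none is assumed here.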