
Utilizing global digests caching in similarity based data deduplication



Abstract

Input data is partitioned into data chunks, and digest values are calculated for each data chunk. For each data chunk, the positions of similar repository data are found in a repository of data. The input digests of the input data are matched against the repository digests contained in the global digests cache to locate data matches. When matching, the processor prefers repository digests in the global digests cache that belong to the similar repository data over repository digests of other repository data that was not determined to be similar to the input data chunks. The positions of the similar repository data are used to locate the digests and digest block boundaries of the similar repository data and to load them linearly into the global digests cache.
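The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything in it is an assumption made for illustration: fixed-size digest blocks (`BLOCK_SIZE`), SHA-1 as the digest function, the hypothetical names `block_digests`, `GlobalDigestsCache` and `match_chunk`, and a simple eviction policy. The similarity search itself is not shown; the position of the similar repository data is passed in as a given interval.

```python
import hashlib
from collections import OrderedDict

BLOCK_SIZE = 4  # hypothetical digest block size, in bytes


def block_digests(data, base=0):
    """Return (digest, start, end) for each fixed-size block of `data`."""
    out = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        out.append((hashlib.sha1(block).hexdigest(), base + i, base + i + len(block)))
    return out


class GlobalDigestsCache:
    """Hypothetical cache: digest -> repository block boundaries (start, end)."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.entries = OrderedDict()

    def load_interval(self, repo_digests, start, end):
        """Load, in repository order, digests of blocks lying inside [start, end)."""
        for digest, s, e in repo_digests:
            if start <= s and e <= end:
                bounds = self.entries.setdefault(digest, [])
                if (s, e) not in bounds:
                    bounds.append((s, e))
                self.entries.move_to_end(digest)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently loaded digest

    def match(self, digest, similar):
        """Return a matching block, preferring one inside the similar interval."""
        candidates = self.entries.get(digest, [])
        lo, hi = similar
        for s, e in candidates:
            if lo <= s and e <= hi:
                return (s, e)
        # Fall back to a cached digest of other (non-similar) repository data.
        return candidates[0] if candidates else None


def match_chunk(chunk, similar, repo_digests, cache):
    """Load the similar interval's digests, then match each input block digest."""
    cache.load_interval(repo_digests, *similar)
    return [cache.match(d, similar) for d, _, _ in block_digests(chunk)]


repository = b"AAAABBBBCCCCAAAADDDD"
repo_digests = block_digests(repository)
cache = GlobalDigestsCache()
cache.load_interval(repo_digests, 0, 8)  # left over from an earlier input chunk
matches = match_chunk(b"AAAADDDDBBBBZZZZ", (12, 20), repo_digests, cache)
print(matches)  # [(12, 16), (16, 20), (4, 8), None]
```

In this toy run the block `AAAA` occurs twice in the repository, and the match inside the similar interval (12 to 16) is preferred over the earlier cached occurrence (0 to 4). `BBBB` is matched only through a digest cached for other repository data, and `ZZZZ` has no match.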


Bibliographic details

Publication number: US10013202B2
Publication date: 2018-07-03
Original format: PDF
Applicant/Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Application number: US201715826951
Inventors: SHAY H. AKIRAV; LIOR ARONOVICH
Filing date: 2017-11-30
Classification: G06F12/08; G06F3/06; G06F17/30; G06F12/0875; G06F12/0846
Country: US
Database entry time: 2022-08-21 13:04:47
