IEEE International Congress on Big Data

Using Inter-file Similarity to Improve Intra-file Compression


Abstract

In storage systems with vast numbers of files, compression techniques should exploit inter-file similarity while still allowing near-atomic access to individual files. In differential compression, collections of files are compressed by identifying strings shared across files, so some files are represented largely by references to strings in other files. In addition, a file in the collection can be (further) compressed by identifying common strings within the file itself. An LZ-style within-file compressor could resolve these references to other files, at the cost of decompression latency but with a possible gain in compression effectiveness. To quantify the compression gain, we experiment with a variety of file collections, from emails to source code, and test against multiple measures. If the LZ scheme honors the inter-file references, there is only minimal improvement. If the LZ algorithm replaces inter-file references with intra-file references, we observe up to 3% compression improvement for mildly similar files and over 200% improvement for highly similar files.
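The general idea of letting an LZ-style compressor draw on a similar file can be illustrated with a small sketch. This is not the scheme evaluated in the paper: it assumes DEFLATE's preset-dictionary feature (the `zdict` argument in Python's `zlib` module) as a stand-in for inter-file references, and the helper names and the two sample "files" are hypothetical. Note that DEFLATE only looks back 32 KB, so the reference file here is kept small.

```python
import zlib


def compress_alone(data: bytes) -> bytes:
    """Intra-file only: LZ77 matches can point only into `data` itself."""
    c = zlib.compressobj(level=9)
    return c.compress(data) + c.flush()


def compress_with_reference(data: bytes, reference: bytes) -> bytes:
    """Inter-file stand-in: LZ77 matches may also point into `reference`,
    which is supplied as a preset dictionary (an assumption of this sketch)."""
    c = zlib.compressobj(level=9, zdict=reference)
    return c.compress(data) + c.flush()


def decompress_with_reference(blob: bytes, reference: bytes) -> bytes:
    """Decompression needs the reference file again; this is the
    latency cost of depending on another file."""
    d = zlib.decompressobj(zdict=reference)
    return d.decompress(blob) + d.flush()


# Two hypothetical, highly similar files (about 6 KB each).
base = b"".join(
    b"line %d: value=%d\n" % (i, (i * 7919) % 1009) for i in range(300)
)
variant = base.replace(b"line 150:", b"LINE 150:")

alone = compress_alone(variant)
shared = compress_with_reference(variant, base)
print("intra-file only:", len(alone), "bytes")
print("with reference file:", len(shared), "bytes")
```

The trade-off matches the one described above: the variant compresses far better when it can reference the base file, but it can no longer be decompressed in isolation.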
