Journal of Computers

Efficient Cross User Client Side Data Deduplication in Hadoop


Abstract

Hadoop is widely used for applications such as the Aadhaar card, healthcare, media, ad platforms, fraud and crime detection, and education. However, it does not provide an efficient, optimized data-storage solution. One interesting thing we found is that when a user uploads the same file twice with the same file name, HDFS refuses to save the duplicate; but when a user uploads the same file content under a different file name, Hadoop accepts the upload. In practice, the same file is often uploaded by many users (cross-user) under different names but with identical contents, which wastes storage space. We address this problem by providing data deduplication in Hadoop. Before data is uploaded to HDFS, we compute the hash value of the file and store that hash value in a database for later use. When the same or another user attempts to upload a file with the same content, our DeDup module computes its hash value and checks it against HBase. If the hash value matches, the module reports that the file already exists. Experimental analysis on text, audio, video, and zip files demonstrates that the proposed solution yields more optimized storage space while incurring very small computational overhead.
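The check described above can be sketched as follows. This is a minimal illustrative model, not the authors' implementation: `hash_store` is a hypothetical in-memory dictionary standing in for the HBase table, and the HDFS upload step is elided.

```python
import hashlib

# Hypothetical stand-in for the HBase table mapping content hash -> file name.
hash_store = {}

def file_hash(data: bytes) -> str:
    """Digest of the file *content*, so renamed copies still collide."""
    return hashlib.sha256(data).hexdigest()

def dedup_upload(filename: str, data: bytes) -> str:
    """Client-side check before uploading to HDFS: skip duplicate content."""
    digest = file_hash(data)
    if digest in hash_store:
        # Same content already stored, possibly under a different name.
        return "File already exists"
    hash_store[digest] = filename
    # ...a real DeDup module would proceed with the HDFS upload here...
    return "Uploaded " + filename
```

Because the key is the content hash rather than the file name, uploading `b.txt` with the same bytes as an earlier `a.txt` is rejected, which is exactly the cross-user case Hadoop itself does not catch.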
