
An improved HDFS for small file


Abstract

Hadoop is an open-source distributed computing platform, and HDFS is Hadoop's distributed file system. HDFS offers powerful data storage capacity and is therefore well suited to cloud storage systems. However, HDFS was originally designed for streaming access to large files, and because the NameNode keeps the metadata of every file in memory, it stores massive numbers of small files inefficiently. To solve this problem, the HDFS file storage process is improved: files are examined before being uploaded to the HDFS cluster, and if a file is small it is merged with other small files, with its index information stored in an index file in the form of key-value pairs. Simulation shows that the improved HDFS consumes less NameNode memory than both the original HDFS and Hadoop Archives (HAR files), and can thus improve access efficiency.
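The merge-and-index step can be pictured with a short sketch. The Java code below is only an illustration of the idea described in the abstract, not the paper's implementation: the 4 MB size threshold, the file names merged.dat and merged.idx, and the plain-text key-value index format are all assumptions made here for clarity.

import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

public class SmallFileMerger {
    // Hypothetical threshold: files below 4 MB are treated as "small".
    private static final long SMALL_FILE_THRESHOLD = 4L * 1024 * 1024;

    public static void main(String[] args) throws IOException {
        Path mergedFile = Paths.get("merged.dat"); // container holding the merged small files
        Path indexFile  = Paths.get("merged.idx"); // key-value index: file name -> offset,length
        Map<String, long[]> index = new LinkedHashMap<>();

        long offset = 0;
        try (OutputStream out = Files.newOutputStream(mergedFile,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (String arg : args) {
                Path file = Paths.get(arg);
                long len = Files.size(file);
                if (len >= SMALL_FILE_THRESHOLD) {
                    // A large file would be uploaded to HDFS unchanged.
                    System.out.println(file + ": large file, upload directly");
                    continue;
                }
                // Small file: append its bytes to the merged container and
                // record where they start and how long they are.
                Files.copy(file, out);
                index.put(file.getFileName().toString(), new long[] { offset, len });
                offset += len;
            }
        }

        // Persist the index as simple key-value lines: name=offset,length.
        // Reading a small file back then needs one index lookup plus one seek
        // into the merged container, instead of a per-file NameNode entry.
        try (BufferedWriter w = Files.newBufferedWriter(indexFile)) {
            for (Map.Entry<String, long[]> e : index.entrySet()) {
                w.write(e.getKey() + "=" + e.getValue()[0] + "," + e.getValue()[1]);
                w.newLine();
            }
        }
    }
}

Uploading the merged container and its index to the cluster replaces many small HDFS files with a single large one, which is consistent with the NameNode memory saving the abstract reports.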

