Source: 《计算机技术与发展》 (Computer Technology and Development)

A Storage Optimization Scheme for Small Files in Hadoop

Abstract

The Hadoop Distributed File System (HDFS) is well suited to storing and processing large files, but its efficiency and performance drop sharply when handling massive numbers of small files, and an excess of small files drives up the load of the entire cluster. To improve HDFS performance on small files, a double-merging algorithm is proposed: small files are first merged according to the association relationships between files, and then merged again based on data-block balance, so that file sizes are distributed evenly across blocks. The algorithm further improves the merging effect for small files, reduces the memory consumption of the HDFS master node, lowers the cluster load, and effectively decreases the number of data blocks produced by merging, ultimately improving HDFS performance when dealing with massive numbers of small files.
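The abstract describes the second stage, block-balance merging, only at a high level. As a rough illustration of the idea, the sketch below greedily packs small files into fixed-size HDFS blocks so that fill levels stay even. All names here (SmallFileMerger, MergedBlock, pack) are hypothetical, and first-fit-decreasing packing is an assumed stand-in for the paper's balancing rule, which the abstract does not specify.

```java
import java.util.*;

// Hypothetical sketch of block-balance merging: small files are greedily
// packed into 128 MB HDFS blocks so that block fill levels stay roughly
// uniform. Structure and names are illustrative, not taken from the paper.
public class SmallFileMerger {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // default HDFS block size

    // A merged block accumulates small files until it is nearly full.
    static class MergedBlock {
        long used = 0;
        List<String> files = new ArrayList<>();
    }

    // fileSizes maps a small-file path to its size in bytes; the files are
    // assumed to be pre-grouped by the first stage (association merging).
    static List<MergedBlock> pack(Map<String, Long> fileSizes) {
        // Sort descending by size (first-fit decreasing) so the larger files
        // anchor blocks and the smaller ones even out the remainders.
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(fileSizes.entrySet());
        sorted.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

        List<MergedBlock> blocks = new ArrayList<>();
        for (Map.Entry<String, Long> e : sorted) {
            MergedBlock target = null;
            // Place the file into the emptiest block that still fits it,
            // which keeps block sizes balanced across the merge result.
            for (MergedBlock b : blocks) {
                if (b.used + e.getValue() <= BLOCK_SIZE
                        && (target == null || b.used < target.used)) {
                    target = b;
                }
            }
            if (target == null) {            // no block can hold it: open a new one
                target = new MergedBlock();
                blocks.add(target);
            }
            target.files.add(e.getKey());
            target.used += e.getValue();
        }
        return blocks;
    }
}
```

Under these assumptions, placing each file in the emptiest block that fits it is what evens out block sizes: it minimizes the spread between the fullest and emptiest blocks while still bounding the total block count, which matches the abstract's stated goals of fewer blocks and lower NameNode memory use.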
