Conference: IEEE International Congress on Big Data

A Compatible LZMA ORC-Based Optimization for High Performance Big Data Load


Abstract

This paper presents several efficient ways to optimize data loading and storage in a Hadoop cluster. We design a new method that combines LZMA compression with the ORC file format to gain a performance edge, and we improve the ORC implementation in HDFS to achieve a higher compression ratio and better I/O throughput. We also present a complete optimization strategy for efficient big data loading. It includes byte-array-oriented processing, record splitting, reduced serialization and shuffling, and less landing of intermediate data on disk, which together yield a large performance boost. The paper provides preliminary results and analysis, and the evaluation results indicate that our method significantly improves big data load performance.
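The abstract does not include the authors' code. As a rough, hypothetical illustration of three of the ideas it names (byte-array-oriented record splitting, LZMA compression, and avoiding a materialized intermediate buffer), the Python sketch below uses only the standard-library `lzma` and `zlib` modules. It does not touch ORC or HDFS; the function names, the newline record delimiter, and the preset level are assumptions for illustration, not the paper's design.

```python
import lzma
import zlib

DELIM = b"\n"  # assumption: newline-delimited records


def split_records(buf: bytes, delim: bytes = DELIM) -> list[bytes]:
    """Split a raw byte buffer into records without decoding to str.
    A trailing delimiter does not produce an empty final record."""
    records = buf.split(delim)
    if records and records[-1] == b"":
        records.pop()
    return records


def compress_block(records: list[bytes], delim: bytes = DELIM) -> bytes:
    """Join records into one block and compress it with LZMA (.xz container)."""
    return lzma.compress(delim.join(records), preset=6)


def compress_stream(records: list[bytes], delim: bytes = DELIM) -> bytes:
    """Compress records incrementally, so the joined uncompressed block
    is never built in memory (a stand-in for 'less intermediate data')."""
    comp = lzma.LZMACompressor(preset=6)
    chunks = []
    for i, rec in enumerate(records):
        if i:
            chunks.append(comp.compress(delim))
        chunks.append(comp.compress(rec))
    chunks.append(comp.flush())
    return b"".join(chunks)


def decompress_block(blob: bytes, delim: bytes = DELIM) -> list[bytes]:
    """Inverse of compress_block/compress_stream, assuming no record
    contains the delimiter and the record list was non-empty."""
    data = lzma.decompress(blob)
    return data.split(delim) if data else []


if __name__ == "__main__":
    # Hypothetical, highly repetitive log-like input.
    raw = b"".join(b"2024-01-01,user%d,click,/home\n" % (i % 50) for i in range(10_000))
    records = split_records(raw)
    block = DELIM.join(records)
    xz = compress_block(records)
    zl = zlib.compress(block, 6)
    print(f"records={len(records)} raw={len(block)} lzma={len(xz)} zlib={len(zl)}")
    assert decompress_block(xz) == records
    assert decompress_block(compress_stream(records)) == records
```

Keeping records as `bytes` avoids decoding and re-encoding every row, which is one plausible reading of "less serialization". LZMA generally trades slower compression for a higher ratio than zlib, so the printed sizes are worth measuring on real data before drawing conclusions.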