Parallel Processing Systems for Big Data: A Survey

Abstract

The volume, variety, and velocity of big data, and the valuable information it contains, have motivated the investigation of many new parallel data processing systems in addition to approaches based on traditional database management systems (DBMSs). MapReduce pioneered this paradigm change and rapidly became the primary big data processing system thanks to its simplicity, scalability, and fine-grained fault tolerance. Compared with DBMSs, however, MapReduce is also controversial for its processing efficiency, low-level abstraction, and rigid dataflow. Inspired by MapReduce, big data systems are now flourishing. Some follow MapReduce's idea but offer more flexible models for general-purpose use. Some absorb the advantages of DBMSs and provide a higher level of abstraction. There are also specialized systems for particular applications, such as machine learning and stream data processing. To explore new research opportunities and to help users select suitable processing systems for specific applications, this survey gives a high-level overview of existing parallel data processing systems, categorized by data input as batch processing, stream processing, graph processing, and machine learning processing, and introduces representative projects in each category. As the pioneer, the original MapReduce system is discussed first, together with its active variants and extensions for dataflow, data access, parameter tuning, communication, and energy optimization. System benchmarks and open issues for big data processing are also studied.

Bibliographic record

  • Source
    Proceedings of the IEEE | 2016, No. 11 | pp. 2114-2136 | 23 pages
  • Author affiliations

    State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;

    State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;

    State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;

    Advanced Computer Systems Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;

    State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;

    State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;

    Department of Computer Science, Electrical and Space Engineering, Lulea University of Technology, Lulea, Sweden;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Big data; Computer applications; Programming; Parallel processing; Data models; Benchmark testing; Machine learning; Structured Query Language;
