Parallel Processing Systems for Big Data: A Survey

Yunquan Zhang; Ting Cao; Shigang Li; Xinhui Tian; Liang Yuan; Haipeng Jia; Athanasios V. Vasilakos

Abstract

The volume, variety, and velocity of big data, together with the valuable information it contains, have motivated the investigation of many new parallel data processing systems alongside approaches based on traditional database management systems (DBMSs). MapReduce pioneered this paradigm change and rapidly became the primary big data processing system because of its simplicity, scalability, and fine-grained fault tolerance. Compared with DBMSs, however, MapReduce has drawn criticism for its processing efficiency, low-level abstraction, and rigid dataflow. Big data systems inspired by MapReduce are now flourishing. Some follow MapReduce's idea but offer more flexible models for general-purpose use. Some absorb the advantages of DBMSs by providing higher-level abstractions. Others are specialized for particular applications, such as machine learning and stream data processing. To explore new research opportunities and to help users select suitable processing systems for specific applications, this survey gives a high-level overview of existing parallel data processing systems, categorized by data input into batch processing, stream processing, graph processing, and machine learning processing, and introduces representative projects in each category. As the pioneer, the original MapReduce system is discussed first, along with its active variants and extensions for dataflow, data access, parameter tuning, communication, and energy optimization. System benchmarks and open issues in big data processing are also examined.

Bibliographic record

Source
Proceedings of the IEEE | 2016, no. 11 | pp. 2114-2136 | 23 pages
Authors
Yunquan Zhang; Ting Cao; Shigang Li; Xinhui Tian; Liang Yuan; Haipeng Jia; Athanasios V. Vasilakos
Author affiliations

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;

Advanced Computer Systems Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;

Department of Computer Science, Electrical and Space Engineering, Lulea University of Technology, Lulea, Sweden;

Original format: PDF
Language: English
Keywords
Big data; Computer applications; Programming; Parallel processing; Data models; Benchmark testing; Machine learning; Structured Query Language;

Similar literature

1. A survey of current challenges in partitioning and processing of graph-structured data in parallel and distributed systems [J]. Adoni Hamilton Wilfried Yves, Nahhal Tarik, Krichen Moez. Distributed and Parallel Databases. 2020, no. 2
2. A survey of current challenges in partitioning and processing of graph-structured data in parallel and distributed systems [J]. Ecological restoration. 2020, no. 2
3. Tightly integrated single- and multi-crystal data collection strategy calculation and parallelized data processing in JBluIce beamline control system [J]. Sudhir Babu Pothineni, Nagarajan Venugopalan, Craig M. Ogata. Journal of Applied Crystallography. 2014, no. 6
4. A SIMPLE AND EFFECTIVE DATA ACQUISITION PLANNING, PROCESSING AND GUIDANCE SYSTEM FOR CONDUCTING PARALLEL SWATH ELECTROMAGNETIC SURVEYS USING GEM 2 IN SUBSURFACE DRD? IRRIGATION MONITORING, POWDER RIVER BASIN, WYOMING [C]. Garret Veloski. Symposium on the Application Of Geophysics To Engineering And Environmental Problems. 2012
5. Parallel Processing Systems for Data and Computation Efficiency with Applications to Graph Computing and Machine Learning [D]. Zhou, Li. 2019
6. Tightly integrated single- and multi-crystal data collection strategy calculation and parallelized data processing in JBluIce beamline control system [O]. Sudhir Babu Pothineni, Nagarajan Venugopalan, Craig M. Ogata
7. Model of parallel data processing systems and parallel processes [O]. Динако М., Dynako M. 2015
8. A Survey of Interconnection Methods for Reconfigurable Parallel Processing Systems [R]. Siegel, H. J., McMillen, R. J., Mueller, P. T. 1979
