RABID: A Distributed Parallel R for Large Datasets




Abstract

Large-scale data mining and deep data analysis are increasingly important for both enterprise and scientific applications. Statistical languages provide rich functionality and ease of use for data analysis and modeling, and they have a large user base. R is one of the most widely used of these languages, but it is limited to a single-threaded execution model and to problem sizes that fit on a single node. This paper describes a highly parallel R system called RABID (R Analytics for BIg Data) that maintains R compatibility, leverages Spark, a MapReduce-like distributed framework, and achieves high performance and scalability across clusters. Our experimental evaluation shows that RABID runs up to 5x faster than Hadoop and up to 20x faster than RHIPE on two data mining applications.
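The abstract does not describe RABID's programming interface, so the following is only a hypothetical sketch of the MapReduce-like execution model it builds on: the dataset is split into partitions, each partition is summarized independently (map), and the partial results are merged (reduce). All names here (`partition`, `map_partition`, `combine`, `distributed_mean`) are illustrative assumptions, not RABID, Spark, or R APIs, and the "cluster" is simulated with a plain loop.

```python
from functools import reduce
from typing import List, Tuple

# Hypothetical sketch of a MapReduce-style computation; these names are
# illustrative only and are not part of RABID's, Spark's, or R's API.

Partial = Tuple[float, int]  # (sum of values, number of values)


def partition(data: List[float], n_parts: int) -> List[List[float]]:
    """Split data into at most n_parts contiguous chunks of similar size."""
    size = -(-len(data) // n_parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def map_partition(chunk: List[float]) -> Partial:
    """Map step: summarize one partition independently as (sum, count)."""
    return (sum(chunk), len(chunk))


def combine(a: Partial, b: Partial) -> Partial:
    """Reduce step: merge two partial summaries (associative and commutative)."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(data: List[float], n_parts: int) -> float:
    """Mean of data computed from per-partition partial results."""
    if not data or n_parts < 1:
        raise ValueError("data must be non-empty and n_parts >= 1")
    # In a real cluster each map_partition call could run on a different node;
    # here the partitions are simply processed one after another in a loop.
    partials = [map_partition(chunk) for chunk in partition(data, n_parts)]
    total, count = reduce(combine, partials, (0.0, 0))
    return total / count


if __name__ == "__main__":
    values = [float(x) for x in range(1, 101)]
    print(distributed_mean(values, n_parts=4))  # prints 50.5
```

Because `combine` is associative and commutative, the partial results can be merged in any grouping and order, which is the property that lets a MapReduce-like framework spread the map step across nodes and merge results as they arrive.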


Bibliographic record

Source
IEEE International Congress on Big Data | 2014 | pp. 725-732 | 8 pages
Authors
Lin Hao; Yang Shuo; Midkiff Samuel P.;
Original format: PDF
Keywords
Data structures; Distributed databases; Fault tolerance; Fault tolerant systems; Programming; Servers; Sparks; Big Data analytics; Data mining; Distributed Computing; R;

Similar literature

1. Cloud-based parallel power flow calculation using resilient distributed datasets and directed acyclic graph [J]. Dewen WANG, Fangfang ZHOU, Jiangman LI. Journal of Modern Power Systems and Clean Energy, 2019, No. 1

机译：基于云的并行电流计算使用弹性分布式数据集和定向非循环图
2. Cloud-based parallel power flow calculation using resilient distributed datasets and directed acyclic graph [J]. Dewen WANG, Fangfang ZHOU, Jiangman LI. Journal of Modern Power Systems and Clean Energy (English edition), 2019, No. 1

机译：使用弹性分布式数据集和有向无环图的基于云的并行潮流计算
3. Distributed and Parallel Decision Forest for Human Activities Prediction: Experimental Analysis on HAR-Smartphones Dataset [J]. Budi Padmaja, Venkata Rama Prasad Vaddella, Kota Venkata Naga Sunitha. Journal of computer sciences, 2019, No. 5

机译：用于人类活动预测的分布式并行决策森林：HAR智能手机数据集的实验分析
4. RABID: A Distributed Parallel R for Large Datasets [C]. Lin Hao, Yang Shuo, Midkiff Samuel P. IEEE International Congress on Big Data, 2014

机译：rabid：大型数据集的分布式并行R
5. Combinatorial Optimization on Massive Datasets: Streaming, Distributed, and Massively Parallel Computation [D]. Assadi, Sepehr. 2018

机译：大规模数据集的组合优化：流式，分布式和大规模并行计算
6. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets [O]. D. D. Shrimankar, S. R. Sathe. 2016

机译：大型生物数据集基于新图块的并行编程模型对SMP节点和工作站集群的并行算法进行分析
7. Parallel and Distributed Approach for Processing Large-Scale XML Datasets [O]. Zacharia Fadika, Michael R. Head, Madhusudhan Govindaraju. 2012