Venue: IEEE International Congress on Big Data

RABID: A Distributed Parallel R for Large Datasets

Abstract

Large-scale data mining and deep data analysis are increasingly important for both enterprise and scientific applications. Statistical languages provide rich functionality and ease of use for data analysis and modeling, and have a large user base. R is one of the most widely used of these languages, but it is limited to a single-threaded execution model and to problem sizes that fit on a single node. This paper describes a highly parallel R system called RABID (R Analytics for BIg Data) that maintains R compatibility, leverages the MapReduce-like distributed framework Spark, and achieves high performance and scaling across clusters. Our experimental evaluation shows that RABID performs up to 5x faster than Hadoop and 20x faster than RHIPE on two data mining applications.