Journal of Intelligent Information Systems

In-memory, distributed content-based recommender system



Abstract

Burdened by their popularity, recommender systems increasingly take on larger datasets while they are expected to deliver high-quality results within reasonable time. To meet these ever-growing requirements, industrial recommender systems often turn to parallel hardware and distributed computing. While the MapReduce paradigm is generally accepted for massive parallel data processing, it often entails complex algorithm reorganization and suboptimal efficiency because mid-computation values are typically read from and written to hard disk. This work implements an in-memory, content-based recommendation algorithm and shows how it can be parallelized and efficiently distributed across many homogeneous machines in a distributed-memory environment. By focusing on data parallelism and carefully constructing the definition of work in the context of recommender systems, we are able to partition the complete calculation process into any number of independent and equally sized jobs. An empirically validated performance model is developed to predict parallel speedup and promises high efficiencies for realistic hardware configurations. For the MovieLens 10M dataset we note efficiency values up to 71% for a configuration of 200 computing nodes (eight cores per node).
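The core idea summarized above — keeping all profile data in memory and partitioning the full scoring computation into independent, equally sized jobs — can be illustrated with a minimal sketch. The Python code below is a hypothetical illustration, not the authors' implementation: item and user content profiles are held in memory as NumPy arrays, the user set is split into equally sized chunks, and each chunk is scored as an independent job, with multiprocessing standing in for the homogeneous nodes of a distributed-memory cluster. All names, array shapes, and the cosine-similarity scoring are assumptions made for the example.

```python
# Hypothetical sketch of data-parallel, in-memory content-based recommendation.
# Not the paper's implementation; multiprocessing emulates distributed workers.
import numpy as np
from multiprocessing import Pool

RNG = np.random.default_rng(0)

# In-memory data: dense content profiles for items and users (assumed shapes).
N_ITEMS, N_USERS, N_FEATURES, TOP_K = 1000, 4000, 64, 10
item_profiles = RNG.random((N_ITEMS, N_FEATURES)).astype(np.float32)
user_profiles = RNG.random((N_USERS, N_FEATURES)).astype(np.float32)

def recommend_chunk(user_ids):
    """One independent job: score every item for one slice of users, fully in memory."""
    users = user_profiles[user_ids]                                # (chunk, F)
    # Cosine similarity between user profiles and item profiles.
    u_norm = users / np.linalg.norm(users, axis=1, keepdims=True)
    i_norm = item_profiles / np.linalg.norm(item_profiles, axis=1, keepdims=True)
    scores = u_norm @ i_norm.T                                     # (chunk, N_ITEMS)
    top_items = np.argsort(-scores, axis=1)[:, :TOP_K]             # top-K items per user
    return dict(zip(user_ids.tolist(), top_items.tolist()))

if __name__ == "__main__":
    n_jobs = 8                                                     # e.g. one job per core
    # Equally sized, independent partitions of the user set define the "work".
    chunks = np.array_split(np.arange(N_USERS), n_jobs)
    with Pool(processes=n_jobs) as pool:
        partial_results = pool.map(recommend_chunk, chunks)
    recommendations = {u: items for part in partial_results for u, items in part.items()}
    print(len(recommendations), "users scored; user 0 ->", recommendations[0])
```

Because the jobs share no intermediate state, the same partitioning could be mapped onto many machines rather than local processes. If parallel efficiency is defined in the usual way as speedup divided by the number of cores, the reported 71% on 200 × 8 = 1600 cores would correspond to a speedup of roughly 0.71 × 1600 ≈ 1136 over a single-core run.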
