
Multimodal data fusion framework based on autoencoders for top-N recommender systems


Abstract

In this paper, we present a novel multimodal framework for video recommendation based on deep learning. Unlike most common solutions, we formulate video recommendation by simultaneously exploiting two data modalities, namely: (i) the visual (i.e., image sequence) and (ii) the textual modalities, which in conjunction with the audio stream constitute the elementary data of a video document. More specifically, our framework first describes textual data using the bag-of-words and TF-IDF models, fusing those features with deep convolutional descriptors extracted from the visual data. As a result, we obtain a multimodal descriptor for each video document, from which we construct a low-dimensional sparse representation using autoencoders. To perform the recommendation task, we extend a sparse linear method with side information (SSLIM) by taking into account the sparse representations of the video descriptors previously computed. By doing this, we are able to produce a ranking of the top-N videos most relevant to the user. Note that our framework is flexible, i.e., one may use other types of modalities, autoencoders, and fusion architectures. Experimental results obtained on three real datasets (MovieLens-1M, MovieLens-10M and Vine), containing 3,320, 8,400 and 18,576 videos, respectively, show that our framework can improve recommendation results by up to 60.6% when compared to a single-modality recommendation model, and by up to 31% when compared to the state-of-the-art methods used as baselines in our study, demonstrating the effectiveness of our framework and highlighting the usefulness of multimodal information in recommender systems.
