Recurrent Memory Addressing for Describing Videos

Abstract

In this paper, we introduce Key-Value Memory Networks to a multimodal setting, together with a novel key-addressing mechanism for sequence-to-sequence models. The proposed model naturally decomposes the problem of video captioning into vision and language segments, treating them as key-value pairs. More specifically, we learn a semantic embedding (v) corresponding to each frame (k) in the video, thereby creating (k, v) memory slots. In the memory addressing schema, we propose to compute the attention weights at each step conditioned on the previous attention distributions over the key-value memory slots. Exploiting the flexibility of this framework, we additionally capture spatial dependencies while mapping from the visual to the semantic embedding. Experiments on the Youtube2Text dataset demonstrate the usefulness of recurrent key-addressing, while achieving competitive BLEU@4 and METEOR scores against state-of-the-art models.
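The addressing idea described above can be illustrated with a minimal sketch. The abstract does not give the scoring function, so the form used here (query-key dot product plus a weighted copy of the previous attention weight) is an assumption made purely for illustration, as are the names `address_memory`, `beta`, and the toy vectors; it should not be read as the authors' formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def address_memory(query, keys, values, prev_attention, beta=1.0):
    """One addressing step over (k, v) memory slots.

    Assumed scoring (not taken from the paper):
        score_t = query . key_t + beta * prev_attention_t
    The previous attention distribution therefore biases the next one.
    """
    scores = [dot(query, k) + beta * p for k, p in zip(keys, prev_attention)]
    attention = softmax(scores)
    dim = len(values[0])
    # Read vector: attention-weighted sum of the value (semantic) embeddings.
    read = [sum(a * v[d] for a, v in zip(attention, values)) for d in range(dim)]
    return attention, read


# Toy memory: three frames (keys) with hypothetical semantic embeddings (values).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[0.5, 0.1, 0.0], [0.0, 0.9, 0.2], [0.3, 0.3, 0.3]]

attention = [1.0 / len(keys)] * len(keys)  # uniform start
for query in ([1.0, 0.0], [0.0, 1.0]):     # stand-ins for decoder states
    attention, read = address_memory(query, keys, values, attention)
```

Feeding each step's `attention` back into the next call is what makes the addressing recurrent in this sketch; setting `beta=0` reduces it to ordinary content-based attention over the keys.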
