Recurrent Memory Addressing for Describing Videos

Abstract

In this paper, we introduce Key-Value Memory Networks to a multimodal setting, along with a novel key-addressing mechanism for sequence-to-sequence models. The proposed model naturally decomposes the problem of video captioning into vision and language segments, treating them as key-value pairs. More specifically, we learn a semantic embedding (v) corresponding to each frame (k) in the video, thereby creating (k, v) memory slots. In the memory-addressing scheme, we propose computing the next-step attention weights over the key-value memory slots conditioned on the previous attention distributions. Exploiting this flexibility of the framework, we additionally capture spatial dependencies while mapping from the visual to the semantic embedding. Experiments on the Youtube2Text dataset demonstrate the usefulness of recurrent key-addressing, with BLEU@4 and METEOR scores that are competitive with state-of-the-art models.
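
The abstract does not state the addressing equations, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes that each frame feature acts as a key, each learned semantic embedding acts as a value, and the attention scores at every decoding step add a query-key match to a linear function of the previous attention distribution. The function name `recurrent_key_addressing`, the projection matrices `W_q` and `W_p`, and the use of decoder states as queries are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()


def recurrent_key_addressing(keys, values, queries, W_q, W_p):
    """Read from (k, v) memory slots with attention that depends on its own past.

    keys:    (T, d_k) one visual feature per frame (assumed)
    values:  (T, d_v) one semantic embedding per frame (assumed)
    queries: (S, d_q) one decoder state per output step (assumed)
    W_q:     (d_k, d_q) hypothetical projection of a query into key space
    W_p:     (T, T) hypothetical map from previous attention to score offsets
    """
    num_slots = keys.shape[0]
    # Before the first step there is no history, so start from uniform attention.
    alpha = np.full(num_slots, 1.0 / num_slots)
    reads, alphas = [], []
    for q in queries:
        content = keys @ (W_q @ q)   # (T,) query-key match per slot
        history = W_p @ alpha        # (T,) contribution of the previous attention
        alpha = softmax(content + history)
        alphas.append(alpha)
        reads.append(alpha @ values)  # (d_v,) attention-weighted sum of values
    return np.stack(reads), np.stack(alphas)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, S, d_k, d_v, d_q = 5, 3, 4, 6, 7  # arbitrary toy sizes
    reads, alphas = recurrent_key_addressing(
        rng.normal(size=(T, d_k)),
        rng.normal(size=(T, d_v)),
        rng.normal(size=(S, d_q)),
        rng.normal(size=(d_k, d_q)),
        rng.normal(size=(T, T)),
    )
    print(reads.shape, alphas.shape)
```

In this sketch, setting `W_p` to zero reduces the scheme to ordinary content-based key addressing, which makes the effect of the recurrent term easy to isolate in an ablation.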

Bibliographic details

Source: IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, 1 v., 8 pages
Authors: Arnav Kumar Jain; Abhinav Agarwalla; Kumar Krishna Agrawal; Pabitra Mitra
Original format: PDF
Chinese Library Classification: TP391.41
Keywords: Videos; Decoding; Visualization; Semantics; Feature extraction; Context modeling; Computational modeling
