
Refocused Attention: Long Short-Term Rewards Guided Video Captioning


Abstract

The adaptive cooperation of the visual model and the language model is essential for video captioning. However, due to the lack of proper guidance at each time step in end-to-end training, over-dependence on the language model often invalidates the attention-based visual model, a problem we call 'Attention Defocus' in this paper. Based on the important observation that the recognition precision of entity words can reflect the effectiveness of the visual model, we propose a novel strategy called refocused attention to optimize the training and cooperation of the visual model and the language model, using targeted guidance at the appropriate time steps. The strategy consists of a short-term-reward-guided local entity recognition and a long-term-reward-guided global relation understanding, neither of which requires any external training data. Moreover, a framework with hierarchical visual representations and hierarchical attention is established to fully exploit the potential of the proposed learning strategy. Extensive experiments demonstrate that the guidance strategy, together with the optimized structure, outperforms state-of-the-art video captioning methods, with relative improvements of 7.7% in BLEU-4 and 5.0% in CIDEr-D on the MSVD dataset, even without multi-modal features.
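The abstract describes combining a per-step (short-term) entity-recognition reward with a sequence-level (long-term) relation-understanding reward to guide caption training. The sketch below is a minimal illustration of one way such rewards could be blended into per-step returns for a policy-gradient caption trainer; the function name, the discounting rule, and all parameters are illustrative assumptions, not the paper's actual formulation.

```python
from typing import List

def refocus_rewards(step_entity_rewards: List[float],
                    sequence_reward: float,
                    gamma: float = 0.9) -> List[float]:
    """Blend short-term per-step entity rewards with a discounted
    long-term sequence-level reward (e.g. a CIDEr-style score).
    Illustrative sketch only; the blending rule is an assumption."""
    T = len(step_entity_rewards)
    returns = []
    for t in range(T):
        # Long-term reward, discounted back from the end of the caption,
        # so later steps receive more of the sequence-level credit.
        long_term = (gamma ** (T - 1 - t)) * sequence_reward
        # Short-term entity reward is credited directly at its own step.
        returns.append(step_entity_rewards[t] + long_term)
    return returns

# Example: a 3-word caption where only the second word is an entity word.
print(refocus_rewards([0.0, 1.0, 0.0], sequence_reward=0.5))
```

In a REINFORCE-style setup, each blended return would weight the log-probability of the word sampled at that step, so entity words receive immediate credit while the whole caption is still pulled toward a high sequence-level score.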

Bibliographic Details

  • Source
    Neural Processing Letters | 2020, Issue 2 | pp. 935-948 | 14 pages
  • Author Affiliations

    Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China;

    Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;

    Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China;

    Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;

  • Indexing Information
  • Format: PDF
  • Language: English
  • CLC Classification
  • Keywords

    Video captioning; Hierarchical attention; Reinforcement learning; Reward;


