Journal: IEEE Transactions on Neural Networks and Learning Systems

Memory Augmented Deep Recurrent Neural Network for Video Question Answering

Abstract

Video question answering (VideoQA) is an important but challenging multimedia task in which a system automatically analyzes a question and a video and generates an accurate answer. Research on VideoQA, however, is still in its infancy. In this article, we propose a novel memory augmented deep recurrent neural network (MA-DRNN) model for VideoQA. It features a new method for encoding videos and questions, together with memory augmentation based on the emerging differentiable neural computer (DNC). Specifically, we encode textual (question) information before visual (video) information, which leads to better visual-textual representations. Moreover, we leverage the DNC, with its external memory, to store and retrieve useful information from questions and videos and to model long-term visual-textual dependence. To evaluate the proposed model, we conducted extensive experiments on the VTW and MSVD-QA data sets, both of which are widely used large-scale video data sets for language-level understanding. The experimental results validate the proposed model and show that it outperforms the state of the art on various accuracy-related metrics.
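
The abstract mentions storing and retrieving information through the DNC's external memory. As a rough illustration only, the sketch below shows generic content-based memory reading (cosine similarity followed by a softmax), which is one of the addressing mechanisms a DNC uses. It is not the MA-DRNN implementation: the function names, the toy memory values, and the `strength` parameter are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def content_read(memory, key, strength=1.0):
    """Read from memory by content (hypothetical helper for illustration).

    memory   -- list of N slots, each a list of W floats
    key      -- query vector of length W, e.g. emitted by a controller network
    strength -- sharpening factor; larger values focus on the best match
    """
    # Score every slot by its similarity to the key, then normalize.
    scores = [strength * cosine_similarity(slot, key) for slot in memory]
    weights = softmax(scores)
    # The read vector is the weighted sum of all memory slots.
    read_vector = [
        sum(w * slot[i] for w, slot in zip(weights, memory))
        for i in range(len(key))
    ]
    return weights, read_vector


# Toy memory: three slots of width 2 (values are made up for illustration).
memory = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, read_vector = content_read(memory, key=[1.0, 0.0], strength=10.0)
```

Because every step is differentiable, a read like this can be trained end to end with the rest of a network. A full DNC also includes write heads, usage-based allocation, and temporal links, all of which are omitted here.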