Conference: Pacific-Rim Conference on Multimedia

Spatio-Temporal Context Networks for Video Question Answering

Abstract

Video Question Answering (Video QA) is an important and challenging problem in multimedia and computer vision research. In this paper, we propose a novel framework called spatio-temporal context networks (STCN). The framework uses long short-term memory (LSTM) networks to encode the spatial and temporal information of a video, then initializes the language model with the encoded visual features. The answer is then predicted from the visual and semantic features. In particular, the STCN framework fuses optical flow to capture more discriminative motion information from videos. To verify the effectiveness of the proposed framework, we conduct experiments on the TACoS dataset, where it performs well on both the hard and easy levels.
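The pipeline described in the abstract can be illustrated with a small, untrained sketch. This is not the authors' implementation: the abstract does not say how optical flow is fused, how answers are scored, or what dimensions are used. The sketch therefore assumes per-frame concatenation of appearance and flow features, dot-product scoring against candidate answer vectors, and random weights, and every function name in it (`make_lstm`, `run_lstm`, `answer_question`) is hypothetical. The one idea it takes from the abstract is that the final state of the video LSTM becomes the initial state of the language LSTM.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_lstm(input_dim, hidden_dim, rng):
    """Random (untrained) weights for one LSTM layer: a matrix and bias per gate."""
    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                for _ in range(hidden_dim)]
    return {gate: (matrix(), [0.0] * hidden_dim) for gate in ("i", "f", "o", "g")}

def lstm_step(params, x, h, c):
    """One standard LSTM step on plain Python lists."""
    z = x + h  # concatenate the input and the previous hidden state
    def affine(gate):
        weights, bias = params[gate]
        return [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(weights, bias)]
    i = [sigmoid(v) for v in affine("i")]    # input gate
    f = [sigmoid(v) for v in affine("f")]    # forget gate
    o = [sigmoid(v) for v in affine("o")]    # output gate
    g = [math.tanh(v) for v in affine("g")]  # candidate cell values
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def run_lstm(params, inputs, state):
    """Run the LSTM over a sequence and return the final (h, c) state."""
    h, c = state
    for x in inputs:
        h, c = lstm_step(params, x, h, c)
    return h, c

def answer_question(frames, flows, question, candidates, video_lstm, text_lstm, hidden_dim):
    # Assumption: fuse appearance and optical-flow features by concatenation.
    fused = [a + m for a, m in zip(frames, flows)]
    zero_state = ([0.0] * hidden_dim, [0.0] * hidden_dim)
    video_state = run_lstm(video_lstm, fused, zero_state)
    # From the abstract: the encoded video initializes the language model.
    h, _ = run_lstm(text_lstm, question, video_state)
    # Assumption: score each candidate answer vector by dot product.
    scores = [sum(p * q for p, q in zip(h, cand)) for cand in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy usage with random features standing in for real frame, flow, and word vectors.
rng = random.Random(0)
APP, FLOW, EMB, HID = 4, 3, 5, 6
video_lstm = make_lstm(APP + FLOW, HID, rng)
text_lstm = make_lstm(EMB, HID, rng)
frames = [[rng.random() for _ in range(APP)] for _ in range(8)]
flows = [[rng.random() for _ in range(FLOW)] for _ in range(8)]
question = [[rng.random() for _ in range(EMB)] for _ in range(5)]
candidates = [[rng.uniform(-1, 1) for _ in range(HID)] for _ in range(4)]
best, scores = answer_question(frames, flows, question, candidates,
                               video_lstm, text_lstm, HID)
print(best, scores)
```

Because the weights are random, the chosen answer carries no meaning; the sketch only shows how information would flow from the video encoder into the question encoder under the stated assumptions.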
