Spatio-Temporal Context Networks for Video Question Answering


Abstract

Video Question Answering (Video QA) is an important and challenging problem in multimedia and computer vision research. In this paper, we propose a novel framework called spatio-temporal context networks (STCN). The framework uses long short-term memory (LSTM) networks to encode the spatial and temporal information of a video, then initializes the language model with the encoded visual features. An answer is then produced from the combined visual and semantic features. In particular, STCN fuses optical flow into the visual representation to capture more discriminative motion information. To verify the effectiveness of the proposed framework, we conduct experiments on the TACoS dataset, where it performs well on both the hard and easy levels.
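The pipeline described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the class `LSTMCell`, the function `answer`, fusion by per-frame concatenation, the shared hidden size, and dot-product scoring of candidate answers are all assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """Hypothetical single-layer LSTM with random, untrained weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        rows, cols = 4 * hidden_size, input_size + hidden_size
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
        self.b = [0.0] * rows

    def step(self, x, h, c):
        # All four gates are computed from the concatenation [x; h].
        z = [a + b for a, b in zip(matvec(self.W, x + h), self.b)]
        n = self.hidden_size
        i = [sigmoid(v) for v in z[:n]]            # input gate
        f = [sigmoid(v) for v in z[n:2 * n]]       # forget gate
        o = [sigmoid(v) for v in z[2 * n:3 * n]]   # output gate
        g = [math.tanh(v) for v in z[3 * n:]]      # candidate cell state
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def run(self, seq, h=None, c=None):
        """Run over a sequence of vectors, optionally from a given state."""
        h = h if h is not None else [0.0] * self.hidden_size
        c = c if c is not None else [0.0] * self.hidden_size
        for x in seq:
            h, c = self.step(x, h, c)
        return h, c


def answer(frames_rgb, frames_flow, question, candidates, video_lstm, text_lstm):
    # Assumption: appearance and optical-flow features are fused by
    # concatenating them frame by frame.
    fused = [rgb + flow for rgb, flow in zip(frames_rgb, frames_flow)]
    h, c = video_lstm.run(fused)
    # The encoded video state initializes the language (question) LSTM.
    q_h, _ = text_lstm.run(question, h, c)
    # Assumption: each candidate answer is an embedding scored by dot product.
    scores = [sum(a * b for a, b in zip(q_h, cand)) for cand in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores


# Toy usage with random stand-in features (no real video or text involved).
rng = random.Random(0)
H = 8
video_lstm = LSTMCell(input_size=6 + 4, hidden_size=H, rng=rng)
text_lstm = LSTMCell(input_size=5, hidden_size=H, rng=rng)
frames_rgb = [[rng.random() for _ in range(6)] for _ in range(3)]
frames_flow = [[rng.random() for _ in range(4)] for _ in range(3)]
question = [[rng.random() for _ in range(5)] for _ in range(4)]
candidates = [[rng.random() for _ in range(H)] for _ in range(3)]
best, scores = answer(frames_rgb, frames_flow, question, candidates,
                      video_lstm, text_lstm)
print("selected candidate index:", best)
```

Passing the video LSTM's final hidden and cell state as the starting state of the question LSTM is one simple way to read "initializes the language model with the encoded visual features"; the paper itself may wire the two differently.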


Bibliographic details

Source
Pacific-Rim Conference on Multimedia, 2018, pp. 108-118 (11 pages)
Authors
Kun Gao; Yahong Han
Original format: PDF
Keywords
Spatial and temporal information; Language model; Optical flow


