Visual Question Answering using Hierarchical Dynamic Memory Networks

Abstract

Visual Question Answering (VQA) is one of the most popular research areas in machine learning. It aims to let a computer learn to answer natural-language questions about images. In this paper, we propose a new method called hierarchical dynamic memory networks (HDMN), which takes both question attention and visual attention into account. It is inspired by the Co-Attention method, which is currently among the best algorithms. Additionally, we replace the previous unit with bi-directional LSTMs, which are better able to retain information from the question and image, so that information from both past and future sentences can be captured. We then rebuild the hierarchical architecture for both question attention and visual attention. Furthermore, we accelerate the algorithm with a technique called Batch Normalization, which helps the network converge more quickly than other algorithms. Experimental results show that our model improves on the state of the art on the large COCO-QA dataset compared with other methods.
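
The abstract names Batch Normalization as the technique used to speed up convergence but does not describe how it is applied inside HDMN. As a rough illustration only, the sketch below shows the standard training-time batch normalization forward pass in NumPy. The function name `batch_norm_forward`, the parameters `gamma`, `beta`, and `eps`, and the toy input shape are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch normalization over the batch axis (axis 0).

    Hypothetical helper for illustration; not the paper's implementation.
    x has shape (batch, features); gamma and beta have shape (features,).
    """
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta               # learnable scale and shift


# Toy activations: 32 examples with 4 features, deliberately off-center.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
```

With `gamma` set to ones and `beta` to zeros, each output feature has approximately zero mean and unit variance across the batch. Keeping layer inputs on a stable scale in this way is the usual explanation for the faster convergence that the abstract reports.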

Bibliographic details

Source: International Conference on Graphic and Image Processing, 2017, 106153V.1-106153V.9 (9 pages)
Authors: Jiayu Shang; Shiren Li; Zhikui Duan; Junwei Huang
Original format: PDF
Keywords: VQA; Bi-directional LSTMs; question attention; visual attention; Batch Normalization

Similar literature

1. Enhanced question understanding with dynamic memory networks for textual question answering [J]. Yue Chunyi, Cao Hanqiang, Xiong Kun. Expert Systems with Applications, 2017, Issue SEPa.
2. Long-Term Video Question Answering via Multimodal Hierarchical Memory Attentive Networks [J]. Yu Ting, Yu Jun, Yu Zhou. IEEE Transactions on Circuits and Systems for Video Technology, 2021, Issue 3.
3. Long-Form Video Question Answering via Dynamic Hierarchical Reinforced Networks [J]. Zhou Zhao, Zhu Zhang, Shuwen Xiao. IEEE Transactions on Image Processing, 2019, Issue 12.
4. Visual Question Answering using Hierarchical Dynamic Memory Networks [C]. Jiayu Shang, Shiren Li, Zhikui Duan. International Conference on Graphic and Image Processing, 2017.
5. Inferring answer quality, answerer expertise, and ranking in question answer social networks [D]. Cai, Yuanzhe. 2014.
6. An Effective Dense Co-Attention Networks for Visual Question Answering [O]. Shirong He, Dezhi Han. 2020.
7. Learning Visual Knowledge Memory Networks for Visual Question Answering [O]. Zhou Su, Chen Zhu, Yinpeng Dong. 2018.
