Conference: International Conference on Graphic and Image Processing

Visual Question Answering using Hierarchical Dynamic Memory Networks


Abstract

Visual Question Answering (VQA) is one of the most popular research fields in machine learning; it aims to let a computer learn to answer natural language questions about images. In this paper, we propose a new method called hierarchical dynamic memory networks (HDMN), which takes both question attention and visual attention into consideration. It is inspired by the Co-Attention method, which is currently the best algorithm, or among the best. Additionally, we replace the old unit with bi-directional LSTMs, which are better able to retain information from the question and image, so that we can capture information from both past and future sentences. We then rebuild the hierarchical architecture for not only question attention but also visual attention. Furthermore, we accelerate the algorithm with a technique called Batch Normalization, which helps the network converge more quickly than other algorithms. The experimental results show that our model improves on the state of the art on the large COCO-QA dataset.
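The abstract names two generic building blocks: bi-directional recurrent encoding and Batch Normalization. The sketch below illustrates only the general idea of each and is not the authors' HDMN implementation. It assumes NumPy, uses a plain tanh recurrence as a stand-in for LSTM cells, and the function names `batch_norm` and `bidirectional_encode` are hypothetical.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over the batch axis (axis 0)."""
    mean = x.mean(axis=0)                     # per-feature mean
    var = x.var(axis=0)                       # per-feature (biased) variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, roughly unit variance
    return gamma * x_hat + beta               # learnable scale and shift

def bidirectional_encode(seq, step, h0):
    """Run `step` left-to-right and right-to-left over `seq`,
    then concatenate both hidden states at every position."""
    fwd, h = [], h0
    for x in seq:                 # past -> future
        h = step(x, h)
        fwd.append(h)
    bwd, h = [], h0
    for x in seq[::-1]:           # future -> past
        h = step(x, h)
        bwd.append(h)
    bwd.reverse()                 # realign backward states with positions
    return np.stack([np.concatenate([f, b]) for f, b in zip(fwd, bwd)])

# Stand-in recurrence (an assumption): a real model would use an LSTM cell here.
def tanh_step(x, h):
    return np.tanh(x + 0.5 * h)

seq = np.arange(8, dtype=float).reshape(4, 2) / 10.0    # 4 time steps, 2 features
encoded = bidirectional_encode(seq, tanh_step, np.zeros(2))
normalized = batch_norm(encoded, np.ones(4), np.zeros(4))
```

Each row of `encoded` holds a forward state summarizing the steps up to that position and a backward state summarizing the steps from that position onward, which is the sense in which a bi-directional encoder sees both past and future context.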
