Conference: Annual Meeting of the Association for Computational Linguistics

Generating Question Relevant Captions to Aid Visual Question Answering


Abstract

Visual question answering (VQA) and image captioning require a shared body of general knowledge connecting language and vision. We present a novel approach to improve VQA performance that exploits this connection by jointly generating captions that are targeted to help answer a specific visual question. The model is trained using an existing caption dataset by automatically determining question-relevant captions using an online gradient-based method. Experimental results on the VQA v2 challenge demonstrate that our approach obtains state-of-the-art VQA performance (e.g., 68.4% on the Test-standard set using a single model) by simultaneously generating question-relevant captions.
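The abstract does not spell out how the online gradient-based method picks a question-relevant caption. One plausible reading, offered only as an illustrative sketch and not as the authors' procedure, is to compare the gradient of the answer loss with the gradient of each candidate caption's loss on some shared representation, and keep the caption whose gradient is best aligned. The function name `select_relevant_caption`, the `threshold` parameter, and the toy gradient vectors below are all hypothetical.

```python
from typing import List, Optional, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_relevant_caption(
    answer_grad: Sequence[float],
    caption_grads: List[Sequence[float]],
    threshold: float = 0.0,
) -> Optional[int]:
    """Return the index of the caption whose gradient aligns best with the
    answer gradient, or None if no alignment score exceeds `threshold`.

    Assumption: alignment is measured by a plain inner product. The source
    abstract only says the method is "gradient-based".
    """
    best_index: Optional[int] = None
    best_score = threshold
    for index, caption_grad in enumerate(caption_grads):
        score = dot(answer_grad, caption_grad)
        if score > best_score:
            best_index, best_score = index, score
    return best_index


# Toy example: three candidate captions for one image-question pair.
answer_grad = [0.5, -1.0, 0.2]
caption_grads = [
    [0.4, -0.9, 0.1],   # points the same way as the answer gradient
    [-0.3, 0.8, 0.0],   # points the opposite way
    [0.0, 0.1, 0.9],    # nearly orthogonal
]
chosen = select_relevant_caption(answer_grad, caption_grads)
print(chosen)
```

In a real model the gradient vectors would come from backpropagating the answer loss and each caption loss to a shared feature layer at every training step, which is what would make the selection "online".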
