Journal: Computer Vision and Image Understanding

Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?



Abstract

We conduct large-scale studies on 'human attention' in Visual Question Answering (VQA) to understand where humans choose to look to answer questions about images. We design and test multiple game-inspired novel attention-annotation interfaces that require the subject to sharpen regions of a blurred image to answer a question. Thus, we introduce the VQA-HAT (Human ATtention) dataset. We evaluate attention maps generated by state-of-the-art VQA models against human attention both qualitatively (via visualizations) and quantitatively (via rank-order correlation). Our experiments show that current attention models in VQA do not seem to be looking at the same regions as humans. Finally, we train VQA models with explicit attention supervision, and find that it improves VQA performance.
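The abstract's quantitative comparison uses rank-order correlation between model-generated and human attention maps. As an illustrative sketch (the paper's exact evaluation protocol and tie-handling may differ), Spearman rank correlation over two flattened attention maps can be computed with plain NumPy:

```python
import numpy as np

def rank_correlation(map_a, map_b):
    """Spearman rank-order correlation between two flattened attention maps.

    Illustrative only: ranks are assigned by sort position, so ties are
    broken arbitrarily rather than averaged as a full Spearman would do.
    """
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    # Convert raw attention values to ranks.
    ra = np.empty_like(a)
    rb = np.empty_like(b)
    ra[np.argsort(a)] = np.arange(a.size)
    rb[np.argsort(b)] = np.arange(b.size)
    # Pearson correlation of the ranks equals the Spearman correlation.
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra * rb).sum() / np.sqrt((ra ** 2).sum() * (rb ** 2).sum()))
```

Identical rankings give a score of 1.0 and fully reversed rankings give -1.0, so a model whose attention ordering of image regions agrees with human annotators scores near 1.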
