Journal: Computer Vision and Image Understanding

Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?



Abstract

We conduct large-scale studies on 'human attention' in Visual Question Answering (VQA) to understand where humans choose to look to answer questions about images. We design and test multiple game-inspired novel attention-annotation interfaces that require the subject to sharpen regions of a blurred image to answer a question. Thus, we introduce the VQA-HAT (Human ATtention) dataset. We evaluate attention maps generated by state-of-the-art VQA models against human attention both qualitatively (via visualizations) and quantitatively (via rank-order correlation). Our experiments show that current attention models in VQA do not seem to be looking at the same regions as humans. Finally, we train VQA models with explicit attention supervision, and find that it improves VQA performance.
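The abstract's quantitative comparison uses rank-order correlation between model-generated and human attention maps. As an illustrative sketch (the paper's exact evaluation protocol and tie-handling may differ), Spearman rank correlation over two flattened attention maps can be computed with plain NumPy:

```python
import numpy as np

def rank_correlation(map_a, map_b):
    """Spearman rank-order correlation between two flattened attention maps.

    Illustrative only: ranks are assigned by sort position, so ties are
    broken arbitrarily rather than averaged as a full Spearman would do.
    """
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    # Convert raw attention values to ranks.
    ra = np.empty_like(a)
    rb = np.empty_like(b)
    ra[np.argsort(a)] = np.arange(a.size)
    rb[np.argsort(b)] = np.arange(b.size)
    # Pearson correlation of the ranks equals the Spearman correlation.
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra * rb).sum() / np.sqrt((ra ** 2).sum() * (rb ** 2).sum()))
```

Identical rankings give a score of 1.0 and fully reversed rankings give -1.0, so a model whose attention ordering of image regions agrees with human annotators scores near 1.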
