Source: Conference on Empirical Methods in Natural Language Processing

Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions

Abstract

Visual Question Answering (VQA) is the task of answering natural-language questions about images. We introduce the novel problem of determining the relevance of questions to images in VQA. Current VQA models do not reason about whether a question is even related to the given image (e.g., What is the capital of Argentina?) or if it requires information from external resources to answer correctly. This can break the continuity of a dialogue in human-machine interaction. Our approaches for determining relevance are composed of two stages. Given an image and a question, (1) we first determine whether the question is visual or not, (2) if visual, we determine whether the question is relevant to the given image or not. Our approaches, based on LSTM-RNNs, VQA model uncertainty, and caption-question similarity, are able to outperform strong baselines on both relevance tasks. We also present human studies showing that VQA models augmented with such question relevance reasoning are perceived as more intelligent, reasonable, and human-like.
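
The two-stage decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract names LSTM-RNNs, VQA model uncertainty, and caption-question similarity as the underlying techniques, whereas the code below swaps in a keyword rule for stage 1 and a bag-of-words cosine similarity between the question and an image caption for stage 2. The cue-word list, the stopword list, the threshold of 0.2, and all function names are assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical cue words standing in for a learned visual/non-visual classifier.
NON_VISUAL_CUES = {"capital", "president", "invented", "population"}
# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "what", "of", "on", "in"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def is_visual(question: str) -> bool:
    """Stage 1 (toy rule): a question with a cue word is treated as non-visual."""
    return not (set(tokenize(question)) & NON_VISUAL_CUES)


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two strings."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def question_relevance(question: str, caption: str, threshold: float = 0.2) -> str:
    """Run the two-stage cascade and return a label for the question."""
    # Stage 1: is the question about visual content at all?
    if not is_visual(question):
        return "non-visual"
    # Stage 2: is the visual question relevant to this image (via its caption)?
    if cosine_similarity(question, caption) < threshold:
        return "false-premise"
    return "relevant"


if __name__ == "__main__":
    caption = "a dog running on the beach"  # assumed output of a captioning model
    for q in (
        "What is the capital of Argentina?",
        "What color is the dog?",
        "What color is the umbrella?",
    ):
        print(q, "->", question_relevance(q, caption))
```

The cascade order mirrors the abstract: a question is checked for relevance to the image only after it has been judged visual, so a non-visual question never reaches the second stage.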
