Journal: Multimedia Tools and Applications

Object-difference drived graph convolutional networks for visual question answering



Abstract

Visual Question Answering (VQA), an important task for evaluating the cross-modal understanding capability of an artificial intelligence model, has been a hot research topic in both the computer vision and natural language processing communities. Recently, graph-based models have received growing interest in VQA for their potential to model the relationships between objects as well as their strong interpretability. Nonetheless, these solutions mainly define the similarity between objects as their semantic relationship, while largely ignoring the critical point that the difference between objects can provide more information for establishing the relationships between nodes in the graph. To exploit this, we propose an object-difference based graph learner, which learns question-adaptive semantic relations by calculating inter-object differences under the guidance of the question. With the learned relationships, the input image can be represented as an object graph encoded with structural dependencies between objects. In addition, existing graph-based models conveniently leverage the object boxes pre-extracted by an object detection model as node features, but they suffer from a redundancy problem. To reduce the number of redundant objects, we introduce a soft-attention mechanism that magnifies question-related objects. Moreover, we incorporate our object-difference based graph learner into soft-attention based Graph Convolutional Networks to capture question-specific objects and their interactions for answer prediction. Our experimental results on the VQA 2.0 dataset demonstrate that our model performs significantly better than baseline methods.
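The pipeline the abstract describes can be sketched in NumPy under illustrative assumptions: the feature sizes, the single linear relation projection `W_rel`, and all weight values here are hypothetical stand-ins for learned parameters, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
N, d = 4, 8                              # 4 detected objects, 8-dim features (toy sizes)
V = rng.standard_normal((N, d))          # object (node) features from a detector
q = rng.standard_normal(d)               # pooled question embedding

# 1. Object-difference graph learner: score each edge from the
#    question-guided pairwise difference v_i - v_j.
W_rel = rng.standard_normal(d)           # hypothetical relation projection
diff = V[:, None, :] - V[None, :, :]     # (N, N, d) pairwise differences
A = softmax((diff * q) @ W_rel, axis=1)  # row-normalized adjacency (N, N)

# 2. Soft attention: magnify question-related objects, suppress redundant ones.
alpha = softmax(V @ q)                   # (N,) relevance of each object
V_att = alpha[:, None] * V               # attended node features

# 3. One graph-convolution layer over the learned, question-adaptive graph.
W_g = rng.standard_normal((d, d))        # hypothetical GCN weight
H = relu(A @ V_att @ W_g)                # (N, d) relation-aware object features

# 4. Attention-pooled graph representation fed to an answer classifier.
h = (alpha[:, None] * H).sum(axis=0)     # (d,)
```

Each row of `A` is a softmax over edge scores, so the graph is question-adaptive and fully learned rather than fixed by pre-defined semantic similarity; in practice each step would use trained multi-layer projections.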

