Journal: Multimedia Tools and Applications

Object-difference drived graph convolutional networks for visual question answering



Abstract

Visual Question Answering (VQA), an important task for evaluating the cross-modal understanding capability of an artificial intelligence model, has been a hot research topic in both the computer vision and natural language processing communities. Recently, graph-based models have received growing interest in VQA for their potential to model the relationships between objects as well as their strong interpretability. Nonetheless, these solutions mainly define the similarity between objects as their semantic relationship, while largely ignoring the critical point that the difference between objects can provide more information for establishing the relationships between nodes in the graph. To exploit this, we propose an object-difference based graph learner, which learns question-adaptive semantic relations by calculating inter-object differences under the guidance of the question. With the learned relationships, the input image can be represented as an object graph encoded with structural dependencies between objects. In addition, existing graph-based models conveniently leverage the object boxes pre-extracted by an object detection model as node features, but they suffer from a redundancy problem. To reduce the number of redundant objects, we introduce a soft-attention mechanism that magnifies question-related objects. Moreover, we incorporate our object-difference based graph learner into soft-attention based Graph Convolutional Networks to capture question-specific objects and their interactions for answer prediction. Our experimental results on the VQA 2.0 dataset demonstrate that our model performs significantly better than baseline methods.
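The pipeline the abstract describes can be sketched in NumPy under illustrative assumptions: the feature sizes, the single linear relation projection `W_rel`, and all weight values here are hypothetical stand-ins for learned parameters, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
N, d = 4, 8                              # 4 detected objects, 8-dim features (toy sizes)
V = rng.standard_normal((N, d))          # object (node) features from a detector
q = rng.standard_normal(d)               # pooled question embedding

# 1. Object-difference graph learner: score each edge from the
#    question-guided pairwise difference v_i - v_j.
W_rel = rng.standard_normal(d)           # hypothetical relation projection
diff = V[:, None, :] - V[None, :, :]     # (N, N, d) pairwise differences
A = softmax((diff * q) @ W_rel, axis=1)  # row-normalized adjacency (N, N)

# 2. Soft attention: magnify question-related objects, suppress redundant ones.
alpha = softmax(V @ q)                   # (N,) relevance of each object
V_att = alpha[:, None] * V               # attended node features

# 3. One graph-convolution layer over the learned, question-adaptive graph.
W_g = rng.standard_normal((d, d))        # hypothetical GCN weight
H = relu(A @ V_att @ W_g)                # (N, d) relation-aware object features

# 4. Attention-pooled graph representation fed to an answer classifier.
h = (alpha[:, None] * H).sum(axis=0)     # (d,)
```

Each row of `A` is a softmax over edge scores, so the graph is question-adaptive and fully learned rather than fixed by pre-defined semantic similarity; in practice each step would use trained multi-layer projections.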

