International Conference on Computational Linguistics

Visual-Textual Alignment for Graph Inference in Visual Dialog


Abstract

As a conversational intelligence task, visual dialog entails answering a series of questions grounded in an image, using the dialog history as context. To generate correct answers, comprehension of the semantic dependencies among implicit visual and textual contents is critical. Prior works usually ignored this underlying relation and failed to infer it reasonably. In this paper, we propose a Visual-Textual Alignment for Graph Inference (VTAGI) network. Compared with other approaches, it makes up for the lack of structural inference in visual dialog. The whole system consists of two modules, Visual and Textual Alignment (VTA) and Visual Graph Attended by Text (VGAT). Specifically, the VTA module aims at representing an image with a set of integrated visual regions and corresponding textual concepts, reflecting certain semantics. The VGAT module views the visual features with semantic information as observed nodes, and each node learns its relationships with the others in the visual graph. We qualitatively and quantitatively evaluate the model on the VisDial v1.0 dataset, showing that our VTAGI outperforms previous state-of-the-art models.
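The VGAT idea of treating visual features as graph nodes that each learn relationships with the others can be illustrated with generic single-head graph attention over a fully connected visual graph. This is a minimal numpy sketch of the general technique, not the paper's actual formulation; the names `graph_attention`, `W`, and `a` are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention(nodes, W, a):
    """One round of graph attention on a fully connected visual graph.

    nodes: (N, d) region features; W: (d, d') shared projection;
    a: (2*d',) scoring vector. Returns (N, d') updated node features,
    where each node is a weighted mix of all projected nodes.
    """
    h = nodes @ W                      # project regions: (N, d')
    N = h.shape[0]
    logits = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            # e_ij = a^T [h_i || h_j], scoring how much node i attends to j
            logits[i, j] = np.concatenate([h[i], h[j]]) @ a
    logits = np.where(logits > 0, logits, 0.2 * logits)  # LeakyReLU
    alpha = softmax(logits, axis=1)    # attention weights per node, rows sum to 1
    return alpha @ h                   # aggregate neighbours: (N, d')
```

In the paper's setting the attention would additionally be conditioned on the question and dialog-history text (hence "Attended by Text"); the sketch above shows only the purely visual node-to-node inference step.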
