International Conference on Computational Linguistics

Visual-Textual Alignment for Graph Inference in Visual Dialog


Abstract

As a conversational intelligence task, visual dialog entails answering a series of questions grounded in an image, using the dialog history as context. To generate correct answers, comprehension of the semantic dependencies among implicit visual and textual contents is critical. Prior works usually ignored this underlying relation and failed to infer it reasonably. In this paper, we propose a Visual-Textual Alignment for Graph Inference (VTAGI) network. Compared with other approaches, it makes up for the lack of structural inference in visual dialog. The whole system consists of two modules, Visual and Textual Alignment (VTA) and Visual Graph Attended by Text (VGAT). Specifically, the VTA module aims at representing an image with a set of integrated visual regions and corresponding textual concepts, reflecting certain semantics. The VGAT module views the visual features with semantic information as observed nodes, and each node learns its relationships with the others in the visual graph. We qualitatively and quantitatively evaluate the model on the VisDial v1.0 dataset, showing that our VTAGI outperforms previous state-of-the-art models.
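The VGAT idea of treating visual features as graph nodes that each learn relationships with the others can be illustrated with generic single-head graph attention over a fully connected visual graph. This is a minimal numpy sketch of the general technique, not the paper's actual formulation; the names `graph_attention`, `W`, and `a` are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention(nodes, W, a):
    """One round of graph attention on a fully connected visual graph.

    nodes: (N, d) region features; W: (d, d') shared projection;
    a: (2*d',) scoring vector. Returns (N, d') updated node features,
    where each node is a weighted mix of all projected nodes.
    """
    h = nodes @ W                      # project regions: (N, d')
    N = h.shape[0]
    logits = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            # e_ij = a^T [h_i || h_j], scoring how much node i attends to j
            logits[i, j] = np.concatenate([h[i], h[j]]) @ a
    logits = np.where(logits > 0, logits, 0.2 * logits)  # LeakyReLU
    alpha = softmax(logits, axis=1)    # attention weights per node, rows sum to 1
    return alpha @ h                   # aggregate neighbours: (N, d')
```

In the paper's setting the attention would additionally be conditioned on the question and dialog-history text (hence "Attended by Text"); the sketch above shows only the purely visual node-to-node inference step.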
