The World in My Mind: Visual Dialog with Adversarial Multi-modal Feature Encoding

Abstract

Visual Dialog is a multi-modal task that requires a model to participate in a multi-turn human dialog grounded on an image and to generate correct, human-like responses. In this paper, we propose a novel Adversarial Multi-modal Feature Encoding (AMFE) framework for effective and robust auxiliary training of visual dialog systems. AMFE forces the language-encoding part of a model to generate hidden states whose distribution is closely related to the distribution of real-world image features, yielding language features that inherently carry general knowledge from both modalities. This helps the model generate responses that are both more correct and more general, at reasonably low time cost. Experimental results show that AMFE steadily brings performance gains to different models on different scales of data. Our method outperforms both the supervised learning baselines and other fine-tuning methods, achieving state-of-the-art results on most metrics of the VisDial v0.5/v0.9 generative tasks.
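The core mechanism the abstract describes, aligning the language encoder's hidden-state distribution with the distribution of real image features via an adversarial objective, can be sketched in miniature. The following is a hypothetical toy illustration, not the authors' implementation: a logistic-regression discriminator tries to tell image features from language-encoder hidden states, while the encoder is updated to fool it. All dimensions, learning rates, and distributions are illustrative assumptions.

```python
import numpy as np

# Toy sketch of adversarial feature alignment (assumptions, not AMFE's actual
# architecture): the encoder's feature distribution is pulled toward the
# image-feature distribution by training against a discriminator.

rng = np.random.default_rng(0)
dim = 8                      # toy feature dimension
lr_d, lr_g = 0.1, 0.01       # discriminator / encoder step sizes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_image_feats(n):
    # Stand-in for CNN features of real images: Gaussian with mean 1.0.
    return rng.normal(loc=1.0, scale=0.5, size=(n, dim))

# Toy "language encoder": a linear map plus bias over token embeddings.
W_enc = rng.normal(scale=0.1, size=(dim, dim))
b_enc = np.zeros(dim)

# Discriminator: logistic regression (label 1 = image, 0 = language feature).
w_d = np.zeros(dim)
b_d = 0.0

for step in range(300):
    tokens = rng.normal(size=(64, dim))     # stand-in dialog embeddings
    lang = tokens @ W_enc + b_enc           # encoder hidden states
    img = sample_image_feats(64)

    # Discriminator update: binary cross-entropy on real vs. encoded features.
    for x, y in ((img, 1.0), (lang, 0.0)):
        p = sigmoid(x @ w_d + b_d)
        g = p - y                           # dBCE/dlogit
        w_d -= lr_d * (x.T @ g) / len(x)
        b_d -= lr_d * g.mean()

    # Encoder update: fool the discriminator (target label 1 for lang feats).
    p = sigmoid(lang @ w_d + b_d)
    g_lang = (p - 1.0)[:, None] * w_d       # backprop through the logit
    W_enc -= lr_g * (tokens.T @ g_lang) / len(tokens)
    b_enc -= lr_g * g_lang.mean(axis=0)

# After training, the encoder bias has drifted toward the image-feature mean,
# i.e. the language-feature distribution moved toward the image distribution.
print(round(float(b_enc.mean()), 2))
```

In the paper's setting, the "encoder" would be the language-encoding part of a full visual dialog model and the "image features" would come from a pretrained visual backbone; the sketch only shows the direction of the adversarial pressure.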