Computer Speech and Language

Transfer learning for multimodal dialog


Abstract

Audio-Visual Scene-Aware Dialog (AVSD) is best understood as an extension of Visual Question Answering, the task of generating a textual answer in response to a textual question about multimedia content. In AVSD, the answer-relevant "context" is expanded to include past dialog turns, which we view as a specialized form of extra textual knowledge (in addition to the standard video features). We have developed a framework that uses hierarchical attention to fuse contributions from different modalities, and have shown how it can be used to generate textual summaries from multimodal sources, specifically videos with accompanying commentary. In this paper, we transfer the algorithmic approach, models, and data from this background corpus of 2000 hours of how-to videos to the AVSD task, and report our findings. Our approach uses dialog context but makes no assumption about the ordering of the history. Our system achieves the best performance in both automatic and human evaluations in the 7th Dialog State Tracking Challenge (AVSD track).
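The abstract describes fusing contributions from different modalities (video features plus dialog-history text) with hierarchical attention. The sketch below is a minimal, hypothetical illustration of that general two-level idea only: attend within each modality, then attend across the per-modality summaries. It is not the authors' actual architecture (which, per the abstract, also conditions on the question and dialog context); all class, variable, and dimension names are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalAttentionFusion(nn.Module):
    """Illustrative two-level attention: attend over features within each
    modality, then attend over the resulting per-modality summaries."""
    def __init__(self, dim):
        super().__init__()
        self.within = nn.Linear(dim, 1)   # scores features inside one modality
        self.across = nn.Linear(dim, 1)   # scores the modality summaries

    def forward(self, modalities):
        # modalities: list of tensors, each of shape (seq_len_i, dim)
        summaries = []
        for feats in modalities:
            w = F.softmax(self.within(feats), dim=0)     # (seq_len_i, 1)
            summaries.append((w * feats).sum(dim=0))     # (dim,)
        stacked = torch.stack(summaries)                 # (num_modalities, dim)
        a = F.softmax(self.across(stacked), dim=0)       # (num_modalities, 1)
        return (a * stacked).sum(dim=0)                  # fused vector, (dim,)

# Hypothetical usage: fuse video, audio, and dialog-history features of width 256.
fusion = HierarchicalAttentionFusion(dim=256)
video  = torch.randn(40, 256)   # e.g. per-frame visual features
audio  = torch.randn(30, 256)   # e.g. audio segment features
dialog = torch.randn(12, 256)   # e.g. embedded past dialog turns
fused = fusion([video, audio, dialog])
print(fused.shape)              # torch.Size([256])
```

Because the second attention level weights whole modalities rather than individual features, this kind of fusion lets the model down-weight an uninformative modality (for instance, silent video) for a given question, which is the intuition behind hierarchical attention fusion.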
