International Conference on Information Fusion

Multimodal Fusion with Co-attention Mechanism


Abstract

Because information from different modalities complements each other when describing the same content, multimodal information can be used to obtain better feature representations. How to represent and fuse this relevant information has therefore become a current research topic. Most existing feature fusion methods consider different levels of feature representation but ignore the significant relevance between local regions, especially in high-level semantic representations. In this paper, a general multimodal fusion method based on the co-attention mechanism is proposed, with a structure similar to the transformer. We address two main issues: (1) improving the applicability and generality of the transformer to data from different modalities; and (2) making the method more robust by capturing and transmitting the relevant information between local features before fusion. We evaluate our model on a multimodal classification task, and the experiments demonstrate that it learns fused feature representations effectively.
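To illustrate the idea of co-attention between two modalities, the following is a minimal NumPy sketch, not the paper's actual implementation: each modality's local features attend over the other modality's local features via scaled dot-product cross-attention, and the attended results are combined before fusion. All function names, dimensions, and the residual-plus-concatenation fusion step are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys/values."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (n_q, n_k) relevance scores
    weights = softmax(scores, axis=-1)       # each row is a distribution over keys
    return weights @ values                  # (n_q, d) attended summaries

def co_attention(feats_a, feats_b):
    """Co-attention sketch: each modality attends over the other's local
    features, then the attended features are fused by concatenation."""
    a_attended = cross_attention(feats_a, feats_b, feats_b)  # A guided by B
    b_attended = cross_attention(feats_b, feats_a, feats_a)  # B guided by A
    # Residual connection per modality, then stack local features for fusion.
    return np.concatenate([feats_a + a_attended,
                           feats_b + b_attended], axis=0)

rng = np.random.default_rng(0)
img_feats = rng.normal(size=(5, 16))   # e.g. 5 image-region features
txt_feats = rng.normal(size=(7, 16))   # e.g. 7 text-token features
fused = co_attention(img_feats, txt_feats)
print(fused.shape)  # (12, 16)
```

The key point the abstract makes is visible here: relevance between local regions of the two modalities is captured (via the cross-attention weights) and transmitted into the features *before* any final fusion step, rather than fusing pooled global representations directly.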
