International Conference on Computational Linguistics

RIVA: A Pre-trained Tweet Multimodal Model Based on Text-image Relation for Multimodal NER



Abstract

Multimodal named entity recognition (MNER) for tweets has received increasing attention recently. Most multimodal methods use attention mechanisms to capture text-related visual information. However, unrelated or only weakly related text-image pairs account for a large proportion of tweets, and visual clues unrelated to the text introduce uncertain or even negative effects into multimodal model learning. In this paper, we propose a novel pre-trained multimodal model for tweets based on Relationship Inference and Visual Attention (RIVA). The RIVA model controls the attention-based visual clues with a gate that reflects the image's role in the semantics of the text. We use a teacher-student semi-supervised paradigm to leverage a large unlabeled multimodal tweet corpus together with a labeled dataset for text-image relation classification. On the multimodal NER task, the experimental results show the significance of text-related visual features for the visual-linguistic model, and our approach achieves SOTA performance on the MNER datasets.
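The gating mechanism the abstract describes (attention-based visual clues modulated by a text-image relation score) can be sketched roughly as follows. This is an illustrative NumPy toy under stated assumptions, not the authors' implementation: the scaled dot-product attention, the randomly initialized gate weights `W`, and the scalar `relation_score` (standing in for the output of the text-image relation classifier) are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_visual_fusion(h_text, v_img, relation_score, rng):
    """Fuse text token features with image region features.

    h_text: (T, d) text token representations
    v_img:  (R, d) image region features
    relation_score: scalar in [0, 1] from a (hypothetical)
        text-image relation classifier; 0 means the image is
        unrelated to the text, so visual clues are shut off.
    """
    d = h_text.shape[-1]
    # Each text token attends over image regions (scaled dot product).
    scores = h_text @ v_img.T / np.sqrt(d)          # (T, R)
    attn = softmax(scores, axis=-1)
    v_ctx = attn @ v_img                            # (T, d) text-aware visual context
    # Gate from [token; visual context]; weights are a toy stand-in.
    W = rng.standard_normal((2 * d, d)) * 0.1       # hypothetical gate weights
    g = 1.0 / (1.0 + np.exp(-np.concatenate([h_text, v_ctx], axis=-1) @ W))
    # Scale the gate by the relation score so an unrelated image
    # contributes nothing to the fused representation.
    g = relation_score * g
    return h_text + g * v_ctx
```

With `relation_score = 0` the fused output reduces to the original text features, which is the intended behavior for unrelated text-image pairs.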


