International Conference on Computational Linguistics

RIVA: A Pre-trained Tweet Multimodal Model Based on Text-image Relation for Multimodal NER



Abstract

Multimodal named entity recognition (MNER) for tweets has received increasing attention recently. Most multimodal methods use attention mechanisms to capture text-related visual information. However, unrelated or only weakly related text-image pairs account for a large proportion of tweets, and visual clues unrelated to the text introduce uncertain or even negative effects into multimodal model learning. In this paper, we propose a novel pre-trained multimodal model for tweets based on Relationship Inference and Visual Attention (RIVA). The RIVA model controls the attention-based visual clues with a gate that reflects the image's role in the semantics of the text. We use a teacher-student semi-supervised paradigm to leverage a large unlabeled multimodal tweet corpus together with a labeled dataset for text-image relation classification. On the multimodal NER task, the experimental results show the significance of text-related visual features for the visual-linguistic model, and our approach achieves SOTA performance on the MNER datasets.
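The gating mechanism the abstract describes (attention-based visual clues modulated by a text-image relation score) can be sketched roughly as follows. This is an illustrative NumPy toy under stated assumptions, not the authors' implementation: the scaled dot-product attention, the randomly initialized gate weights `W`, and the scalar `relation_score` (standing in for the output of the text-image relation classifier) are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_visual_fusion(h_text, v_img, relation_score, rng):
    """Fuse text token features with image region features.

    h_text: (T, d) text token representations
    v_img:  (R, d) image region features
    relation_score: scalar in [0, 1] from a (hypothetical)
        text-image relation classifier; 0 means the image is
        unrelated to the text, so visual clues are shut off.
    """
    d = h_text.shape[-1]
    # Each text token attends over image regions (scaled dot product).
    scores = h_text @ v_img.T / np.sqrt(d)          # (T, R)
    attn = softmax(scores, axis=-1)
    v_ctx = attn @ v_img                            # (T, d) text-aware visual context
    # Gate from [token; visual context]; weights are a toy stand-in.
    W = rng.standard_normal((2 * d, d)) * 0.1       # hypothetical gate weights
    g = 1.0 / (1.0 + np.exp(-np.concatenate([h_text, v_ctx], axis=-1) @ W))
    # Scale the gate by the relation score so an unrelated image
    # contributes nothing to the fused representation.
    g = relation_score * g
    return h_text + g * v_ctx
```

With `relation_score = 0` the fused output reduces to the original text features, which is the intended behavior for unrelated text-image pairs.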


